|
Machine learning has seen a surge of interest in recent years, with deep learning as one of its most prominent branches. The convolutional neural network (CNN) is an important deep learning model, but its large number of training parameters and many layers make the training process very slow. By unrolling the input of each convolution layer into matrices and calling BLAS libraries, the convolution can be computed as a matrix multiplication, which shortens training time. However, another problem arises when the input image is too large: during the rearrangement of the input data, data rearranged earlier may be evicted from the cache by data arriving later. The convolution is then greatly affected because the cache hit rate drops. This paper presents a method that accelerates training based on the structure of the convolution process: the input data are divided into blocks during convolution, and the effect of the relationship between the block size and the cache capacity is studied as well. Experiments show that convolution in blocks is simple and feasible, and improves the efficiency of convolution by up to about 50%.
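To illustrate the idea described above (this is a minimal sketch, not the paper's implementation), the following Python/NumPy code shows im2col-based convolution via matrix multiplication, plus a blocked variant that processes a tile of output rows at a time so the unrolled data stays cache resident. The function names and the `block` parameter are hypothetical illustrations:

```python
import numpy as np

def im2col(x, k):
    """Unroll all k x k patches of a 2-D input x into the columns of a matrix.

    x: (H, W) input image, k: kernel size. Returns shape (k*k, out_h*out_w).
    """
    H, W = x.shape
    out_h, out_w = H - k + 1, W - k + 1
    cols = np.empty((k * k, out_h * out_w))
    for i in range(k):
        for j in range(k):
            # Each (i, j) kernel offset contributes one row of the matrix.
            cols[i * k + j] = x[i:i + out_h, j:j + out_w].ravel()
    return cols

def conv2d_gemm(x, w):
    """Convolution (cross-correlation, as in CNNs) via im2col + GEMM."""
    k = w.shape[0]
    out_h, out_w = x.shape[0] - k + 1, x.shape[1] - k + 1
    cols = im2col(x, k)
    # One matrix multiplication replaces the nested convolution loops.
    return (w.ravel() @ cols).reshape(out_h, out_w)

def conv2d_blocked(x, w, block=64):
    """Blocked variant: unroll and multiply one tile of output rows at a time,
    so each im2col slice fits in cache (block is a tuning parameter)."""
    k = w.shape[0]
    out_h, out_w = x.shape[0] - k + 1, x.shape[1] - k + 1
    out = np.empty((out_h, out_w))
    for r0 in range(0, out_h, block):
        r1 = min(r0 + block, out_h)
        # Input rows needed to produce output rows [r0, r1).
        tile = x[r0:r1 + k - 1, :]
        cols = im2col(tile, k)
        out[r0:r1] = (w.ravel() @ cols).reshape(r1 - r0, out_w)
    return out
```

In the blocked variant, choosing `block` so that the unrolled tile fits in cache avoids the eviction problem the abstract describes; the best value depends on the cache capacity, which is the relationship the paper studies.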
|
Keywords: convolutional neural networks; BLAS; matrix multiplication; convolution in blocks; cache.
|