Optimizing a very large neural network that exceeds GPU memory



I ran into a problem with an extremely large neural network built in Keras with the TensorFlow backend. The memory footprint of a single layer already exceeds the memory of my current GPU: the layer has just over 4 billion parameters, which in float32 amounts to a 16 GB matrix on its own. Is there a neural network implementation that can get around this problem? For example, will TensorFlow accommodate such a case? There are papers and discussions on handling a matrix multiply that is larger than the GPU’s memory (basically by tiling the matrix and streaming the data). There are also papers, including one on virtualized deep neural networks that claims to be transparent to the end user in its use of CPU and GPU memory:

“Training Deeper Models by GPU Memory Optimization on TensorFlow”

“vDNN: Virtualized Deep Neural Networks for Scalable, Memory-Efficient Neural Network Design”

“How to Train a Very Large and Deep Model on One GPU?”

but my question is simply: is this doable with current neural network implementations? TensorFlow claims to support parallel computation, multiple GPUs, and so on. Will TensorFlow handle a case like the one above without choking?
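To make the tiling idea mentioned above concrete, here is a minimal sketch in plain NumPy. It is not how TensorFlow or any of the papers implements it. The function names tiled_matmul and param_bytes, the tile size, and the small array shapes are all made up for illustration. In a real setup each tile would be copied from host memory to the GPU before the multiply, and here that step is only simulated by slicing.

```python
import numpy as np


def param_bytes(n_params, bytes_per_param=4):
    """Memory needed to hold n_params values (float32 = 4 bytes each)."""
    return n_params * bytes_per_param


def tiled_matmul(x, w, tile_cols):
    """Compute x @ w while touching only tile_cols columns of w at a time.

    Hypothetical helper: each slice of w stands in for a tile that would be
    streamed to the device, multiplied there, and then discarded.
    """
    n_out = w.shape[1]
    out = np.empty((x.shape[0], n_out), dtype=x.dtype)
    for start in range(0, n_out, tile_cols):
        stop = min(start + tile_cols, n_out)
        w_tile = w[:, start:stop]          # stand-in for a host-to-device copy
        out[:, start:stop] = x @ w_tile    # partial result for this tile only
    return out


# Tiny stand-in shapes; the real layer would be far larger.
rng = np.random.default_rng(0)
x = rng.standard_normal((8, 64)).astype(np.float32)
w = rng.standard_normal((64, 100)).astype(np.float32)
y = tiled_matmul(x, w, tile_cols=32)
```

With this scheme only the input batch, one tile of the weights, and the matching slice of the output need to be resident at once, at the cost of extra transfers. The same arithmetic as in the question applies: param_bytes(4_000_000_000) gives 16,000,000,000 bytes, i.e. 16 GB for the weights alone, before gradients and optimizer state.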


Thanks for sharing the papers and reading material. They’re really interesting. While researching this, I found https://medium.com/tensorflow/fitting-larger-networks-into-memory-583e3c758ff9, which may be useful to you.