Optimizing a very large neural network that exceeds GPU memory

wirawan0 · August 21, 2018, 9:34pm

I ran into a problem with an extremely large neural network built in Keras with the TensorFlow backend. The memory footprint of a single layer is already larger than the memory of current GPUs: it has just over 4 billion parameters, which in float32 amounts to a 16 GB matrix on its own. Is there a neural network implementation that can get around this problem? For example, will TensorFlow accommodate such a case? There are papers and discussions on handling a matrix multiply that is larger than the GPU’s memory (basically by tiling the matrix and streaming the data). There is also work on virtualized deep neural networks that claims to be transparent to the end user in its use of CPU and GPU memory:

“Training Deeper Models by GPU Memory Optimization on TensorFlow”
http://learningsys.org/nips17/assets/papers/paper_18.pdf

“vDNN: Virtualized Deep Neural Networks for Scalable, Memory-Efficient Neural Network Design”

“How to Train a Very Large and Deep Model on One GPU?”

but my question is simply: is this doable with current neural network implementations? TensorFlow claims to support parallel computation, multiple GPUs, etc. Will TensorFlow handle a case like the one above without choking?
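
As a rough illustration of the tiling idea mentioned above (not taken from any of the cited papers), the sketch below computes a product x @ w one column block of w at a time, so only one block of the large matrix has to be resident next to the input. It runs on the CPU with NumPy; `tiled_matmul` and `tile_cols` are hypothetical names, and a real out-of-core version would copy each block to the GPU at the point marked in the comment.

```python
import numpy as np


def tiled_matmul(x, w, tile_cols):
    """Compute x @ w by processing w in column blocks of width tile_cols.

    Hypothetical helper for illustration only: each block of w is used
    once and can be discarded before the next one is loaded.
    """
    out = np.empty((x.shape[0], w.shape[1]), dtype=np.result_type(x, w))
    for start in range(0, w.shape[1], tile_cols):
        stop = min(start + tile_cols, w.shape[1])
        # In an out-of-core setup, this slice is what would be streamed
        # from host memory to the GPU before the multiply.
        block = w[:, start:stop]
        out[:, start:stop] = x @ block
    return out


# Small usage example: the tiled result matches the direct product.
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 5))
w = rng.standard_normal((5, 7))
assert np.allclose(tiled_matmul(x, w, tile_cols=3), x @ w)
```

The block width trades memory for transfer overhead: smaller blocks need less device memory at once but require more copies.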

raminder · August 28, 2018, 1:39am

Thanks for sharing the papers and the reading; it’s really interesting. While researching this, I found https://medium.com/tensorflow/fitting-larger-networks-into-memory-583e3c758ff9, which may be useful to you.

cec5550 · June 13, 2023, 1:03pm

I would recommend trying the unified memory option in TensorFlow by setting the environment variable TF_FORCE_UNIFIED_MEMORY to true.
It lets allocations that overflow GPU memory fall back to CPU memory. Alternatively, try half precision, which should lower your memory footprint.
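
A minimal sketch of both suggestions, under a few assumptions: the variable is set from Python before TensorFlow is imported (setting it in the shell should work equally well), and the footprint figures simply multiply the parameter count from the original question by the bytes per element. `footprint_gb` is a hypothetical helper, not a TensorFlow function.

```python
import os

# Set the variable before importing TensorFlow so the runtime sees it
# when it initializes the GPU.
os.environ["TF_FORCE_UNIFIED_MEMORY"] = "true"
# import tensorflow as tf  # import only after the variable is set

BYTES_PER_ELEMENT = {"float32": 4, "float16": 2}


def footprint_gb(num_params, dtype):
    """Size of a dense weight matrix in GB (10**9 bytes). Illustrative helper."""
    return num_params * BYTES_PER_ELEMENT[dtype] / 1e9


num_params = 4_000_000_000  # roughly the layer size from the question
print(footprint_gb(num_params, "float32"))  # 16.0
print(footprint_gb(num_params, "float16"))  # 8.0
```

Half precision halves the size of that one matrix, but 8 GB may still not fit next to activations and optimizer state on a smaller GPU, so the two options can be worth combining.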

jfossot · August 17, 2023, 3:10pm

Or use a system set up for ML/AI and big data. Go through the ACCESS Resource List; you will find many that meet your needs:
https://allocations.access-ci.org/resources