Wednesday, October 5, 2016

Keras1.1.0 + Theano0.8.2 + VS2015 + CUDA8.0 + cuDNN5.1 on Windows 10

Back in July, we shared our deep learning Windows 10 setup for the Ultrasound Nerve Segmentation Kaggle competition. That setup relied on Keras1.0.5 + Theano0.8.2 + VS2013 + CUDA7.5 + cuDNN5.0, as explained here.

Now that the competition is over and CUDA 8.0 has shipped, we are upgrading our DL setup to Keras1.1.0 + Theano0.8.2 + VS2015 + CUDA8.0 + cuDNN5.1.

Full gory details can be found here.

Comments welcome!

1 comment:

  1. Thanks a lot for your guide. I followed your steps and it works!

    I have two questions that I hope you can help me with.

    I use exactly the same environment as you, except that my OS is Win10 Home.
    Computer: i7 6820HK, GTX 1060

    Compared with my old Linux computer (Ubuntu 14.04 and cuDNN v4 with a GTX 765M and i7 4702HQ), my new computer takes much longer to build the CUDA code.

    Here is the detailed description.

    > running output (useless info deleted)

    > (c:\toolkits\anaconda2-4.2.0\envs\py34) C:\toolkits\keras-1.1.2\examples>python
    > Using Theano backend.
    > Using gpu device 0: GeForce GTX 1060 (CNMeM is enabled with initial size: 82.0% of memory, cuDNN 5105)
    > X_train shape: (60000, 28, 28, 1)
    > 60000 train samples
    > 10000 test samples
    > ---Compiling time is 0.008022069931030273 seconds ---
    > start training
    > Train on 60000 samples, validate on 10000 samples
    > Epoch 1/12
    > 60000/60000 [==============================] - 10s - loss: 0.3910 - acc: 0.8784 - val_loss: 0.0977 - val_acc: 0.9707

    After model.compile() is done, my Windows PC takes about 470 seconds before it starts the next stage, which displays "Train on 60000 samples, validate on 10000 samples" and so on.

    On the other hand, my old Linux computer takes only about 2 s.
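
To pin down where the wait described above actually occurs, each stage can be timed separately. The following is a minimal sketch using only the Python standard library; the `timed` helper and the `slow_first_call` stand-in are hypothetical names introduced for illustration, and the commented `model.compile(...)` / `model.fit(...)` calls mark where the real Keras calls would go. It is offered on the assumption, not confirmed by the log, that the delay falls inside the first training call rather than inside `model.compile()`.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label, results):
    """Record the wall-clock seconds spent inside the block under `label`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        results[label] = time.perf_counter() - start


def slow_first_call():
    # Hypothetical stand-in for the first training call; swap in the real one.
    time.sleep(0.05)


timings = {}
with timed("compile", timings):
    pass  # e.g. model.compile(...)
with timed("first fit", timings):
    slow_first_call()  # e.g. model.fit(...) on a small batch

for label in ("compile", "first fit"):
    print("{}: {:.3f} s".format(label, timings[label]))
```

Timing the first and second training calls separately would also show whether the cost is paid once or on every run.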

    I tried to find out what is wrong with my settings. After some tests, I found that my Theano 0.8.2 outputs lots of debug info, which looks like:

    > Using Theano backend.
    > DEBUG: nvcc STDOUT nvcc warning : The 'compute_20', 'sm_20', and 'sm_21' architectures are deprecated, and may be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
    > Creating library C:/Users/chaoj/AppData/Local/Theano/compiledir_Windows-10-10.0.14393-Intel64_Family_6_Model_94_Stepping_3_GenuineIntel-2.7.12-64/tmpul7fbj/265abc51f7c376c224983485238ff1a5.lib and object C:/Users/chaoj/AppData/Local/Theano/compiledir_Windows-10-10.0.14393-Intel64_Family_6_Model_94_Stepping_3_GenuineIntel-2.7.12-64/tmpul7fbj/265abc51f7c376c224983485238ff1a5.exp

    > Using gpu device 0: GeForce GTX 1060 (CNMeM is enabled with initial size: 82.0% of memory, cuDNN 5105)
    > c:\toolkits\anaconda2-4.2.0\lib\site-packages\theano-0.8.2-py2.7.egg\theano\sandbox\cuda\ UserWarning: Your cuDNN version is more recent than the one Theano officially supports. If you see any problems, try updating Theano or downgrading cuDNN to version 5.
    > warnings.warn(warn)
    > DEBUG: nvcc STDOUT
    > Creating library C:/Users/chaoj/AppData/Local/Theano/compiledir_Windows-10-10.0.14393-Intel64_Family_6_Model_94_Stepping_3_GenuineIntel-2.7.12-64/tmpchxavp/97496c4d3cf9a06dc4082cc141f918d2.lib and object C:/Users/chaoj/AppData/Local/Theano/compiledir_Windows-10-10.0.14393-Intel64_Family_6_Model_94_Stepping_3_GenuineIntel-2.7.12-64/tmpchxavp/97496c4d3cf9a06dc4082cc141f918d2.exp
    > DEBUG: nvcc STDOUT
    > Creating library C:/Users/chaoj/AppData/Local/Theano/compiledir_Windows-10-10.0.14393-Intel64_Family_6_Model_94_Stepping_3_GenuineIntel-2.7.12-64/tmpk8ce6m/6174b19f8005a60d6a2faaae7ff1c9a7.lib and object C:/Users/chaoj/AppData/Local/Theano/compiledir_Windows-10-10.0.14393-Intel64_Family_6_Model_94_Stepping_3_GenuineIntel-2.7.12-64/tmpk8ce6m/6174b19f8005a60d6a2faaae7ff1c9a7.exp
    > ...........much more
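
A log like the one above can be summarized rather than read line by line. This is a rough sketch, assuming the output has been captured to a string: it counts the `DEBUG: nvcc STDOUT` lines and collects the temporary build directory named in each "Creating library" line. The function name `summarize_nvcc_log` and the shortened `C:/x/compiledir_demo` paths in the sample are illustrative stand-ins, not part of Theano.

```python
import re
from collections import Counter

# Captures the temporary build directory in a "Creating library ..." line,
# e.g. ".../compiledir_<platform>/tmpul7fbj/<hash>.lib" -> "tmpul7fbj".
BUILD_DIR = re.compile(r"/(tmp[^/\s]+)/[0-9a-f]+\.lib")


def summarize_nvcc_log(text):
    """Return (number of nvcc invocations, Counter of build directories)."""
    lines = text.splitlines()
    invocations = sum(1 for line in lines if "DEBUG: nvcc STDOUT" in line)
    build_dirs = Counter(
        match.group(1) for line in lines for match in BUILD_DIR.finditer(line)
    )
    return invocations, build_dirs


# Shortened sample modeled on the log above (paths abbreviated for the demo).
sample = """\
DEBUG: nvcc STDOUT nvcc warning : deprecated architectures
Creating library C:/x/compiledir_demo/tmpul7fbj/265abc51f7c376c224983485238ff1a5.lib and object C:/x/compiledir_demo/tmpul7fbj/265abc51f7c376c224983485238ff1a5.exp
DEBUG: nvcc STDOUT
Creating library C:/x/compiledir_demo/tmpchxavp/97496c4d3cf9a06dc4082cc141f918d2.lib and object C:/x/compiledir_demo/tmpchxavp/97496c4d3cf9a06dc4082cc141f918d2.exp
"""

invocations, build_dirs = summarize_nvcc_log(sample)
print(invocations, "nvcc invocations,", len(build_dirs), "distinct build directories")
```

A large invocation count on a run that follows an identical earlier run would suggest that modules are being rebuilt instead of reused, which is worth checking before blaming raw compiler speed.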

    So I think that during the waiting time nvcc is compiling and optimizing the GPU code. I did not see such output in your guide. I googled it and found that it is a bug in 0.8.2; when I installed 0.9.0dev4, the problem was fixed (

    Does nvcc take this long to build the program for you? I think it is quite weird.

    Thank you in advance!