Tutorial: Getting Started with Distributed Deep Learning with Caffe on Windows

Categories Machine Learning, Uncategorized

Introduction

What is Caffe?

A deep learning framework developed by Berkeley Vision and Learning Center. It makes creating deep neural networks easy without writing a ton of code.If you don’t know what deep learning is, here is a great guide to getting started: http://cs231n.github.io/.

Setup

My setup:
Windows 8.1 on 64bit
Visual Studio 2013 Community
GeForce GT 750M
CUDA 7.5

1. Check for Compatibility

Make sure you are on a supported Windows operating system:
Windows 8.1
Windows 7
Windows Server 2008
Windows Server 2012.(If you are using Windows 8, upgrade through here: http://windows.microsoft.com/en-ca/windows-8/update-from-windows-8-tutorial)

Make sure your GPU is supported by CUDA: https://developer.nvidia.com/cuda-gpus 
Anything with compute capability of  >=3.0 should be good.

If you do not have a compatible GPU, you can still use Caffe but it will be magnitudes slower than with a GPU and skip part 2.

Make sure you have a compatible Visual Studios for CUDA support: 
Visual Studio 2013
Visual Studio 2013 Community (Download Visual Studio 2013 Community Edition Free)
Visual Studio 2012
Visual Studio 2010

More nVidia documentation at:
http://docs.nvidia.com/cuda/cuda-getting-started-guide-for-microsoft-windows/#axzz3wsl3JktL 

2. Install CUDA

Download and install CUDA toolkit here: https://developer.nvidia.com/cuda-downloads
 
Verify CUDA can compile:
Go to C:ProgramDataNVIDIA CorporationCUDA Samplesv7.5 and open the solution file (i.e. Samples_vs2013.sln) in Visual Studio
In the solution explorer, build 0_Simple/vectorAdd
Run C:ProgramDataNVIDIA CorporationCUDA Samplesv7.5binwin64debugvectorAdd.exe
 
The output should be:
Copy input data from the host memory to the CUDA device
CUDA kernel launch with 196 blocks of 256 threads
Copy output data from the CUDA device to the host memory
Test PASSED
Done

3. Install Caffe

Fork the windows port of Caffe: https://github.com/happynear/caffe-windowsDownload third party libraries and extract to caffe-windows/3rdparty
Remember to add caffe-windows/3rdparty/bin to your PATH

Open caffe-windows/buildVS2013/MainBuilder.sln in Visual Studio
If you don’t have a compatible GPU, open caffe-windows/build_cpu_only/MainBuilder.sln

Set the GPU compatible mode:
Right click the caffe project and click properties
In the left menu, go to Configuration Properties -> Cuda C/C++ -> Device
In the Code Generation key, modify the compute capabilities to your GPU’s (such as compute_30,sm_30; etc)

Build the solution in release mode
Right click the solution and click Build Solution
(It’s OK if matcafe and pycafe fail)

Testing
Download the mnist leveldb from http://pan.baidu.com/s/1mgl9ndu
Extract the folders to caffe-windows/examples/mnist
Run caffe-windows/run_mnist.bat

You should get some output similar to the following when you finish:
….
I0112 00:06:37.180341 45040 solver.cpp:326] Iteration 10000, loss = 0.00428135
I0112 00:06:37.181342 45040 solver.cpp:346] Iteration 10000, Testing net (#0)
I0112 00:06:51.726634 45040 solver.cpp:414]     Test net output #0: accuracy = 0
.9914
I0112 00:06:51.726634 45040 solver.cpp:414]     Test net output #1: loss = 0.027
0199 (* 1 = 0.0270199 loss)
I0112 00:06:51.726634 45040 solver.cpp:331] Optimization Done.
I0112 00:06:51.726634 45040 caffe.cpp:215] Optimization Done.

Full instructions can be found on the readme of https://github.com/happynear/caffe-windows

Results:
solver_mode: GPU
Start Time: 23:25:19.38
Finish Time: 23:28:37.62

solver_mode: CPU

Start Time: 23:38:01.62
Finish Time:  0:06:51.91As you can see, even a low-end GPU can train a magnitude faster than a CPU.