Access Free Google Cloud GPUs for Deep Learning

If you are starting out with Machine/Deep Learning, one roadblock you may run into is a lack of computing resources. Training more complex models, especially those that use larger datasets, takes too long or is outright impossible without GPUs. As of this writing, the latest line of NVIDIA GPUs is still scarce and expensive. Additionally, you may not be ready to commit money to such a purchase anyway. Fortunately, there is a way to “rent” Google Cloud GPUs and access them remotely.

Google Colab

Colab is a free product that integrates easily with Google Drive. Each notebook file connects to a remote machine that comes pre-installed with Python and popular data science libraries (NumPy, Pandas, Scikit-Learn). You can, of course, install other libraries using pip. I really like Colab because it removes a lot of friction when getting started. You can even mount your Google Drive as if it were just another folder on the machine. In my case, I use Google Colab for data analysis, visualization, and prototyping of scripts. I might also use it for training smaller models that only take a few seconds or minutes to train.
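Mounting Drive only takes a couple of lines inside a Colab notebook; here is a minimal sketch (the dataset path under MyDrive is just a placeholder):

```python
# Mount Google Drive inside a Colab notebook (only works in the Colab runtime)
from google.colab import drive

drive.mount('/content/drive')

# After authorizing, Drive shows up as a regular folder, e.g.:
import pandas as pd
df = pd.read_csv('/content/drive/MyDrive/datasets/my_data.csv')  # hypothetical path
```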

The free tier of Google Colab gives you access to one NVIDIA K80 GPU for a few hours each day. It is usable for training smaller models, although interruptions become an issue when training bigger ones. First, there is idling: if you leave Colab alone for hours while training, the session may be flagged as idle and stopped. Running your training loop overnight while you sleep can lead to some disappointing mornings if the execution gets terminated midway through the night. There are workarounds for this, but Colab seems to have recently added a captcha against them. Next, as mentioned, you can only access the GPU for a few hours. After that time is up, you are forced into a CPU-only environment and have to manually re-run your notebook. Adding checkpointing and reloading of trained weights to your code is a must.
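In PyTorch, checkpointing can be as simple as saving the model and optimizer state every epoch and loading it back when the session restarts. A minimal sketch of that pattern (the model, optimizer, and checkpoint path below are placeholders, not the article's actual training code):

```python
# Minimal checkpoint/resume pattern in PyTorch; names are placeholders
import os
import torch
import torch.nn as nn

CKPT_PATH = 'checkpoint.pt'  # on Colab this could live in your mounted Drive

model = nn.Linear(10, 1)                      # stand-in for your real model
optimizer = torch.optim.Adam(model.parameters())
start_epoch = 0

# Resume if a checkpoint exists (e.g. after the session was terminated)
if os.path.exists(CKPT_PATH):
    ckpt = torch.load(CKPT_PATH)
    model.load_state_dict(ckpt['model'])
    optimizer.load_state_dict(ckpt['optimizer'])
    start_epoch = ckpt['epoch'] + 1

for epoch in range(start_epoch, 100):
    # ... training step goes here ...
    torch.save({'model': model.state_dict(),
                'optimizer': optimizer.state_dict(),
                'epoch': epoch}, CKPT_PATH)
```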

There is a paid tier of Google Colab that gives you longer runtimes and access to better GPUs. It even gives you access to a terminal so you can run Python scripts instead of notebook-based training. It is, however, a monthly subscription and not pay-as-you-use. There are other similar services such as Kaggle, Paperspace, and AWS SageMaker Studio, but as I have not used them much, they are out of scope for this article.

Google Cloud GPUs

This option lets you configure a virtual machine (CPU, GPU, disk) on Google’s servers that you access remotely. This is not a free service, although they offer a $300 credit trial (valid for the first three months). A credit card or alternative payment method is needed to take advantage of the promo. There are multiple ways to set this up and connect, but I found that the AI Platform module of Google Cloud offers the most straightforward way.

Increasing Resource Quotas

To gain access to GPUs, you must first file a request to increase your quota. The response says the process will take days, but in my experience it only took a few hours to get approved. Start by searching for All Quotas in the top search bar.

Search for GPU: All regions. The default for me was 0. Select the checkbox on the left and click Edit Quotas to set this to 1 (or 2). Before December 2021, I was actually able to use one NVIDIA T4, but I experienced resource unavailability during the holidays, so I had to set up the quota for K80 GPUs in a specific region too. As seen in the photo, my CPU quota is 14 cores. I was actually able to run two VMs with 4 CPU cores and 1 K80 each simultaneously.

Setting Up Your Virtual Environment

Search for AI Platform. Select the Workbench tab on the left. You should see a page with a button to create a new notebook (virtual machine). Click that button and a dropdown appears. This lets you choose one of their pre-configured environments (libraries and dependencies pre-installed). There are ones for PyTorch and TensorFlow.

You can choose anything, since we will be changing some settings later anyway. For now, I will select a Python with CUDA environment and then select the option for a T4 GPU. A pop-up appears with a link at the bottom for more settings. Click that. On this menu, we can configure the specifics of the virtual machine. I will change the GPU to 1 NVIDIA K80 (per my quota, I can put two GPUs on one VM or split them between two VMs). I will also tick the checkbox to install the CUDA drivers before hitting Create. On the right side, you will see the (monthly) cost of “renting” this machine. Don’t worry, because the billing is very granular, so you will only be billed for the minutes that you use it.

Once you hit Create, you just need to wait until everything gets set up and an Open JupyterLab button appears. It opens a JupyterLab environment in a separate tab. From here, you can create a notebook or a script. You can also access a terminal to execute commands. This environment already has Git integration, so you can clone your code from online repositories like GitHub, then commit and push any changes. Similar to Colab, popular libraries come pre-installed. In my case, I want to use a specific version of PyTorch, so I opted for the plain Python environment rather than the one with PyTorch pre-installed, to avoid potential dependency conflicts.
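Before kicking off a long run, it is worth confirming from a notebook or the terminal that the GPU and CUDA drivers are actually visible. A quick sanity check, assuming you installed a CUDA-enabled build of PyTorch:

```python
# Quick sanity check that the VM's GPU is visible to PyTorch
import torch

print(torch.__version__)                   # the PyTorch version you installed
print(torch.cuda.is_available())           # should print True if the drivers are set up
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))   # e.g. a K80 or T4, depending on your VM
```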

For deep learning training, which can take hours (or even days), it is advisable to use a Python script instead of a notebook. This way you can close your browser and even turn off your own computer, and the training will still continue.
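For example, the training loop can live in a script (a hypothetical train.py, sketched below) that you launch from the JupyterLab terminal with something like `nohup python train.py > train.log &`, so it keeps running after you disconnect:

```python
# train.py -- hypothetical script entry point so training survives a closed browser
import argparse
import torch

def train(epochs: int, lr: float) -> None:
    device = 'cuda' if torch.cuda.is_available() else 'cpu'
    model = torch.nn.Linear(10, 1).to(device)            # stand-in for your real model
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    for epoch in range(epochs):
        # ... training step and periodic checkpointing go here ...
        print(f'epoch {epoch} done', flush=True)         # flush so the nohup log updates promptly

if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('--epochs', type=int, default=10)
    parser.add_argument('--lr', type=float, default=1e-3)
    args = parser.parse_args()
    train(args.epochs, args.lr)
```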

Unfortunately, it doesn’t have Google Drive integration (yet). I had to install the gdown library to download the dataset from my Drive (less than 10 seconds for more than 500 MB of files). I am still working out a smooth workflow for uploading files to my Drive from this environment. For now, I save my trained model weights and other large training products as artifacts in Weights & Biases. This workflow avoids having to download and upload large files to and from my local machine (and takes my internet speed mostly out of the equation).
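A rough sketch of that flow, assuming the Drive file is shared via a link and you have a Weights & Biases account (the file ID, project name, and file paths below are placeholders):

```python
# Download a dataset from Google Drive with gdown, then log outputs to W&B as an artifact
import gdown
import wandb

# The file ID comes from the Drive sharing link; this one is a placeholder
gdown.download(id='YOUR_DRIVE_FILE_ID', output='dataset.zip', quiet=False)

# ... training happens here, producing model_weights.pt ...

run = wandb.init(project='my-project')              # hypothetical project name
artifact = wandb.Artifact('model-weights', type='model')
artifact.add_file('model_weights.pt')
run.log_artifact(artifact)
run.finish()
```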

And that’s it! You now have a functional virtual machine where you can run training scripts uninterrupted, for free (as long as you still have credits), even while you are sleeping. Once you are done, remember to delete the notebook using the button on the toolbar at the top. You will get charged even if the machine is idling.
