Set up a data science project environment with Google Colab

Content

  1. Content
  2. Motivation
  3. Connect google colab with ssh
    1. :dart: Setup Google Colab
    2. :computer: Setup local terminal
    3. :rocket: What did you achieve after all these steps?
  4. Setup codeserver in Google ColabServer
    1. :dart: Using VScode Remote Development [Best Approach]
    2. :star: Shortcut
  5. :warning: Limitations
    1. :bulb: Solution

Motivation

What if we could connect to Google Colab from a local terminal, clone a GitHub project, add, modify, and run scripts and notebooks on Colab’s free GPU, build a model, and then publish the code to GitHub? That would be wonderful.

This blog aims to achieve this goal.

Connect google colab with ssh

This part is heavily borrowed from this blog, which I find very useful.

What do you need?

  1. An ngrok account
  2. An ssh public key

Go to ngrok, create an account, and copy only the authtoken.

Example:

./ngrok authtoken <your token>

We need only the last argument of the command above.

For the second item: run ssh-keygen, press Enter at each prompt (3–4 times) to accept the defaults, and then copy the public key.

  • Run cat ~/.ssh/id_rsa.pub in the terminal to print it.
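The key step above can also be done without the interactive prompts. This is only a minimal sketch: `KEY_PATH` is an illustrative variable (not from the original post) that defaults to the usual RSA key location, and the empty passphrase mirrors the "just press Enter" behaviour described above.

```shell
# Assumed key location; this is the default path ssh-keygen uses for RSA keys.
KEY_PATH="${KEY_PATH:-$HOME/.ssh/id_rsa}"

# Make sure the directory exists (mode 700 applies only if it is created here).
mkdir -p -m 700 "$(dirname "$KEY_PATH")"

# Generate a key only if none exists, so an existing key is never overwritten.
if [ ! -f "$KEY_PATH" ]; then
  # -N "" sets an empty passphrase; -q suppresses the progress output.
  ssh-keygen -q -t rsa -b 4096 -N "" -f "$KEY_PATH"
fi

# Print the public half; this is the text to paste into the Colab prompt.
cat "${KEY_PATH}.pub"
```

Only the `.pub` file should ever be pasted anywhere; the private key stays on your machine.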

:dart: Setup Google Colab

Now open Google Colab, connect to the desired runtime type, and execute the following commands in a notebook cell:

!wget https://gist.githubusercontent.com/msank00/815d1e722dfe50015fc2310877dd6176/raw/dffb81bb5f72ecfa8502ee3b31312020c778e534/colab-ssh-jupyter.sh
!bash colab-ssh-jupyter.sh

When prompted, enter the authtoken and then the ssh public key. [You already have both from the two steps above.]

It will print the ssh command for connecting to the Colab instance, for example:

ssh root@0.tcp.ngrok.io -p 16826 # it will be different for your case

Remember: when opening Colab, set the runtime type to GPU if you want the GPU benefit.

:computer: Setup local terminal

Open a terminal on your local PC and run the command printed in the Colab cell.

You will get a bash shell logged in to the Colab instance. For JupyterLab, first install the following in that shell:

pip3 install jupyterlab && apt install tmux

Install OH-MY-ZSH:

apt install zsh && sh -c "$(wget -O- https://raw.githubusercontent.com/ohmyzsh/ohmyzsh/master/tools/install.sh)" && echo 'ZSH_THEME="af-magic"' >> /root/.zshrc

Now start tmux and run the following command in it (choose a port of your choice; a random 5-digit number is preferable):

jupyter lab --ip 0.0.0.0 --port 56789 --no-browser --allow-root
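Picking the random 5-digit port can be scripted. The following is a minimal sketch, assuming GNU `shuf` is available on the instance; `PORT` and `JUPYTER_CMD` are illustrative names that do not appear in the original post, and the sketch does not check whether the port is already in use.

```shell
# Pick a random 5-digit port in the unprivileged range (10000-65535).
PORT=$(shuf -i 10000-65535 -n 1)

# Build the same Jupyter command as above, with the chosen port filled in.
JUPYTER_CMD="jupyter lab --ip 0.0.0.0 --port ${PORT} --no-browser --allow-root"

# Printed rather than executed, so you can review it before running it in tmux.
echo "$JUPYTER_CMD"
```

Whatever port you end up with, use the same number in the tunnelling command later on.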

Jupyter is now running at http://0.0.0.0:56789/, but we can’t reach it because it is running on the Google Colab server. We need to expose this URL as a public URL that you can access. This can be done with tunnelling and port forwarding.

To expose a local/private URL as a public URL, we can use a FREE external service, as per this reddit post. We are using http://localhost.run/

:warning: There is a security risk. Anyone who gets hold of the public URL can access your Jupyter server. Be careful!

Steps:

  1. Split the tmux window into two panes by pressing Ctrl + b (the default tmux prefix key) followed by "
  2. Run the command below in the new pane
  • :new: :ballot_box_with_check: Correct command
ssh -R 80:localhost:56789 $(openssl rand -base64 12)@ssh.localhost.run

# This generates a random string to use as the user name, giving
# <random_string>@ssh.localhost.run

:clipboard: Note: Keep the same port number that you used for running Jupyter; here it is 56789.

This prints a public URL in the terminal. Copy it and open it in a browser.
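If the public URL worries you, plain SSH local port forwarding over the existing ngrok connection is one possible alternative. This is not part of the original setup, only a hedged sketch: the host and ports are the example values from this post, and `NGROK_HOST`, `NGROK_PORT`, `JUPYTER_PORT`, and `FORWARD_CMD` are illustrative names.

```shell
# Example values taken from this post; yours will differ.
NGROK_HOST="0.tcp.ngrok.io"
NGROK_PORT=16826
JUPYTER_PORT=56789

# -N: do not open a remote shell, only forward ports.
# -L: forward local port JUPYTER_PORT to localhost:JUPYTER_PORT on the Colab side.
FORWARD_CMD="ssh -N -L ${JUPYTER_PORT}:localhost:${JUPYTER_PORT} root@${NGROK_HOST} -p ${NGROK_PORT}"

# Printed rather than executed; run it in a separate terminal on your local PC.
echo "$FORWARD_CMD"
```

While that command is running on your local PC, Jupyter should be reachable at http://localhost:56789/ in your own browser only, with no public URL involved.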

*Image: screenshot with the old syntax for tunnelling

:rocket: What did you achieve after all these steps?

  • SSH access to Google Colab
  • JupyterLab running on the Colab instance
  • The same computing resources that Google Colab uses at the backend
  • Helper scripts that you can create and import in a notebook
  • Instead of a single linear notebook, a project folder where you can modularize your code easily
  • All the benefits of git, ending with a push to GitHub
  • Direct file uploads through the Jupyter interface


Setup codeserver in Google ColabServer

The best part is yet to come. So far we have decoupled ourselves from the default Google Colab notebook, set up Jupyter, and exposed its local URL as a public URL. This gives us a lot of flexibility. It would be even better to have a server-based IDE as well; with that, we get the best of 4 worlds.

  • jupyter notebook (module testing)
  • google colab’s FREE computing power with GPU
  • IDE (project management)
  • github (versioning)

Setting up an IDE on the Google Colab server is pretty simple.

:dart: Using VScode Remote Development [Best Approach]

VS Code has a Remote SSH extension that lets you connect to a remote machine and develop on it from the editor on your local machine. It is simple and straightforward.

All you need is the ssh connection that you received while setting up Google Colab via ngrok [mentioned above]. The ssh connection looks something like this:

ssh root@0.tcp.ngrok.io -p 16826

Configure the ssh extension in your local VS Code with the above connection, and you are done.

For more details, check this VS Code documentation.
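One way to give the extension that connection is an entry in your local ssh config file, which the Remote SSH extension reads. The sketch below is only an assumption about how you might do it: the `colab` host alias and the `SSH_CONFIG` variable are illustrative names, and the host name and port are the example values from above, which change each time the ngrok tunnel is recreated.

```shell
# Assumed config location; this is the default per-user OpenSSH config file.
SSH_CONFIG="${SSH_CONFIG:-$HOME/.ssh/config}"
mkdir -p "$(dirname "$SSH_CONFIG")"
touch "$SSH_CONFIG"

# Append the entry only once, so re-running this does not duplicate it.
if ! grep -q '^Host colab$' "$SSH_CONFIG"; then
cat >> "$SSH_CONFIG" <<'EOF'

Host colab
    HostName 0.tcp.ngrok.io
    User root
    Port 16826
EOF
fi
```

With such an entry in place, `ssh colab` works from the terminal and `colab` shows up as a host in the extension's remote explorer. Remember to update `HostName` and `Port` whenever Colab prints a new ssh command.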

:star: Shortcut

Installing Jupyter and tmux can be done quickly using this gist.

Log in to Colab over ssh and then run:

$ wget https://gist.githubusercontent.com/msank00/815d1e722dfe50015fc2310877dd6176/raw/dffb81bb5f72ecfa8502ee3b31312020c778e534/colab-ssh-jupyter.sh
$ bash setup_ds_on_colab.sh
  • Run VS Code on your local machine and ssh to Colab using the remote explorer plugin. :fire: BEST APPROACH :fire:
    • This step avoids ssh.localhost.run and serveo.net, which are quite unstable over Colab’s ssh connection.

*I try to keep the gist updated with the latest code-server IDE.


:warning: Limitations

This setup comes with the usual limitations of Google Colab:

  1. The instance runs for 12 hours, and after that it restarts.
  2. After a restart, you will lose your code and uploaded data files.

:bulb: Solution

For the first limitation, there is no solution.

For the second limitation, there are some possible solutions:

  • Commit and push code to GitHub frequently.
  • Connect Google Drive to Colab. I haven’t explored that path, so I can’t say how well it works.
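For the first of those solutions, a small helper can make frequent pushes less of a chore. This is a minimal sketch, not something from the original setup: `save_work` is a hypothetical function name, and it assumes you are inside a git repository whose remote is called `origin` and that your credentials are already configured on the Colab instance.

```shell
# save_work [message]: stage everything, commit if anything changed, then push.
save_work() {
  # Default commit message is a UTC timestamp, e.g. "checkpoint 2020-02-21T10:00:00Z".
  msg="${1:-checkpoint $(date -u +%Y-%m-%dT%H:%M:%SZ)}"

  git add -A &&
    # "git diff --cached --quiet" succeeds when nothing is staged, so the commit is skipped.
    { git diff --cached --quiet || git commit -q -m "$msg"; } &&
    # Push the current branch to the same-named branch on the assumed remote "origin".
    git push -q origin HEAD
}
```

Calling `save_work "add training script"` from the project folder every so often limits how much a restart can cost you. Uploaded data files are usually too large for git, so they still need a separate plan.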

:santa: FIN



Published on February 21, 2020