This guide will be about setting up the fiddly bits when deploying a Jupyter Hub to an AWS instance. It won’t go into explicit detail about absolutely every step as the docs already do a great job of that. The purpose of this post is to discuss the things I found tricky after the install was complete.
First follow the install instructions for AWS.
These amount to deploying an EC2 instance of type Ubuntu Server 18.04 LTS (HVM), SSD Volume Type
, ensuring sufficient CPU and memory (I chose t2.large
), adding the following snippet
#!/bin/bash
curl -L https://tljh.jupyter.org/bootstrap.py \
| sudo python3 - \
--admin <admin-user-name>
and also using a security group with SSH, HTTP and HTTPS open.
Configuring an elastic IP and DNS
The next thing I did was to associate an Elastic IP address to the instance we recently setup. Go here, click the IP you’d like to associate (or “Allocate IP address” if you currently have none free) then click “Actions > Associate Elastic IP address” before choosing the appropriate instance running Jupyterhub.
Now in the config of my DNS provider I added an A record, e.g. jupyter.mycompany.com
pointing to this elastic IP address.
Note: you have to do this if you wish to use SSL.
SSL
After around 10mins, you should be able to visit http://jupyter.mycompany.com
(or just the public IP of the instance if you didn’t set up a domain) and be greeted with a login screen. Choose the <admin-user-name>
that you configured and pick a password. This will log you in as the admin.
TLJH has a nice guide on how to configure SSL. You will need to have set up a public URL for this (as per above), and then it boils down to
sudo tljh-config set https.enabled true
sudo tljh-config set https.letsencrypt.email you@example.com
sudo tljh-config add-item https.letsencrypt.domains yourhub.yourdomain.edu
Check your config with
sudo tljh-config show
and reload the proxy
sudo tljh-config reload proxy
Now you will see the secure padlock if you visit your Jupyterhub.
Jupyter lab as default
Jupyter lab is the more modern counterpart of the Jupyter notebook. If you would like to run this interface by default TLJH also has a guide.
After logging in with the admin user(or via SSH to the EC2 instance) open up a new terminal and run
sudo tljh-config set user_environment.default_app jupyterlab
sudo tljh-config reload hub
You can still access regular Jupyter at /user/<username>/tree
but by default your users will now see
/user/<username>/lab
.
OAuth with Github
Following the TLJH guide
Looking at the docs for Oauthenticator Github suggest we should be able to restrict users with something like
sudo tljh-config set auth.GitHubOAuthenticator.allowed_users <allowed-user-1>
sudo tljh-config reload hub
Unfortunately, this restriction just did nothing for me despite the change showing in /opt/tljh/config/config.yaml
(or equivalently
when running sudo tljh-config show
)
Before finally figuring out the problem, I had also tried with a custom config snippet to manually add a Jupyterhub config as described here. Any files in /opt/tljh/config/jupyterhub_config.d
that end in .py will be loaded in alphabetical order as python files to provide configuration for JupyterHub. Then see here
# /opt/tljh/config/jupyterhub_config.d/oauth_config.py file
c = get_config()
# c.GitHubOAuthenticator.allowed_organizations = Set()
# c.GitHubOAuthenticator.admin_users = Set()
c.GitHubOAuthenticator.allowed_users = {'allowed-user-1'}
c.GitHubOAuthenticator.blocked_users = {'banned-user-2'}
This did not work either (I could still login as the banned user).
What did work in the end was
sudo tljh-config set auth.GitHubOAuthenticator.whitelist <allowed-user-1>
sudo tljh-config reload
The working config file looked like
auth:
type: oauthenticator.github.GitHubOAuthenticator
GitHubOAuthenticator:
client_id: XXXX
client_secret: XXXX
oauth_callback_url: XXX
whitelist:
- allowed-user-1
admin_users:
- admin-user-2
I guess because they are still using an older version of oauthenticator
, which uses the deprecated whitelist
parameter.
You can test by first going to https://jupyter.yourcompany.com/hub/admin and deleting the user in question and also deleting
them from the Linux system with sudo deluser --remove-home jupyter-allowed-user-1
, now change the whitelist to be say some-other-user
, you will see in the admin user list that some-other-user
is present and has “never” logged in and you will see “allowed-user-1” is not. If you try to authenticate with Github and the allowed-user-1
you should see a 403 forbidden.
If now you switch back to allowed-user-1
you can login.
Switching back
If you do want to switch back authentication so that it uses the FirstUseAuthenticator
, just enter:
sudo tljh-config set auth.type firstuseauthenticator.FirstUseAuthenticator
and then clean up the config.yaml
so all that remains is
auth:
type: firstuseauthenticator.FirstUseAuthenticator
Using the FirstUseAuthenticator
means an admin gets to add users, then when they login they choose their own password,
else if they try to login without such a username in the system, they are rejected.
Add admin users
To add a user as an admin do
sudo tljh-config add-item users.admin <username>
sudo tljh-config reload
Deploy keys
You may want to be able to push and pull from some Github repo from the Jupyterhub instance. Deploy keys allow you to achieve this.
On the EC2 instance (for example after SSH’ing in), run:
ssh-keygen -t rsa -b 4096 -C "{email}"
Copy it to https://github.com/<SomeOrg>/<some-repo>/settings/keys/new
Now ensure whatever jupyter user has access to those keys in their home dir, e.g. /home/jupyter-some-user/.ssh
.
Copy the keys and ensure to chown jupyter-some-user.jupyter-some-user -R /home/jupyter-some-user/.ssh
Now sudo su jupyter-some-user
and cd ~
before finally cloning the git repo into this users directory.
If you checked write access
on the deploy key this user will also be able to commit changes to the notebooks when done.
Another option would be keep the keys in a shared directory across many users as described here and then just create a symlink to the ida_rsa
files in a given users ~/.ssh/
directory.
Conda packages
We can install conda
packages following this guide.
Basically login as admin, get a new terminal and run commands like
sudo -E conda install numpy
The -E
flag is to preserve the environment.
See the note about doing this from SSH (export PATH=/opt/tljh/user/bin:${PATH}
).
Readonly postgres user on RDS
If you want to pull data from a Postgres database on RDS then pycopg2
is a good choice. You may want to set up a dedicated user in the database than has only read-only permissions (for security and to avoid accidents, since it’s very unlikely you will want to write data from a Jupyterlab notebook to the database)
To do that run the following commands:
# The role
cursor.execute('CREATE ROLE readonly;')
cursor.execute("GRANT USAGE ON SCHEMA public TO readonly;")
cursor.execute("GRANT SELECT ON ALL TABLES IN SCHEMA public TO readonly;")
cursor.execute("GRANT SELECT ON ALL SEQUENCES IN SCHEMA public TO readonly;")
cursor.execute("ALTER DEFAULT PRIVILEGES IN SCHEMA public GRANT SELECT ON TABLES TO readonly;")
cursor.execute('GRANT CONNECT ON DATABASE your_db TO readonly;')
# The user
cursor.execute("CREATE USER readonly_user WITH PASSWORD 'XXXXXXXXXXXXXXXX';")
# The role to the user
cursor.execute("GRANT readonly TO readonly_user;")
After a successful login with this user, test no create perm with
cursor.execute("CREATE TABLE public.test1 (id serial PRIMARY KEY, num integer, data varchar);")
DB via IAM conn rather than creds:
Rather than have to distribute database credentials to each user (either individually or via global env variables or a shared folder) it would be better if we were using the role itself to connect
Things that you must do to use the IAM role for database access:
- Enable IAM/Password in the RDS instance settings.
- The IAM Role associated with instance to have RDS read-only access as a policy.
- Run
GRANT rds_iam TO <user>;
(to revertREVOKE rds_iam FROM <user>;
)
However, for me this did not work. I hit the following error
FATAL: pg_hba.conf rejects connection for host "X.X.X.X", user "readonly_user", database "your_db", SSL off
I’d be interested to hear any solutions to that.
Extensions
TLJH installs a really old node js (6.x) and this prevents the Jupyerlab Github extension from installing. You may need to upgrade your version of node to get those extensions working.
Comments