So I have a JupyterHub-esque installation, where I am spinning up containers which run either JupyterLab or Jupyter Notebook (depending on the user's selection). On launch, the container basically does a git clone, enabling me to select one of my git repos and spin up a working JupyterLab environment with everything in it within minutes.
Until recently I had a big ol’ list of pre-installed libraries in my Dockerfile, which grew as I added requirements to my various projects. However, this is all getting rather unmanageable, with the built Docker image nearing 4GB (Yikes!).
I have decided to go for a configuration-as-code approach – each git repo will have its own environment.yaml (this is the conda env) at root, which will be installed and activated on-the-fly when the container spins up. It will be the responsibility of each repo to "know" its requirements. This will trade off spin up time for container size, plus make things much more reproducible outside of my specifically-built image!
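To make the startup step concrete, here is a minimal sketch of what the container could run right after the git clone. It assumes `conda` is on the PATH inside the container and that the environment file is applied to an already existing environment with `conda env update`; the function names, the `env_name` parameter and the `dry_run` flag are hypothetical, not part of my actual setup.

```python
import subprocess
from pathlib import Path


def conda_env_command(repo_dir, env_name="base"):
    """Build the conda command for the repo's environment.yaml, or return None.

    Assumption: the file sits at the repo root and is applied to an
    existing environment called `env_name` (hypothetical parameter).
    """
    env_file = Path(repo_dir) / "environment.yaml"
    if not env_file.is_file():
        return None  # repo declares no environment, nothing to install
    return ["conda", "env", "update", "--name", env_name, "--file", str(env_file)]


def install_repo_environment(repo_dir, env_name="base", dry_run=True):
    """Apply the repo's environment file; with dry_run=True only return the command."""
    cmd = conda_env_command(repo_dir, env_name)
    if cmd is not None and not dry_run:
        # check=True makes container startup fail loudly if conda fails
        subprocess.run(cmd, check=True)
    return cmd
```

Updating the environment the Jupyter server already runs in sidesteps a separate activation step, at the cost of the longer spin-up time mentioned above.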
However, I have hit the following issue:
- Most of my notebooks require one plugin or another (gmaps / widgets / etc)
- Plugins for JupyterLab are very sensitive to the specific version of the lab (prime candidate for config as code)
- I can install the plugin fine via environment.yaml, but I cannot activate it by default!
- This means that each time I start up the jupyter environment, there is ~5 mins of messing with more configuration!! Worse, the whole idea of this project is so that I can send a link to someone not "in the know" – and they can have a configured environment with my notebooks in it, ready to play with in minutes.
Is there a simple configuration as code solution which does both python/conda packages AND jupyter extensions?
I know that the two are orthogonal in some respects (you can swap conda env, but your extensions are a property of your kernel session)
I have already thought of the following two suboptimal solutions:
- Environment.yaml AND a startup.sh
I am not such a fan of this. It solves the problem, sure, but it's not a clean approach. Running any old shell script out of a public git repo is always a bad idea (security). Plus, the subset of people who
- Good ol’ IPython magic

    import os
    !jupyter nbextension enable xyz...
    os._exit(0)

Again, not the biggest fan: this kills the "run all cells" operation in the notebook, plus it feels really clunky.
People's thoughts and suggestions would be much appreciated!
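For reference, the same enable step written as plain Python rather than `!` magic might look like the sketch below. It assumes the classic `jupyter nbextension enable` CLI is available in the container; the helper names and the `dry_run` flag are hypothetical, and the extension name is whatever the notebook needs. It does not make the approach any less clunky, it only shows what each notebook would have to carry.

```python
import subprocess


def nbextension_enable_command(name, python_package=False, sys_prefix=True):
    """Build a `jupyter nbextension enable` command for one extension.

    python_package=True adds --py, for extensions shipped as Python packages.
    sys_prefix=True adds --sys-prefix, so the config lands in the active env.
    """
    cmd = ["jupyter", "nbextension", "enable"]
    if python_package:
        cmd.append("--py")
    cmd.append(name)
    if sys_prefix:
        cmd.append("--sys-prefix")
    return cmd


def enable_nbextension(name, python_package=False, dry_run=True):
    """Run the enable command; with dry_run=True only return it."""
    cmd = nbextension_enable_command(name, python_package=python_package)
    if not dry_run:
        subprocess.run(cmd, check=True)
    return cmd
```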