A fantastic benefit of Introduction to Data Science with Python and Introduction to Data Science with R is that both courses introduce you to Jupyter Notebooks, part of Project Jupyter. But what is a Jupyter Notebook? And why is it such a useful tool? We asked course instructor Geoff Bacon to share his thoughts.
Jupyter notebooks are a relatively new way of using Python. I personally found them quite confusing when I first saw them and didn't immediately understand why on earth people would bother with them, but I soon changed my mind.
The Jupyter Notebook is an open-source web application that allows you to create and share documents that contain live code, equations, visualizations and narrative text. You’ll be able to use it for tasks like data cleaning and transformation, statistical modeling and data visualization, and much more.
In Introduction to Data Science with R and Introduction to Data Science with Python you’ll use Notebooks to access the course activities, the course instructions and large data sets. Jupyter Notebooks enable browser-based computation, so you won’t need to install software, transfer files, or update libraries.
Because the web has become so important in the last decade, web browsers have become very good at displaying graphics and fonts. Rather than re-inventing the wheel, the Jupyter notebook runs in the browser so it can use the browser's ability to display graphics. So even though we view and edit Jupyter notebooks in the browser, they often have nothing to do with the internet. Jupyter notebooks provide a more user-friendly interface than running Python directly from the terminal, and they let you annotate your code much more richly than # comments do.
For our purposes, "Jupyter notebook" refers to two related things: a file format (similar to .doc, .txt, .csv, etc.) and an application that reads that file format. Notebooks have the extension ".ipynb", which is a historical artefact from the days when the notebooks were just for Python and called "IPython notebooks". They are just text files that follow a particular format, in the same way that HTML files are just text files that follow a special format. And just as web browsers like Chrome, Safari and Internet Explorer are applications that render (visualize) HTML files nicely, the Jupyter notebook application renders the ".ipynb" files nicely.
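To make the "just text files" point concrete, here is a minimal sketch in Python. The text inside an ".ipynb" file is JSON, so Python's built-in json module can produce and read it. The two cells below are made up for illustration, and the field names follow version 4 of the notebook format as I understand it, so treat this as a rough picture rather than a full specification.

```python
import json

# A hypothetical two-cell notebook, written out as a plain Python dictionary.
# The cell contents are invented for illustration.
notebook = {
    "nbformat": 4,
    "nbformat_minor": 4,
    "metadata": {},
    "cells": [
        {
            # A cell of formatted text (Markdown).
            "cell_type": "markdown",
            "metadata": {},
            "source": ["# My analysis\n", "Some formatted text."],
        },
        {
            # A cell of code that has not been run yet, so it has no outputs.
            "cell_type": "code",
            "execution_count": None,
            "metadata": {},
            "outputs": [],
            "source": ["print(1 + 1)"],
        },
    ],
}

# Turn the dictionary into text. This string is the kind of content
# that would be stored in an ".ipynb" file.
notebook_text = json.dumps(notebook, indent=1)

# Reading it back is ordinary JSON parsing.
loaded = json.loads(notebook_text)
cell_types = [cell["cell_type"] for cell in loaded["cells"]]
print(cell_types)  # prints ['markdown', 'code']
```

The Jupyter notebook application does the same kind of reading, then draws each cell nicely in the browser instead of printing a list.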
My favourite feature in Jupyter notebooks is being able to intertwine code and formatted text (i.e. text with headers, lists, etc). It really is just like # comments in a script, but much more user-friendly. I also like the fact that the code that generated some output (like printing a number) is directly above the output itself, and it stays there until I delete it. As you’ll see in the course, if your code produces a plot, it appears directly below the cell and stays there. These two features make Jupyter notebooks ideal for exploratory data analysis, for showcasing a finished analysis (to your advisor, or published alongside a journal article), and for teaching.
If you’d like to see Jupyter notebooks in action, watch this extract from a webinar in which I gave a Jupyter demo: