Jupyter Notebook is the go-to platform for data analysis and for data-science-heavy work like preprocessing, EDA, and visualization. But with all the cool stuff happening in Jupyter notebooks, it may still be the case that you aren’t leveraging the power of your machine. Even with a high-end laptop, be it a MacBook Pro or a Windows/Linux machine with a powerful GPU, Jupyter still isn’t using its full potential.

Yep, that’s true. I experienced this recently while working on a project where my dataset and all my files added up to about 37 GB. Whenever I tried to create a Spark dataframe and run data processing on it, my notebook kept crashing, even though Activity Monitor showed that RAM usage wasn’t the problem. In fact, less than a quarter of my RAM was being used.

This is when I realized the problem could be with Jupyter Notebook itself. As it turns out, Jupyter was capping the amount of RAM it would use. I felt this was really important to write up, since I had never come across the issue before while dealing with smaller datasets.

“Even after owning a high-end machine, I wasn’t leveraging its full power”

Insufficient Memory:

Here’s how I resolved this:

I am doing this on a Mac, but the process is quite similar on a Windows or Linux machine.

  1. Open your terminal, change directory into .jupyter, and run the following commands:

cd .jupyter
jupyter notebook --generate-config

This will generate a jupyter_notebook_config.py file inside the .jupyter folder.
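If you’re not sure where the .jupyter folder lives on your machine, this command should print Jupyter’s config directory, so you can double-check that the file was created there:

jupyter --config-dir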

  2. Open the jupyter_notebook_config.py file and edit the following property:

NotebookApp.max_buffer_size = your desired value

You’ll need to uncomment this property and set a value based on the amount of RAM on your machine. In the config file, the property is prefixed with c (Jupyter’s config object):

c.NotebookApp.max_buffer_size = your desired value

Remember that the value is in bytes. For example, in my case I wanted the cap for Jupyter to be roughly 12 GB, so I set it to 12000000000 bytes.
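So the edited line in my config file ends up looking roughly like this:

c.NotebookApp.max_buffer_size = 12000000000  # ~12 GB; adjust to your machine's RAM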

  3. Another, simpler way to go about this is to launch Jupyter with the limit set directly from your terminal:

jupyter notebook --NotebookApp.max_buffer_size=your_value
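For example, with the same roughly 12 GB cap as above, that would be:

jupyter notebook --NotebookApp.max_buffer_size=12000000000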

Voila! You’re all set. Your processing tasks can now take advantage of the high-end configuration of your machine.

IOPub data rate:

Another common problem that you could run into while using the Jupyter environment is the IOPub data rate limit. The error looks like this:

IOPub data rate exceeded. The notebook server will temporarily stop sending output to the client in order to avoid crashing it. To change this limit, set the config variable --NotebookApp.iopub_data_rate_limit.

Current values:
NotebookApp.iopub_data_rate_limit=1000000.0 (bytes/sec)
NotebookApp.rate_limit_window=3.0 (secs)

This usually occurs when you try to visualize enormous amounts of data using Python visualization libraries like Plotly, Matplotlib, Seaborn, and so on. Jupyter’s default configuration isn’t set up to handle that much output, so it throws this error message. I used to get it while visualizing huge datasets, but you might hit it whenever your notebook sends enormous amounts of data to the client or there’s a huge data exchange.
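To give a rough idea, any cell that streams a lot of output back to the browser can trip the default limit of about 1 MB/sec; a toy example (purely illustrative, not from my project) would be something like:

# toy example: rapidly printing lots of output can exceed the
# default iopub_data_rate_limit of ~1 MB/sec
for i in range(100000):
    print("row %d: %s" % (i, "x" * 100))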

Here’s an easy fix to solve this issue:

jupyter notebook --NotebookApp.iopub_data_rate_limit=your_value

Just keep in mind that the value is in bytes. For example, in my case I wanted Jupyter to handle data up to roughly 10 GB, so I put 10000000000 bytes:

jupyter notebook --NotebookApp.iopub_data_rate_limit=10000000000
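If you’d rather not pass this flag on every launch, the same property should also be settable in jupyter_notebook_config.py, just like the buffer size earlier:

c.NotebookApp.iopub_data_rate_limit = 10000000000  # bytes/sec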

You’re all set! Happy Coding! :)