Python

Crash course in Python and Jupyter Notebooks

Matthew DeHaven

March 31, 2025


Python Overview

  • Not that different from R
    • syntax is similar
    • but some translation required
  • Environments are even more important
  • More object-focused
    • methods
  • the . operator has multiple meanings
    • packages and methods
  • Spacing sensitive

Python Installs

Installing Python

Python comes with many operating systems, but to get the latest version…

You want to always have the latest version, but this leads to many installations of Python on the same computer.

Each of these installations is referred to as an “interpreter”.

Choosing a Python Interpreter

VS Code allows you to choose a Python Interpreter when editing a “.py” file.

Running Python in VS Code

You can run a whole file of Python code by clicking the play arrow.

Or you can run a single line or selection by…

  • hitting Shift+Enter
  • right clicking the line, then selecting “Run Python”

Both of these will execute the Python code in a terminal.

Hello World Example

x = 'Hello Class!'
print(x)

a = 0.1
b = 0.1
c = 0.1

print(a + b + c == 0.3)
Hello Class!
False
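The False above comes from floating-point rounding rather than anything specific to Python: 0.1 has no exact binary representation, so the sum lands slightly off 0.3. A minimal sketch of the usual workaround, comparing within a tolerance using the standard library's math.isclose (the variable name total is only for illustration):

```python
import math

a = 0.1
b = 0.1
c = 0.1
total = a + b + c

# The stored sum is not exactly 0.3, so exact equality fails.
print(total == 0.3)

# Comparing within a tolerance is the usual fix.
print(math.isclose(total, 0.3))
```

The same behavior shows up in R and Julia, so this is worth remembering in any language.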

Python Environments

Environments

There are two main options for setting up Python environments:

  1. venv

Included with Python (version 3.3 and later). Similar to renv for R, it creates a folder that holds the environment’s packages along with a link to (or copy of) the Python interpreter.

  2. conda

Part of the Anaconda/miniconda world.

Can be used both as a package manager and for environments.

Aside on Conda

Conda manages Python installations, packages, and environments from outside Python.

Venv manages Python environments from within Python.

Anaconda is a distribution of (1) a Python installation, (2) Conda environments, (3) a bunch of default packages.

Miniconda is a distribution of (1) a Python installation and (2) Conda environments.

Use Python Environments

You should be using environments for any language, but especially for Python.

  • You will have multiple versions of Python installed at once
  • Python package managers historically handled dependencies poorly
    • Updating one package would break another package
  • The latest versions have started giving warning messages if you try to install packages system-wide

Creating an Environment

For venv the command to create a virtual environment is…

terminal
python -m venv /path/to/new/virtual/environment

And for conda

terminal
conda create --name <my-env>

Creating an Environment in VS Code

Luckily VS Code’s Python Extension makes handling these environments easy.

  • Open the Command Palette (Shift+Cmd+P on macOS, Ctrl+Shift+P on Windows/Linux)

  • Search for “Python: Create Environment…”

  • Select either “venv” or “conda”

  • Select the Python interpreter (version) to use

Now whenever you launch a terminal for this workspace, it will use the environment you created.
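To confirm which interpreter a terminal or script is actually using, a small check with the standard sys module can help. This is a sketch that applies to venv environments; conda environments are organized differently, so the prefix comparison below is not a reliable test for them.

```python
import sys

# Path of the interpreter that is running this code.
print(sys.executable)

# Inside a venv, sys.prefix points at the environment folder,
# while sys.base_prefix points at the Python it was created from.
in_venv = sys.prefix != sys.base_prefix
print("Running inside a venv:", in_venv)
```

If the printed path does not sit inside your project's environment folder, the wrong interpreter is probably selected.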

Python Packages

Python Packages

Packages are how you can import functions.

You will sometimes see “Modules”. A package could have one or many modules within it.

Packages can be installed using

  • pip, built in to Python
  • conda

Pip Install Packages

pip stands for “pip installs packages”.

It installs Python packages hosted on the Python Package Index (PyPI).

terminal
pip install numpy

pip install numpy is executed in a terminal, not in Python code itself, unlike R or Julia.
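Because installation happens in the terminal, it can be useful to check from inside Python whether a package is available in the current environment. A minimal sketch using the standard importlib module; is_installed is a hypothetical helper name, not part of pip or Python.

```python
import importlib.util


def is_installed(package_name):
    """Return True if the package can be imported in this environment."""
    return importlib.util.find_spec(package_name) is not None


print(is_installed("json"))   # part of the standard library
print(is_installed("numpy"))  # True only once numpy has been installed
```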

Conda Install Packages

Installs packages hosted on the Anaconda repository.

Can also install other software, like R.

terminal
conda install numpy

You can also use pip install to install packages in a conda environment, but this can cause conflicts, so you should use conda install by default in this situation.

Importing a Package

Once a package is installed (to your environment), you can…

  1. Import the package and shorten the name
import numpy as np

numpy is a package for numerical computation (ex. better arrays and linear algebra).

If you want to use the function arange() from numpy, you would write

np.arange(10)
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

Importing a Package, Other options

  2. Import the package without shortening the name
import numpy
numpy.arange(10)
  3. Directly import one function of the package
from numpy import arange
arange(10)
  4. Directly import all functions of the package
from numpy import *
arange(10)

The first option is what is recommended and used most often.
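One reason importing everything with * is discouraged is that it can silently replace names you already rely on. The sketch below uses the standard math module rather than numpy so that it runs without any installs, and it keeps the module prefixes to show that two different functions share the name pow.

```python
import builtins
import math

# Two different functions share the name "pow".
builtin_result = builtins.pow(2, 3)  # built-in power, keeps integers as integers
math_result = math.pow(2, 3)         # math version, always returns a float

print(builtin_result, type(builtin_result).__name__)
print(math_result, type(math_result).__name__)

# After "from math import *", the bare name pow would refer to math.pow,
# replacing the built-in without any warning. Keeping the "math." prefix
# (or a short alias such as "np.") avoids this.
```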

Python Basics

Variable Assignment

Python uses a single = for assignment

x = 42

Just like R, functions can be assigned to new variables:

a = np.arange
a(10)
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

Basic Math

Math is not that different:

3 + 3
6
3 - 3
0
3 * 3
9
3 / 3
1.0
3 ** 3 # Power
27
3 % 3  # Modulo
0
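Note that 3 / 3 returned 1.0 rather than 1: the / operator always produces a float. A short sketch of the related integer operators, which play the role of %/% and %% in R:

```python
print(7 / 2)    # true division, always a float: 3.5
print(7 // 2)   # floor division: 3
print(7 % 2)    # remainder: 1
print(-7 // 2)  # floors toward negative infinity: -4

# divmod returns the quotient and remainder together.
quotient, remainder = divmod(7, 2)
print(quotient, remainder)
```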

Logic

Logic operators are written out as words (and, or, not) instead of symbols.

1 > 2
False
1 < 2
True
1 > 2 and 1 < 2
False
1 > 2 or 1 < 2
True
not 1 > 2
True

Python Data Types

  • Booleans
  • Numbers: Integer, Floating point
  • Strings
  • Collections:
    • List, Tuple, Dictionary, Set

Lists

Lists are constructed as comma-separated elements in square brackets

my_list = [1, 5, 2, 8]
print(my_list)
[1, 5, 2, 8]

Lists don’t enforce a single type.

my_list = [1, 'hi', 2, False]
print(my_list)
[1, 'hi', 2, False]

Tuples

Tuples are constructed as comma-separated elements in parentheses.

my_tuple = (1, 'hi', 2, False)
print(my_tuple)
(1, 'hi', 2, False)

Tuples are immutable; lists are mutable.

  • i.e. you cannot change values of tuples once created, or add additional elements
my_list[2] = 4
print(my_list)
[1, 'hi', 4, False]
my_tuple[2] = 4
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[21], line 1
----> 1 my_tuple[2] = 4

TypeError: 'tuple' object does not support item assignment
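Since a tuple cannot be changed in place, the usual approach is to build a new tuple from pieces of the old one. A minimal sketch (new_tuple is just an illustrative name):

```python
my_tuple = (1, 'hi', 2, False)

# Build a new tuple with the third element replaced.
new_tuple = my_tuple[:2] + (4,) + my_tuple[3:]
print(new_tuple)

# The original tuple is unchanged.
print(my_tuple)
```

The trailing comma in (4,) is what makes it a one-element tuple rather than just the number 4 in parentheses.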

Sets

Sets are constructed as elements in curly braces.

my_set = {"RI", "MA", "VT"}
my_set
{'MA', 'RI', 'VT'}

Sets contain only unique values.

my_set  = {"RI", "RI", "MA", "VT", "MA"}
my_set
{'MA', 'RI', 'VT'}
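Sets also support membership tests and the usual set algebra. A brief sketch; the second set, visited, is made up for illustration.

```python
new_england = {"RI", "MA", "VT"}
visited = {"MA", "NY"}  # hypothetical second set

print("RI" in new_england)       # membership test

both = new_england & visited     # intersection
either = new_england | visited   # union
only_ne = new_england - visited  # difference

# Sets are unordered, so sort before printing for a stable display.
print(sorted(both))
print(sorted(either))
print(sorted(only_ne))
```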

Dictionaries

Dictionaries are constructed as key:value pairs in curly braces.

my_dict = {"RI":"Rhode Island", "MA":"Massachusetts", "VT":"Vermont"}
print(my_dict)
{'RI': 'Rhode Island', 'MA': 'Massachusetts', 'VT': 'Vermont'}

Dictionaries can be subsetted by their keys.

my_dict["RI"]
'Rhode Island'
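A few other common dictionary operations, sketched with the same example. Looking up a missing key with square brackets raises a KeyError, so get() with a default is often safer.

```python
my_dict = {"RI": "Rhode Island", "MA": "Massachusetts", "VT": "Vermont"}

# Add (or overwrite) an entry by assigning to a key.
my_dict["NH"] = "New Hampshire"

# get() returns a default instead of raising KeyError for a missing key.
missing = my_dict.get("CT", "unknown")
print(missing)

# Loop over key/value pairs.
for abbreviation, name in my_dict.items():
    print(abbreviation, "->", name)
```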

Python Indexing

Python starts indexing from 0.

x = ['a', 'b', 'c']
x[0]
'a'

For some people, this is the mark of a true programming language.

One of the most common errors when switching between R and Python.
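The zero-based convention carries over to slices, where the end position is excluded, and Python also allows negative indices that count from the end. A short sketch:

```python
x = ['a', 'b', 'c', 'd']

second = x[1]       # 'b' (in R, x[1] would be the first element)
last = x[-1]        # negative indices count from the end: 'd'
first_two = x[0:2]  # positions 0 and 1; the end (2) is excluded
rest = x[1:]        # everything from position 1 onward

print(second, last, first_two, rest)
```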

For Loops

Python is space sensitive.

For example, a for loop requires the looped lines to be indented by at least one space (by convention, 4 spaces per level).

for i in np.arange(3):
    print(i)
0
1
2

Leaving out the space will throw an error.

for i in np.arange(3):
print(i)
  Cell In[28], line 2
    print(i)
    ^
IndentationError: expected an indented block after 'for' statement on line 1
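Indentation also marks nesting: each additional level (an if inside a for, say) is indented one step further, and returning to the left margin ends the block. A small sketch using a plain list instead of numpy; the labels are made up for illustration.

```python
states = ["RI", "MA", "VT"]
labels = []

for i, state in enumerate(states):
    # One level of indentation: inside the loop.
    if i == 0:
        # Two levels: inside the if, inside the loop.
        labels.append(f"{i}: {state} (first)")
    else:
        labels.append(f"{i}: {state}")

# Back at the left margin: this runs once, after the loop finishes.
print(labels)
```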

Objects

Python is an object-oriented programming language.

Example: a list is an object.

Objects have

  1. Properties
  2. Methods

Classes define objects; objects are the actual instance of the class.
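As a rough illustration of the class/object distinction, here is a minimal sketch of a user-defined class. The State class and its label() method are hypothetical; the properties are what Python usually calls attributes.

```python
class State:
    """A class: the blueprint for State objects."""

    def __init__(self, abbreviation, name):
        # Properties (attributes) stored on each object.
        self.abbreviation = abbreviation
        self.name = name

    def label(self):
        # A method: a function attached to the object.
        return f"{self.name} ({self.abbreviation})"


# An object: one actual instance of the class.
ri = State("RI", "Rhode Island")
print(ri.name)     # access a property
print(ri.label())  # call a method
```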

Methods

Methods are a key feature of Python.

Methods are functions attached to an object that operate on the object.

For instance, we said a list is an object. Lists have the method: sort().

a_list = [2, 1, 9, 4, 6]
a_list.sort()
print(a_list)
[1, 2, 4, 6, 9]

Methods are Functions

Remember: methods are a type of function.

You can think of methods as functions that always take as an input the object they are defined for.

They may take other inputs as well.

Each type of object has its own methods. The sort() method is not defined for tuples.

a_tuple = (4, 1, 3, 2)
a_tuple.sort()
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Cell In[30], line 2
      1 a_tuple = (4, 1, 3, 2)
----> 2 a_tuple.sort()

AttributeError: 'tuple' object has no attribute 'sort'
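To sort a tuple anyway, one option is the built-in function sorted(), which accepts any iterable and returns a new list. The sketch below also shows a common surprise: the list method sort() works in place and returns None.

```python
a_tuple = (4, 1, 3, 2)

# sorted() is a function, not a method, and returns a new list.
sorted_values = sorted(a_tuple)
print(sorted_values)

# Convert back if a tuple is needed.
sorted_tuple = tuple(sorted_values)
print(sorted_tuple)

# list.sort() changes the list in place and returns None.
a_list = [2, 1, 9, 4, 6]
result = a_list.sort()
print(result, a_list)
```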

Append Method

If you wanted to append an element to a list…

a = [2, 4, 1]
a.append(3)
a
[2, 4, 1, 3]

If you appended a whole list…

a = [2, 4, 1]
a.append([3, 5, 6])
a
[2, 4, 1, [3, 5, 6]]
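If the goal was to add each element of the second list rather than nest it, extend() (or the + operator) is the tool to reach for. A short sketch:

```python
a = [2, 4, 1]
a.extend([3, 5, 6])  # adds each element individually, in place
print(a)

# The + operator builds a new list and leaves the originals untouched.
b = [2, 4, 1]
combined = b + [3, 5, 6]
print(combined, b)
```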

All List Methods

  • append()
  • clear()
  • copy()
  • count()
  • extend()
  • index()
  • insert()
  • pop()
  • remove()
  • reverse()
  • sort()

Not Everything is a Method

Some functions that you would expect to be methods are not.

For example, len() returns the length of an object, but calling it as a method fails.

x = [1, 2, 3]
x.len()
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Cell In[33], line 2
      1 x = [1, 2, 3]
----> 2 x.len()

AttributeError: 'list' object has no attribute 'len'

len() is a built-in function rather than a list method, even though you might expect otherwise.

x = [1, 2, 3]
len(x)
3
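Under the hood, len() asks the object for its length through a special method named __len__, which is why it works on lists, tuples, strings, and more. A minimal sketch; the Playlist class is hypothetical and only there to show that any class defining __len__ works with len().

```python
class Playlist:
    """Hypothetical container used only to illustrate __len__."""

    def __init__(self, songs):
        self.songs = list(songs)

    def __len__(self):
        # len(playlist) ends up calling this method.
        return len(self.songs)


x = [1, 2, 3]
print(len(x), x.__len__())  # the function and the special method agree

playlist = Playlist(["song a", "song b"])
print(len(playlist))
```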

Python Functions

Python functions are defined with the keyword def and indentation:

def my_function():
    print("Hello!")

my_function()
Hello!

You can define arguments and return values.

def my_function(a, b):
    c = a ** 2 + b ** 2
    return c

my_function(3, 4)
25
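Arguments can also be given default values and passed by name, much like in R. The sketch below extends the example above; the power argument is an addition for illustration only.

```python
def my_function(a, b=0, power=2):
    """Return a**power + b**power; b and power have default values."""
    return a ** power + b ** power


print(my_function(3, 4))           # positional arguments: 25
print(my_function(3))              # b defaults to 0: 9
print(my_function(3, 4, power=3))  # keyword argument: 27 + 64 = 91
print(my_function(b=4, a=3))       # keywords can be given in any order: 25
```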

Pandas

Working with Data

Pandas is a package that implements DataFrames in Python.

First, you need to install pandas, then import it.

terminal
pip install pandas
import pandas as pd

Pandas DataFrame Example

A DataFrame can be constructed from a dictionary, where each key is a column name and each value is a list holding that column’s data.

df = pd.DataFrame(
    {
        "state": ["RI", "MA", "VT"],
        "size": ["tiny", "small", "small"],
        "snow": [5, 10, 20],
        "temp": [40, 35, 30]
    }
)
df
  state   size  snow  temp
0    RI   tiny     5    40
1    MA  small    10    35
2    VT  small    20    30

Series

Each column of a DataFrame is a Series.

df["state"]
0    RI
1    MA
2    VT
Name: state, dtype: object

Series have their own methods.

df["snow"].max()
np.int64(20)

DataFrame Methods

DataFrames have many methods of their own.

describe() will summarize all numerical columns.

df.describe()
            snow  temp
count   3.000000   3.0
mean   11.666667  35.0
std     7.637626   5.0
min     5.000000  30.0
25%     7.500000  32.5
50%    10.000000  35.0
75%    15.000000  37.5
max    20.000000  40.0

Reading and Writing CSVs

A crucial step is reading and writing data.

Writing it out is a method:

df.to_csv("my-csv.csv")

Reading it in is a function:

df2 = pd.read_csv("my-csv.csv")
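One detail worth knowing: by default to_csv() also writes the row index, which comes back as an extra unnamed column when the file is read. A sketch of the round trip, assuming pandas is installed; it writes to a temporary folder so nothing is left behind.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({"state": ["RI", "MA", "VT"], "snow": [5, 10, 20]})

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "my-csv.csv")

    # Default: the row index is written as the first column.
    df.to_csv(path)
    with_index = pd.read_csv(path)

    # index=False leaves the row index out of the file.
    df.to_csv(path, index=False)
    without_index = pd.read_csv(path)

print(list(with_index.columns))
print(list(without_index.columns))
```

Passing index=False is a common choice when the index carries no information of its own.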

Data Science with Pandas

All of the data science operations we saw with dplyr and data.table are possible with Python and pandas.

  • group by
  • summarize
  • adding new columns
  • etc.
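As a rough sketch of how these operations look in pandas, using the DataFrame from the earlier example (the new column name snowy and the summary column names are made up for illustration):

```python
import pandas as pd

df = pd.DataFrame(
    {
        "state": ["RI", "MA", "VT"],
        "size": ["tiny", "small", "small"],
        "snow": [5, 10, 20],
        "temp": [40, 35, 30],
    }
)

# Adding a new column (similar to dplyr's mutate).
df["snowy"] = df["snow"] > 8

# Group by + summarize (similar to group_by() followed by summarize()).
summary = df.groupby("size", as_index=False).agg(
    mean_snow=("snow", "mean"),
    max_temp=("temp", "max"),
)
print(summary)
```

Each keyword in agg() names an output column and pairs an input column with the function applied to it.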

Other Python Packages

  • numpy: vectors, arrays, numerical analysis
  • pandas: DataFrames
  • matplotlib: plotting
  • seaborn: plotting with pandas DataFrames
  • plotly: interactive plots
  • Scikit-Learn: machine learning
  • TensorFlow: neural nets
  • PyTorch: neural nets, with GPU support
  • BeautifulSoup: web scraping

Live Coding Example

  • Create a Python file
  • Run code in a Python terminal
  • Create a Python Environment
  • Install the numpy package

Jupyter

Jupyter Background

Developed in 2014.

Its name is a reference to the three core programming languages it supports:

  • Julia
  • Python
  • R

Today it supports many more languages (SQL, Ruby, …).

Jupyter Products

Jupyter Notebooks

  • Interactive notebooks with code + markdown for many languages
  • Browser based editor

JupyterLab

  • Newer improved browser based editor for notebooks

JupyterHub

  • Cloud-based Jupyter notebooks

Setup and Installation

Jupyter Notebooks

Requires:

  • Python installation
  • Python package: pip install jupyter

To use in VS Code, you need the “Jupyter” extension.

Using an Environment

If you are using a “.venv” environment, you’ll have to install jupyter in that environment.

In VS Code,

  • Command Palette > “Python: Create Environment…”
  • Select “venv”
  • then pip install jupyter in the terminal

Creating a Notebook

Jupyter notebooks have a unique extension: “.ipynb”

  • short for “IPython Notebook”, the project Jupyter grew out of

Simply create an empty file with that extension and VS Code will recognize it as a Jupyter notebook.

Kernels

Kernels

Jupyter Notebooks execute code by sending it to one of many possible “kernels”.

You can choose as your kernel:

  • Python
  • Julia
  • R
  • other languages you set up.

Using a language as a kernel requires some setup for each language.

Python Kernel

Python can be used as a Kernel once the jupyter package is installed.

I recommend using your python environment “.venv” as your kernel.

  • should be where you installed jupyter
  • keeps your packages self-contained

Julia Kernel

To use Julia as a kernel, you first need to install

  • IJulia package

Here it is probably easiest to just install IJulia system-wide.

R Kernel

To use R as a kernel, you first need to install

  • IRkernel package

Here it is probably easiest to just install IRkernel system-wide.

You should also run the following in R to finish the setup: IRkernel::installspec()

Choosing a Kernel

Whenever you open a Jupyter Notebook you will be able to choose the kernel you want to use.

  • Your choice will be saved
  • You can always change kernels later
    • though that would probably break your code

Notebook Cells

Notebook Cells

Jupyter notebooks have two types of cells:

  • Markdown Cells
  • Code Cells

Markdown Cells

Markdown cells allow you to write and render markdown.

You can actually do this without any kernel attached.

All of the usual markdown formatting is allowed (headers, links, bullets, etc.)

Code Cells

Code cells are where you write code.

Each code cell can be executed individually.

Output, errors, and warnings are displayed after the individual code cell.

The .ipynb files

The “.ipynb” notebook files are actually just JSON files.

JSON is a common file format.

  • stores data as arrays and key:value pairs

You can always open up a Jupyter notebook with a basic text editor.

  • You will be able to see each “cell”
  • But it will be messy

Output Included

A key feature of Jupyter notebooks is

  • the output of code cells is included in the JSON file

This means

  • you can send your file to someone else, they can open it, and see your results, without having to run the notebook
  • the notebook files can get very large
  • git diffs are a big mess
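To make the JSON structure concrete, the sketch below builds a tiny notebook by hand, lists its cells, and then clears the stored output. It is a simplified illustration of the version 4 notebook layout, not a complete file: real notebooks carry more metadata, and the content here is invented.

```python
import json

# A hand-written, simplified notebook with one markdown and one code cell.
notebook = {
    "nbformat": 4,
    "nbformat_minor": 4,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# A Header"]},
        {
            "cell_type": "code",
            "execution_count": 1,
            "metadata": {},
            "source": ["print(2 + 2)"],
            # The result of running the cell is stored next to the code.
            "outputs": [
                {"output_type": "stream", "name": "stdout", "text": ["4\n"]}
            ],
        },
    ],
}

raw_text = json.dumps(notebook, indent=1)  # what would be saved in the .ipynb
loaded = json.loads(raw_text)              # what an editor reads back

for cell in loaded["cells"]:
    print(cell["cell_type"], cell["source"])

# Clearing outputs keeps the file smaller and the git diffs cleaner.
for cell in loaded["cells"]:
    if cell["cell_type"] == "code":
        cell["outputs"] = []
        cell["execution_count"] = None

stripped_text = json.dumps(loaded, indent=1)
print(len(raw_text), len(stripped_text))
```

Clearing outputs before committing is one common way to tame notebook diffs, at the cost of losing the saved results.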

Editing and Running Jupyter Notebooks

Two options for editing Jupyter Notebooks:

  1. in the browser

  2. in VS Code

Editing Jupyter Notebooks in the browser

This is the default built in to the jupyter package.

In your workspace run:

terminal
jupyter notebook

This will launch an HTTP server in the terminal.

  • This terminal must stay open while you are using Jupyter!

And it will open a window in your browser with the editor.

Editing Jupyter Notebooks in VS Code

Once you have installed the Jupyter VS Code extension

  • you can edit and run Jupyter Notebooks within VS Code

Behind the scenes, VS Code will launch the kernel in the background, send the code to it, and bring back the results.

Quarto vs. Jupyter Notebooks

Both have code and markdown “chunks” (cells).

Jupyter Notebooks

  • focused on interactivity
  • outputs a JSON file with markdown + code + output

Quarto

  • focused on output document types: html, pdf, slides, etc.
  • “.qmd” files are markdown + code only

Compile Jupyter Notebooks in Quarto

Quarto can compile a Jupyter notebook into any of its output formats.

terminal
quarto render example.ipynb --to html
quarto render example.ipynb --to docx

This allows you to quickly turn your Jupyter Notebooks into pdf reports, or website pages, etc.

Use Jupyter as a Quarto Engine - Python

Alternatively, you can use Jupyter as the engine for Quarto (instead of R’s knitr).

---
format: pdf
jupyter: python3
---

Some text

```{python}
x = 2 + 2
print(x)
```

## A Header

Some text

Interactive Notebooks

List of Interactive Notebooks

  • rmarkdown R
  • Pluto.jl Julia
  • Quarto Julia, Python, R
  • Jupyter Notebooks Julia, Python, R

Jupyter’s big difference: output is included in the notebook file.

Interactive Notebook Pros

  • Documentation / thoughts right next to code
  • Make reports / slides / exciting output
  • Easy to share the output with others
  • Documents are never out of sync with the code

Interactive Notebook Cons

  • Often harder to maintain code environments
  • Introduces a lot of dependencies
    • can be hard for others to run your code
  • Encourages “single file” linear coding
    • rather than separate scripts and functions
  • Harder to run unit tests, debuggers, other software engineering tools

When to use Interactive Notebooks?

I love to use Quarto (and before that rmarkdown).

  • websites and presentations for class or Macro Breakfast
  • trying out new ideas for my research projects
    • I make PDFs of the results for my advisor and me to look at

But I think your research project should not be in a notebook.

  • You want to be able to run your project end-to-end with only the necessary dependencies
  • You want others to be able to run it easily
  • You want to be able to test/debug/optimize the code

Summary

Python Overview

  • Not that different from R
    • syntax is similar
    • but some translation required
  • Environments are even more important
  • More object-focused
    • methods
  • the . operator has multiple meanings
    • packages and methods
  • Spacing is necessary

Jupyter Summary

  • Interactive notebook
    • kernels for Julia, Python, R, and more
  • Just a JSON file
    • Output is saved in the file
  • Edit in
    • VS Code
    • Browser
  • Can use as a Quarto engine
    • supports Python and Julia in Quarto

Live Coding Example

  • Launch an Interactive “Native” Python REPL in VS Code
  • Launch a Jupyter Notebook in VS Code
  • Launch a Jupyter Notebook in the browser: jupyter notebook