Python

Introduction to Python

Matthew DeHaven

March 31, 2024

Lecture Summary

Python Overview

  • Not that different from R
    • syntax is similar
    • but some translation required
  • Environments are even more important
  • More object-focused
    • methods
  • the . operator has multiple meanings
    • packages and methods
  • Spacing is necessary

Python Installs

Installing Python

Python comes with many operating systems, but to get the latest version…

You want to always have the latest version, but this leads to many installations of Python on the same computer.

Each of these installations is referred to as an “interpreter”.

Choosing a Python Interpreter

VS Code allows you to choose a Python Interpreter when editing a “.py” file.

Running Python in VS Code

You can run a whole file of Python code by clicking the play arrow.

Or you can run a single line or selection…

  • hitting Shift+Enter
  • right clicking the line, then selecting “Run Python”

Both of these will execute the Python code in a terminal.

Hello World Example

x = 'Hello Class!'
print(x)

a = 0.1
b = 0.1
c = 0.1

print(a + b + c == 0.3)
Hello Class!
False
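
The False above comes from binary floating-point rounding: 0.1 has no exact binary representation, so the sum lands slightly off from 0.3. A minimal sketch of a more forgiving comparison, assuming the default tolerance of the standard library's math.isclose is acceptable for your use case:

```python
import math

a = 0.1
b = 0.1
c = 0.1

total = a + b + c

exact_match = (total == 0.3)             # exact comparison of two floats
close_match = math.isclose(total, 0.3)   # comparison within a relative tolerance

print(total)
print(exact_match)
print(close_match)
```

The same issue exists in R and most other languages, so it is a property of floating-point numbers rather than of Python.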

Python Environments

Environments

There are two main options for setting up Python environments:

  1. venv

Ships with Python (3.3 and later). Similar to renv for R, it creates a folder that holds the environment's own packages and a link to (or copy of) the Python interpreter.

  2. conda

Part of the Anaconda/miniconda world.

Can be used both as a package manager and for environments.

Aside on Conda

Conda manages Python installations, packages, and environments from outside Python.

Venv manages Python environments from within Python.

Anaconda is a distribution of (1) a Python installation, (2) Conda environments, (3) a bunch of default packages.

Miniconda is a distribution of (1) a Python installation and (2) Conda environments.

Use Python Environments

You should be using environments for any language, but especially for Python.

  • You will have multiple versions of Python installed at once
  • Python package managers historically handled dependencies poorly
    • Updating one package would break another package
  • The latest versions have started giving warning messages if you try to install packages system-wide

Creating an Environment

For venv the command to create a virtual environment is…

terminal
python -m venv /path/to/new/virtual/environment

And for conda

terminal
conda create --name <my-env>

Creating an Environment in VS Code

Luckily VS Code’s Python Extension makes handling these environments easy.

  • Open the Command Palette (Shift+Cmd+P on macOS, Ctrl+Shift+P on Windows/Linux)

  • Search for “Python: Create Environment…”

  • Select either “venv” or “conda”

  • Select the Python interpreter (version) to use

Now whenever you launch a terminal for this workspace, it will use the environment you created.
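
To confirm which interpreter and environment a session is actually using, you can ask Python itself. This is a rough sketch: the prefix comparison detects venv-style environments only, and a conda environment would typically need a different check (for example, looking at the CONDA_DEFAULT_ENV environment variable, which is an assumption about how conda was activated).

```python
import sys

# Full path of the interpreter that is running this code
print(sys.executable)

# Inside a venv, sys.prefix points at the environment folder, while
# sys.base_prefix points at the Python installation it was created from.
in_venv = sys.prefix != sys.base_prefix
print(in_venv)
```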

Python Packages

Python Packages

Packages are how you can import functions.

You will sometimes see the term “module”. A module is a single file of Python code; a package can have one or many modules within it.

Packages can be installed using

  • pip (built in to Python)
  • conda

Pip Install Packages

pip stands for “pip installs packages”.

It installs Python packages hosted on the Python Package Index (PyPI).

terminal
pip install numpy

pip install numpy is executed in a terminal, not in Python code itself, unlike R or Julia.
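
Because installation happens in the terminal, it can be handy to check from inside Python whether a package is visible to the current interpreter. A small sketch using the standard library; the helper name is_installed is made up for illustration.

```python
import importlib.util


def is_installed(package_name):
    """Return True if the current interpreter can find the package."""
    # find_spec returns None when a top-level package cannot be located
    return importlib.util.find_spec(package_name) is not None


print(is_installed("json"))                           # part of the standard library
print(is_installed("a_package_that_does_not_exist"))  # hypothetical missing package
```

If this reports False for a package you just installed, the terminal and the editor are probably using different environments.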

Conda Install Packages

Installs packages hosted on the Anaconda repository.

Can also install other software, like R.

terminal
conda install numpy

You can also use pip install to install packages in a conda environment, but this can cause conflicts, so you should use conda install by default in this situation.

Importing a Package

Once a package is installed (to your environment), you can…

  1. Import the package and shorten the name
import numpy as np

numpy is a package for numerical computation (e.g. better arrays and linear algebra).

If you want to use the function arange() from numpy, you would write

np.arange(10)
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

Importing a Package, Other options

  2. Import the package without shortening the name
import numpy
numpy.arange(10)
  3. Directly import one function of the package
from numpy import arange
arange(10)
  4. Directly import all functions of the package
from numpy import *
arange(10)

The first option (import numpy as np) is what is recommended and used most often.
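
One reason the namespaced form is preferred: importing names directly can silently replace a function of the same name. The sketch below uses the standard-library modules math and cmath rather than numpy so it runs without any installation; both happen to define sqrt.

```python
import math
import cmath

# Namespaced calls stay unambiguous
real_root = math.sqrt(4)         # 2.0
complex_root = cmath.sqrt(-4)    # 2j

# Direct imports share one name; the later import wins
from math import sqrt
from cmath import sqrt           # silently replaces math's sqrt

shadowed = sqrt(4)               # now the complex version: (2+0j)
print(real_root, complex_root, shadowed)
```

With `from package import *` the same collision can happen for every name in the package, which is why it is usually avoided.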

Python Basics

Variable Assignment

Python uses a single = for assignment.

x = 42

Just like in R, functions can be assigned to new variables:

a = np.arange
a(10)
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

Basic Math

Math is not that different:

3 + 3
6
3 - 3
0
3 * 3
9
3 / 3
1.0
3 ** 3 # Power
27
3 % 3  # Modulo
0
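
A few details are worth a closer look: / always returns a float (note the 1.0 above), while // performs floor division. A short sketch; for readers coming from R, // and % roughly correspond to %/% and %%.

```python
quotient = 7 / 2            # true division, always a float: 3.5
floored = 7 // 2            # floor division: 3
negative_floored = -7 // 2  # floors toward negative infinity: -4
remainder = -7 % 2          # result takes the sign of the divisor: 1
both = divmod(7, 2)         # quotient and remainder together: (3, 1)

print(quotient, floored, negative_floored, remainder, both)
```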

Logic

Boolean operators are written out as words (and, or, not) instead of symbols; comparisons still use symbols.

1 > 2
False
1 < 2
True
1 > 2 and 1 < 2
False
1 > 2 or 1 < 2
True
not 1 > 2
True
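
Two further behaviours of these operators are sketched below: comparisons can be chained, and the words and / or stop evaluating as soon as the result is known. The variable names are illustrative only.

```python
x = 2

# Chained comparison, equivalent to (1 < x) and (x < 3)
in_range = 1 < x < 3

# `and` short-circuits: items[0] is never evaluated because the left side is False,
# so no IndexError is raised on the empty list
items = []
safe = len(items) > 0 and items[0] == 1

# `not` binds more loosely than comparisons, so this reads as not (1 > 2)
negated = not 1 > 2

print(in_range, safe, negated)
```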

Python Data Types

  • Booleans
  • Numbers: Integer, Floating point
  • Strings
  • Collections:
    • List, Tuple, Dictionary, Set

Lists

Lists are constructed as comma-separated elements in square brackets.

my_list = [1, 5, 2, 8]
print(my_list)
[1, 5, 2, 8]

Lists don’t enforce a single type.

my_list = [1, 'hi', 2, False]
print(my_list)
[1, 'hi', 2, False]

Tuples

Tuples are constructed as comma-separated elements in parentheses.

my_tuple = (1, 'hi', 2, False)
print(my_tuple)
(1, 'hi', 2, False)

Tuples are immutable; lists are mutable.

  • i.e. you cannot change values of tuples once created, or add additional elements
my_list[2] = 4
print(my_list)
[1, 'hi', 4, False]
my_tuple[2] = 4
TypeError: 'tuple' object does not support item assignment
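
Immutability has practical consequences. As a sketch (the coordinate values and names here are invented for illustration): "modifying" a tuple means building a new one, and because tuples cannot change they can serve as dictionary keys, which lists cannot.

```python
coords = (41.8, -71.4)        # hypothetical latitude/longitude pair

# Concatenation returns a new tuple; the original is untouched
shifted = coords + (10.0,)

# A tuple works as a dictionary key
city_by_coords = {coords: "Providence"}

# A list does not, because mutable objects are not hashable
try:
    bad = {[41.8, -71.4]: "Providence"}
except TypeError as err:
    error_message = str(err)

print(shifted)
print(city_by_coords[coords])
print(error_message)
```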

Sets

Sets are constructed as comma-separated elements in curly braces.

my_set = {"RI", "MA", "VT"}
my_set
{'MA', 'RI', 'VT'}

Sets keep only unique values; duplicates are dropped.

my_set  = {"RI", "RI", "MA", "VT", "MA"}
my_set
{'MA', 'RI', 'VT'}
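
Sets are mostly useful for membership tests and for combining groups. A brief sketch with made-up example sets:

```python
new_england = {"RI", "MA", "VT", "NH", "ME", "CT"}
visited = {"RI", "MA", "NY"}

has_vt = "VT" in new_england       # membership test
both = visited & new_england       # intersection
either = visited | new_england     # union
not_yet = new_england - visited    # difference

# Converting a list to a set is a quick way to drop duplicates
unique_states = set(["RI", "RI", "MA"])

print(has_vt)
print(sorted(both))
print(sorted(not_yet))
```

Sets are unordered, so the example sorts them before printing to get a stable display.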

Dictionaries

Dictionaries are constructed as key:value pairs in curly braces.

my_dict = {"RI":"Rhode Island", "MA":"Massachusetts", "VT":"Vermont"}
print(my_dict)
{'RI': 'Rhode Island', 'MA': 'Massachusetts', 'VT': 'Vermont'}

Dictionaries can be subsetted by their keys.

my_dict["RI"]
'Rhode Island'
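
A few other common dictionary operations, sketched with the same example data: adding an entry, looking up a key that may be missing, and looping over key/value pairs.

```python
my_dict = {"RI": "Rhode Island", "MA": "Massachusetts", "VT": "Vermont"}

# Assigning to a new key adds an entry
my_dict["NH"] = "New Hampshire"

# my_dict["CT"] would raise a KeyError; get() returns a default instead
missing = my_dict.get("CT", "unknown")

# `in` checks the keys
has_ri = "RI" in my_dict

# items() yields (key, value) pairs
for abbreviation, name in my_dict.items():
    print(abbreviation, "->", name)
```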

Python Indexing

Python starts indexing from 0.

x = ['a', 'b', 'c']
x[0]
'a'

For some people, this is the mark of a true programming language.
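
Zero-based indexing goes together with two other conventions, sketched below: negative indices count from the end, and slices include the start position but exclude the stop position.

```python
x = ['a', 'b', 'c', 'd', 'e']

last = x[-1]          # last element: 'e'
first_two = x[0:2]    # positions 0 and 1, stop is excluded: ['a', 'b']
from_third = x[2:]    # position 2 to the end: ['c', 'd', 'e']
every_other = x[::2]  # step of 2: ['a', 'c', 'e']

print(last, first_two, from_third, every_other)
```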

For Loops

Python is space sensitive.

For example, a for loop requires the looped lines to be indented by at least one space (four spaces per level is customary).

for i in np.arange(3):
    print(i)
0
1
2

Leaving out the space will throw an error.

for i in np.arange(3):
print(i)
IndentationError: expected an indented block after 'for' statement on line 1 (172521851.py, line 2)
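
The loops above use np.arange, but plain Python loops more often use the built-in range, or iterate over a collection directly. A sketch without numpy; the list of states is just sample data.

```python
states = ["RI", "MA", "VT"]

# range(3) yields 0, 1, 2 without needing numpy
for i in range(3):
    print(i)

# Looping over the list itself; enumerate adds the position
labels = []
for position, state in enumerate(states):
    labels.append(f"{position}: {state}")

print(labels)
```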

Objects

Python is an object-oriented programming language.

Example: a list is an object.

Objects have

  1. Properties (usually called attributes in Python)
  2. Methods

Classes define objects; an object is an actual instance of a class.
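
To make the class/instance distinction concrete, here is a minimal sketch of a user-defined class. The class State and its method add_snow are hypothetical names invented for this example.

```python
class State:
    """A toy class holding a state's name and snowfall."""

    def __init__(self, name, snow):
        # Data stored on each individual object
        self.name = name
        self.snow = snow

    def add_snow(self, inches):
        # A method: a function that operates on the object it is called on
        self.snow += inches
        return self.snow


ri = State("RI", 5)      # ri is an instance of the class State
ri.add_snow(3)           # calling a method with the . operator

print(ri.name, ri.snow)
print(isinstance(ri, State))
```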

Methods

Methods are a key feature of Python.

Methods are functions attached to an object that operate on the object.

For instance, we said a list is an object. Lists have the method: sort().

a_list = [2, 1, 9, 4, 6]
a_list.sort()
print(a_list)
[1, 2, 4, 6, 9]

Methods are Functions

Remember: methods are a type of function.

You can think of methods as functions that always take as an input the object they are defined for.

They may take other inputs as well.

Each type of object has its own methods. The sort() method is not defined for tuples.

a_tuple = (4, 1, 3, 2)
a_tuple.sort()
AttributeError: 'tuple' object has no attribute 'sort'
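
Two related points, sketched below: sort() changes the list in place and returns None, while the built-in function sorted() works on any iterable, including tuples, and returns a new list.

```python
a_list = [2, 1, 9, 4, 6]
result = a_list.sort()         # sorts in place; the return value is None

a_tuple = (4, 1, 3, 2)
sorted_copy = sorted(a_tuple)  # new list; the tuple itself is unchanged
descending = sorted(a_tuple, reverse=True)

print(result)
print(a_list)
print(sorted_copy, descending, a_tuple)
```

Writing `a_list = a_list.sort()` is a common slip, since it replaces the list with None.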

Append Method

If you wanted to append an element to a list…

a = [2, 4, 1]
a.append(3)
a
[2, 4, 1, 3]

If you appended a whole list…

a = [2, 4, 1]
a.append([3, 5, 6])
a
[2, 4, 1, [3, 5, 6]]
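
If the goal was to add each element rather than nest the whole list, extend() from the method list is the usual choice. A short sketch, with + shown as an alternative that builds a new list:

```python
a = [2, 4, 1]
a.extend([3, 5, 6])          # adds each element to the existing list

b = [2, 4, 1] + [3, 5, 6]    # concatenation creates a new list

print(a)
print(b)
```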

All List Methods

  • append()
  • clear()
  • copy()
  • count()
  • extend()
  • index()
  • insert()
  • pop()
  • remove()
  • reverse()
  • sort()

Not Everything is a Method

Some functions that you would expect to be methods are not.

For example, len() returns the length of an object, but calling it as a method fails.

x = [1, 2, 3]
x.len()
AttributeError: 'list' object has no attribute 'len'

len() is a built-in function, not a method of a list, even though you might expect it to be. You pass the list to it instead.

x = [1, 2, 3]
len(x)
3
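
Behind the scenes, len() asks the object for its length through a special method named __len__, which is why one function works across lists, strings, and dictionaries. A sketch with a hypothetical class Classroom defined only for this illustration:

```python
class Classroom:
    """Toy container that reports its length through __len__."""

    def __init__(self, students):
        self.students = list(students)

    def __len__(self):
        # len(obj) calls this special method
        return len(self.students)


room = Classroom(["Ana", "Ben", "Cy"])
size = len(room)

print(size)
print(len("hello"), len({"RI": 1, "MA": 2}))
```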

Python Functions

Python functions are defined with the keyword def and an indented body:

def my_function():
    print("Hello!")

my_function()
Hello!

You can define arguments and return values.

def my_function(a, b):
    c = a ** 2 + b ** 2
    return c

my_function(3, 4)
25
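
Arguments can also have default values and can be passed by name. A sketch extending the function above; the name scaled_sum and the scale parameter are additions for illustration.

```python
def scaled_sum(a, b, scale=1.0):
    """Return scale * (a**2 + b**2)."""
    return scale * (a ** 2 + b ** 2)


default_scale = scaled_sum(3, 4)               # scale falls back to 1.0
keyword_call = scaled_sum(b=4, a=3, scale=2)   # named arguments, any order

print(default_scale, keyword_call)
```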

Pandas

Working with Data

Pandas is a package that implements DataFrames in Python.

First, you need to install pandas, then import it.

terminal
pip install pandas

import pandas as pd

Pandas DataFrame Example

A DataFrame can be constructed from a dictionary that maps column names to lists of values.

df = pd.DataFrame(
    {
        "state": ["RI", "MA", "VT"],
        "size": ["tiny", "small", "small"],
        "snow": [5, 10, 20],
        "temp": [40, 35, 30]
    }
)
df
  state   size  snow  temp
0    RI   tiny     5    40
1    MA  small    10    35
2    VT  small    20    30

Series

Each column of a DataFrame is a Series.

df["state"]
0    RI
1    MA
2    VT
Name: state, dtype: object

Series have their own methods.

df["snow"].max()
20

DataFrame Methods

DataFrames have a lot of their own methods.

describe() will summarize all numerical columns.

df.describe()
            snow  temp
count   3.000000   3.0
mean   11.666667  35.0
std     7.637626   5.0
min     5.000000  30.0
25%     7.500000  32.5
50%    10.000000  35.0
75%    15.000000  37.5
max    20.000000  40.0

Reading and Writing CSVs

A crucial step is reading and writing data.

Writing it out is a method:

df.to_csv("my-csv.csv")

Reading it in is a function:

df2 = pd.read_csv("my-csv.csv")
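
One detail to watch in this round trip: by default to_csv() also writes the row index, which read_csv() then reads back as an extra unnamed column. The sketch below shows the difference using in-memory buffers (io.StringIO) instead of files, purely so it leaves nothing on disk; the small DataFrame is sample data.

```python
import io

import pandas as pd

df = pd.DataFrame({"state": ["RI", "MA"], "snow": [5, 10]})

# Default behaviour: the index is written as a first, unnamed column
with_index = io.StringIO()
df.to_csv(with_index)
with_index.seek(0)
round_trip = pd.read_csv(with_index)
print(list(round_trip.columns))

# index=False leaves the index out, so the columns match the original
no_index = io.StringIO()
df.to_csv(no_index, index=False)
no_index.seek(0)
clean = pd.read_csv(no_index)
print(list(clean.columns))
```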

Data Science with Pandas

All of the data science operations we saw with dplyr and data.table are possible with Python and pandas.

  • group by
  • summarize
  • adding new columns
  • etc.
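
As a rough sketch of what these operations look like in pandas, using the DataFrame from earlier in the lecture (the new column names snow_share, mean_snow, and max_temp are invented for the example):

```python
import pandas as pd

df = pd.DataFrame(
    {
        "state": ["RI", "MA", "VT"],
        "size": ["tiny", "small", "small"],
        "snow": [5, 10, 20],
        "temp": [40, 35, 30],
    }
)

# Adding a new column (like mutate in dplyr)
df["snow_share"] = df["snow"] / df["snow"].sum()

# Filtering rows (like filter in dplyr)
cold = df[df["temp"] < 40]

# Group by and summarize
summary = df.groupby("size").agg(
    mean_snow=("snow", "mean"),
    max_temp=("temp", "max"),
)

print(cold["state"].tolist())
print(summary)
```

The grouped result is itself a DataFrame, indexed by the grouping column.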

Other Python Packages

  • numpy: vectors, arrays, numerical analysis
  • pandas: DataFrames
  • matplotlib: plotting
  • Seaborn: plotting with pandas DataFrames
  • plotly: interactive plots
  • Scikit-Learn: machine learning
  • TensorFlow: neural nets
  • PyTorch: neural nets, with GPU support
  • BeautifulSoup: web scraping

Summary

Python Overview

  • Not that different from R
    • syntax is similar
    • but some translation required
  • Environments are even more important
  • More object-focused
    • methods
  • the . operator has multiple meanings
    • packages and methods
  • Spacing is necessary