Unit 03: Loops, Pandas and Simple Plotting I
Author: Dr Claire Hobday
Email: claire.hobday@ed.ac.uk
Learning objectives:#
By the end of this unit, you should be able to
use in-built functionality in Python
import modules and libraries
use the math module to do some simple scientific computing tasks
develop more pandas skills to deal with large volumes of data
use logical operations to filter data
understand and use the different types of loops to do repetitive tasks including: for, if, else/elif, while, break
combine these tools to analyse data in a large file containing information about the periodic table
use tools from Python to understand trends in the periodic table
Some of the material was adapted from Python4Science, as well as Software Carpentries.
Table of Contents#
Loops
3.1 For Loops
3.2 Tasks
3.3 Conditional Loops
3.4 Tasks
Links to documentation#
You can find useful information about using math and pandas in their official documentation.
Jupyter Cheat Sheet
To run the currently highlighted cell and move focus to the next cell, hold ⇧ Shift and press ⏎ Enter;
To run the currently highlighted cell and keep focus in the same cell, hold ⌃ Ctrl and press ⏎ Enter;
To get help for a specific function, place the cursor within the function’s brackets, hold ⇧ Shift, and press ⇥ Tab;
1. Working with Libraries #
1.1 Importing Libraries #
Most of the power of a programming language is in its libraries.#
A library is a collection of files (called modules) that contains functions for use by other programs.
May also contain data values (e.g. numerical constants) and other things. A library’s contents are supposed to be related, but there’s no way to enforce that.
The Python standard library is an extensive suite of modules that comes with Python itself.
Many additional libraries are available from anaconda or PyPI (the Python Package Index, see above for links).
A program must import a library module before using it.#
Use import to load a library module into a program’s memory.
Then refer to things from the module as module_name.thing_name.
Python uses . to mean “part of”.
We will be using a library called pandas.
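As a minimal sketch of the module_name.thing_name pattern, the cell below imports the standard-library math module and uses one function and one constant from it. The variable names root and circumference are chosen here for illustration only.

```python
# Load the whole math module into the program's memory
import math

# Refer to things inside the module as module_name.thing_name
root = math.sqrt(16)                  # a function that is "part of" math
circumference = 2 * math.pi * 1.0     # a constant that is "part of" math

print(root)
print(circumference)
```

Without the math. prefix, Python would not know where to find sqrt or pi and would raise a NameError.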
Import specific items from a library module to shorten programs.#
Use from ... import ... to load only specific items from a library module.
Then refer to them directly without the library name as a prefix.
Run the cell below to import all the required libraries for this unit:
import numpy as np
import matplotlib.pyplot as plt
# Make the helper functions accessible
import sys
import os.path
sys.path.append(os.path.abspath('../'))
from helper_functions.mentimeter import Mentimeter
Quick aside on printing variables#
To print variables together with strings, we can use f-strings.
The structure of f-strings is as follows:
my_variable = 4
print(f"some text before a variable: {my_variable}")
This prints the following:
some text before a variable: 4
Run the cell below to import \(\cos\) and \(\pi\) from the math library:
# We are importing the function cos and the value pi from the library math
from math import cos, pi
print(f"cos(pi) is {cos(pi)}")
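f-strings can also control how a number is displayed. The sketch below assumes you want to round the printed value; the part after the colon inside the braces is a format specifier, and the names rounded_pi and rounded_cos are illustrative.

```python
from math import cos, pi

# A format specifier after a colon controls how the value is displayed
rounded_pi = f"{pi:.3f}"          # three decimal places
rounded_cos = f"{cos(pi):.1f}"    # one decimal place

print(f"pi to three decimal places is {rounded_pi}")
print(f"cos(pi) to one decimal place is {rounded_cos}")
```

Only the displayed text is rounded; the values of pi and cos(pi) themselves are unchanged.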
Create an alias for a library module when importing it to shorten programs.#
Use import ... as ... to give a library a short alias while importing it.
Then refer to items in the library using that shortened name.
import math as m
print(f"cos(pi) is {m.cos(m.pi)}")
Tasks 1#
Rearrange the following statements so that a random DNA base and its index in the string are printed. Remember that you’ve already imported math above! Once you have an order, you can test it in an empty cell below to understand what it is doing.
from IPython.display import IFrame
IFrame("https://parsons.herokuapp.com/puzzle/7cf55d16a0454de580f31418505f3b54", width=1000, height=400)
# Test out the code in this cell once you have the right order!
Click here to see solution to Task 1.1
import math
import random
bases = "ACTTGCTTGAC"
n_bases = len(bases)
idx = random.randrange(n_bases)
print(f"Random base {bases[idx]} base index {idx}.")
Fill in the blanks so that the program below prints 90.0.
Rewrite the program so that it uses import without as.
Which form do you find easier to read?
# Question 1
import math as m
angle = # FIXME.degrees(# FIXME.pi / 2)
print(# FIXME)
# Question 2
Click here to see solution to Task 1.2
Filling in the right variables:
import math as m
angle = m.degrees(m.pi / 2)
print(angle)
Re-writing the program without an import as:
import math
angle = math.degrees(math.pi / 2)
print(angle)
Explanation:
Since you just wrote the code and are familiar with it, you might actually find the first version easier to read. But when trying to read a huge piece of code written by someone else, or when getting back to your own huge piece of code after several months, non-abbreviated names are often easier, except where there are clear abbreviation conventions.
Import the exponential function from the math library.
Use it to work out \(e^{10}\).
Import a function from math which will allow you to raise a number to a power of your choice.
Use it to calculate \(6^5\).
# FIXME
Click here to see solution to Task 1.3
Filling in the right variables:
# 1
from math import exp
# 2
number = exp(10)
print(number)
# 3
from math import pow
# 4
powered = pow(6, 5)
print(powered)
Explanation:
The first part should be relatively straightforward. For part 3, you may need to google the math library and read the documentation to find the function that you need to raise a number to the power of another number.
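One detail worth knowing when choosing between the math function and Python's own power operator: the sketch below compares the two for the same calculation. The variable names are illustrative.

```python
import math

float_result = math.pow(6, 5)   # math.pow always returns a float
int_result = 6 ** 5             # the ** operator keeps integers as integers

print(float_result)
print(int_result)
```

Both give the same number, but math.pow returns it as a floating-point value while ** returns an integer when both inputs are integers.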
Most of the power of a programming language is in its libraries.
A program must import a library module in order to use it.
Import specific items from a library to shorten programs.
Create an alias for a library when importing it to shorten programs.
2. Working with the Pandas Library #
In Unit 02, we looked at a couple of different ways of opening files. In this session we are going to use pandas exclusively.
pandas is a Python library that works much like Excel, with the added advantage that we can manipulate the data programmatically.
The community-agreed alias for pandas is pd, so loading pandas as pd is assumed standard practice throughout the pandas documentation.
import pandas as pd
Import the data#
Now, we need to import the data into what pandas calls a DataFrame, which takes the input data and formats it as a sort of “spreadsheet” with this form:
# Use pandas to read the csv file:
data = pd.read_csv("files/ptable.csv")
# View the imported dataframe, note how the index column "element" is in bold:
data
Now that we’ve imported this data, we will learn some more fundamental python concepts in order to examine the data at the end of the session.
Accessing the dataframe #
We are now going to view the dataframe in different ways:
data.head() shows us the first 5 rows of the dataframe. Note how Python counts from 0.
data.tail() shows us the last 5 rows of the dataframe.
data.columns lists all the column headers, which are properties associated with the elements.
Test them out below:
data.head()
data.tail()
data.columns
We might also be interested in knowing what the datatypes are for the columns.
print(f"Period is data type {data['Period'].dtype}")
It is also possible to change the datatype of a column in a dataframe using the .astype() method.
data["Period"] = data["Period"].astype(float)
print(f"Period is data type {data['Period'].dtype}")
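If you do not have the periodic table file to hand, the same calls can be tried on a tiny table built by hand. This is a sketch only: the DataFrame toy and its three rows are invented for illustration and are not the contents of ptable.csv.

```python
import pandas as pd

# Hypothetical three-row table, invented for illustration (not ptable.csv)
toy = pd.DataFrame({
    "Element": ["Hydrogen", "Helium", "Lithium"],
    "Period": [1, 1, 2],
})

first_rows = toy.head(2)          # first 2 rows instead of the default 5
column_names = list(toy.columns)  # column headers as a plain list

# Convert the integer Period column to floating point
toy["Period"] = toy["Period"].astype(float)

print(first_rows)
print(column_names)
print(f"Period is data type {toy['Period'].dtype}")
```

Passing a number to head() or tail() changes how many rows are shown.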
3. Loops#
A loop is used for iterating over a set of statements. There are many different kinds of loops that can be useful in different situations. We are going to go through some of the most common types of loops.
3.1 For loops#
This loop is used for iterating over some kind of a sequence. That can be a list, tuple, dictionary, string, etc.
Everything that is inside the for statement is going to be executed a number of times.
If we think about how we would define a for loop in Python, it would have the structure:
for variable in iterable:
    statement(s)
where:
variable is the loop variable
iterable is a collection of objects such as a list or tuple
statement(s) in the loop body (denoted by the indent) are executed once for each item in iterable.
The loop variable variable takes on the value of the next element in iterable each time through the loop, until we have iterated through all items in iterable.
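To illustrate that the iterable does not have to be a list, here is a short sketch looping over a string, a tuple and a dictionary. All of the data and variable names are made up for this example.

```python
# A string is iterated one character at a time
letters = []
for base in "ACTG":
    letters.append(base)

# A tuple is iterated one element at a time
halved = []
for mass in (2.0, 4.0, 6.0):
    halved.append(mass / 2)

# A dictionary is iterated one key at a time
symbols = {"H": "hydrogen", "He": "helium"}
names = []
for symbol in symbols:
    names.append(symbols[symbol])

print(letters)
print(halved)
print(names)
```

In each case the loop variable takes the next item from the collection until none are left.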
Let’s take a look at some simple examples that show us how powerful for loops can be, and how they must be properly structured to be interpreted by Python.
Example 1#
The first line of the for loop must end with a colon, and the body must be indented.
for number in [2, 3, 5]:
    print(number)
This for loop is equivalent to:
print(2)
print(3)
print(5)
We can see that the for loop is a much more efficient way of doing this task than typing out a separate print(value) call for every value.
# FIXME
for number in [2,3,5]:
print(number)
# FIXME
for number in [2,3,5]
print(number)
Example 2#
Loop variables can be called anything, so please try to make them as meaningful as possible.
for kitten in [2, 3, 5]:
    print(kitten)
for numbers in [2, 3, 5]:
    print(numbers)
Example 3#
The body of a loop can contain many statements. However, it’s best practice to keep a loop to no more than a few lines long.
primes = [2, 3, 5]
for p in primes:
    squared = p ** 2
    cubed = p ** 3
    print(p, squared, cubed)
Example 4#
Use range to iterate over a sequence of numbers.
The built-in function range produces a sequence of numbers.
Not a list: the numbers are produced on demand to make looping over large ranges more efficient. It’s also easier than typing out a list of numbers by hand, like we did in the examples above.
range(N) is the numbers 0..N-1
Exactly the legal indices of a list or character string of length N
e.g. range(5) would be 0, 1, 2, 3, 4
print("a range is not a list: range(0, 3)")
for number in range(0, 3):
    print(number)
a range is not a list: range(0, 3)
0
1
2
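A few more ways of using range, as a sketch: converting a range to a list to see its contents, giving a start, stop and step, and using range(len(...)) to produce the legal indices of a string. The string bases is an arbitrary example.

```python
# Converting a range to a list shows the numbers it would produce
first_five = list(range(5))            # 0, 1, 2, 3, 4
odd_numbers = list(range(1, 10, 2))    # start at 1, stop before 10, step by 2

# range(len(...)) gives exactly the legal indices of a string
bases = "ACTG"
for index in range(len(bases)):
    print(index, bases[index])

print(first_five)
print(odd_numbers)
```

Note that the stop value is never included in the numbers produced.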
Example 5#
The Accumulator pattern turns many values into one.
A common pattern in programs is to:
Initialize an accumulator variable to zero, the empty string, or the empty list.
Update the variable with values from a collection.
# Sum the first 10 integers.
total = 0
for number in range(10):
    total = total + (number + 1)
print(total)
Read total = total + (number + 1) as:
Add 1 to the current value of the loop variable number.
Add that to the current value of the accumulator variable total.
Assign that to total, replacing the current value.
We have to add number + 1 because range produces 0..9, not 1..10.
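The accumulator does not have to start at zero or be a sum. The sketch below accumulates a product, and also shows an alternative way to sum 1..10 by asking range for those numbers directly. The choice of 5! as an example is ours.

```python
# Accumulate a product instead of a sum: 5! = 1 * 2 * 3 * 4 * 5
product = 1                     # start at 1, because multiplying by 0 always gives 0
for number in range(1, 6):      # range(1, 6) produces 1..5
    product = product * number

# Sum the first 10 integers without the "+ 1" correction
total = 0
for number in range(1, 11):     # range(1, 11) produces 1..10
    total = total + number

print(product)
print(total)
```

The starting value of the accumulator should be chosen so that it does not change the result: 0 for sums, 1 for products, an empty string or list for concatenation.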
Tasks 2 #
Fill in the blanks
# Total length of the strings in the list: ["red", "green", "blue"] => 12
total = 0
for word in ["red", "green", "blue"]:
    ____ = ____ + len(word) # FIXME
print(total)
Click here to see solution to Task 2.1
total = 0
for word in ["red", "green", "blue"]:
    total = total + len(word)
print(total)
Fill in the blanks
# List of word lengths: ["red", "green", "blue"] => [3, 5, 4]
lengths = ____ # FIXME
for word in ["red", "green", "blue"]:
    lengths.____(____) # FIXME
print(lengths)
Click here to see solution to Task 2.2
lengths = []
for word in ["red", "green", "blue"]:
    lengths.append(len(word))
print(lengths)
Fill in the blanks
# Concatenate all words: ["red", "green", "blue"] => "redgreenblue"
words = ["red", "green", "blue"]
result = ____ # FIXME
for ____ in ____: # FIXME
    ____ # FIXME
print(result)
Click here to see solution to Task 2.3
words = ["red", "green", "blue"]
result = ""
for word in words:
    result = result + word
print(result)
Start out with an empty string acronym = "".
Write a loop that uses the words ‘red’, ‘green’, ‘blue’ and the method upper() so that, by the end of the loop, acronym contains “RGB” when you type print(acronym).
# FIXME
acronym = ""
print(acronym)
Click here to see solution to Task 2.4
acronym = ""
for word in ["red", "green", "blue"]:
    acronym = acronym + word[0].upper()
print(acronym)
Reorder and properly indent the lines of code below so that they print a list with the cumulative sum of data. The result should be [1, 3, 5, 10].
# FIXME
cumulative.append(sum)
for number in data:
cumulative = []
sum += number
sum = 0
print(cumulative)
data = [1,2,2,5]
Click here to see solution to Task 2.5
data = [1,2,2,5]
cumulative = []
sum = 0
for number in data:
    sum += number
    cumulative.append(sum)
print(cumulative)
1. Read the code below and try to identify what the errors are without running it.
2. Run the code and read the error message. What type of NameError do you think this is? Is it a string with no quotes, a misspelled variable, or a variable that should have been defined but was not?
3. Fix the error.
4. Repeat steps 2 and 3 until you have fixed all the errors.
for number in range(10):
    # use a if the number is a multiple of 3, otherwise use b
    if (Number % 3) == 0:
        message = message + a
    else:
        message = message + "b"
print(message)
Click here to see solution to Task 2.6
message = ""
for number in range(10):
    # use a if the number is a multiple of 3, otherwise use b
    if (number % 3) == 0:
        message = message + "a"
    else:
        message = message + "b"
print(message)
Explanation:#
The variable message needs to be initialized, and Python variable names are case sensitive: number and Number refer to different variables.
A for loop executes commands once for each value in a collection.
A for loop is made up of a collection, a loop variable, and a body.
The first line of the for loop must end with a colon, and the body must be indented.
Indentation is always meaningful in Python.
Loop variables can be called anything (but it is strongly advised to give the loop variable a meaningful name).
The body of a loop can contain many statements.
Use range to iterate over a sequence of numbers.
The Accumulator pattern turns many values into one.
3.2 Conditional Loops #
Computer programming is often referred to as a “language”, and we often use similar nomenclature to traditional languages. Here we will discover how conditional loops are interpreted by Python. Conditionals are used much like the conditional tense in languages, to speculate about what could happen with respect to an if clause.
E.g. If it rains, take an umbrella. Or, if the pH is below 7, it’s acidic.
Notice how the first phrase controls the content of the second phrase.
E.g. If it’s sunny, wear sunscreen. Or if the pH is above 7, it’s basic.
We could take this analogy and allow more options, e.g. if the pH is above 7, it’s basic. Otherwise (or else) it’s acidic. Notice how we can categorise the information with these conditional statements.
We can use if statements to allow our computer programs to do different things for different data.
Use if statements to control whether or not a block of code is executed#
Example 6 - if#
Use an if statement to control whether or not a block of code is executed.
An if statement (more properly called a conditional statement) controls whether some block of code is executed or not.
Structure is similar to a for statement:
First line opens with if and ends with a colon
Body containing one or more statements is indented (usually by 4 spaces or a tab)
mass = 3.54
if mass > 3.0:
    print(f"{mass} is large")
mass = 2.07
if mass > 3.0:
    print(f"{mass} is large")
Things that you should notice:
The importance of ending the first line of the if statement with a colon.
How the second code block prints nothing, because mass does not meet the condition of the if statement.
Example 7 - if#
Conditionals are often used inside loops.
Not much point using a conditional when we know the value (as above).
But useful when we have a collection to process.
masses = [3.54, 2.07, 9.22, 1.86, 1.71]
for mass in masses:
    if mass > 3.0:
        print(f"{mass} is large")
Example 8 - if and else#
Use else to execute a block of code when an if condition is not true.
else can be used following an if.
Allows us to specify an alternative to execute when the if statement criterion is not met.
masses = [3.54, 2.07, 9.22, 1.86, 1.71]
for mass in masses:
    if mass > 3.0:
        print(f"{mass} is large")
    else:
        print(f"{mass} is small")
Example 9 - if and elif#
Use elif to specify additional tests.
May want to provide several alternative choices, each with its own test.
Use elif (short for “else if”) and a condition to specify these.
Always associated with an if.
Must come before the else (which is the “catch all”).
masses = [3.54, 2.07, 9.22, 1.86, 1.71]
for mass in masses:
    if mass > 9.0:
        print(f"{mass} is HUGE")
    elif mass > 3.0:
        print(f"{mass} is large")
    else:
        print(f"{mass} is small")
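Returning to the pH analogy from the start of this section, the same if / elif / else structure can be sketched as below. The pH readings are invented for illustration, and treating a pH of exactly 7 as “neutral” is an addition made here rather than part of the analogy above.

```python
# Hypothetical pH readings, invented for illustration
ph_values = [2.5, 7.0, 11.3]

labels = []
for ph in ph_values:
    if ph < 7.0:
        labels.append("acidic")
    elif ph > 7.0:
        labels.append("basic")
    else:
        # Reached only when ph is exactly 7.0
        labels.append("neutral")

print(labels)
```

The else branch needs no condition of its own: it catches whatever the if and elif tests did not.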
Example 10 - order of conditions#
Conditions are tested once, in order.
Python steps through the branches of the conditional in order, testing each in turn.
So ordering matters.
grade = 85
if grade >= 70:
    print("grade is C")
elif grade >= 80:
    print("grade is B")
elif grade >= 90:
    print("grade is A")
We can see here that the condition in the first if statement is met, so none of the elif conditions are evaluated: the program prints “grade is C” even though a grade of 85 also satisfies grade >= 80.
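One way to get the intended result is to test the most restrictive condition first. The sketch below reorders the branches; the final else branch and its "below C" label are assumptions added for illustration.

```python
grade = 85
if grade >= 90:
    letter = "A"
elif grade >= 80:
    letter = "B"
elif grade >= 70:
    letter = "C"
else:
    # Assumption for illustration: anything below 70 is reported as "below C"
    letter = "below C"

print(f"grade is {letter}")
```

With this ordering, a grade of 85 fails the first test, passes the second, and the remaining branches are skipped.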
Example 11 - using conditionals to evolve the values of variables#
In the example below we use if and else within a for loop in order to change the value of velocity.
Notice:
the indent for the for loop and also for the if and else statements.
the use of the colon at the end of for, if and else statements.
The program must have a print statement outside the body of the loop to show the final value of velocity, since its value is updated by the last iteration of the loop.
velocity = 10.0
# Execute the loop 5 times
for i in range(5):
    print(f"try {i}:{velocity}")
    if velocity > 20.0:
        print("moving too fast")
        velocity = velocity - 5.0
    else:
        print("moving too slow")
        velocity = velocity + 10.0
print(f"final velocity: {velocity}")
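A helpful way to follow a loop like this is to build a table of the variables’ values at each iteration. The sketch below repeats the velocity loop but records each step in a list called trace (a name chosen here) and prints it as a small table.

```python
velocity = 10.0
trace = []   # one (try, velocity before, velocity after) entry per iteration

for i in range(5):
    before = velocity
    if velocity > 20.0:
        velocity = velocity - 5.0
    else:
        velocity = velocity + 10.0
    trace.append((i, before, velocity))

# Print the recorded values as a simple table
print("try  before  after")
for i, before, after in trace:
    print(f"{i:<4} {before:<7} {after}")
print(f"final velocity: {velocity}")
```

Reading down the table shows the velocity rising past 20.0, being reduced in steps of 5.0, and then being increased again once it is no longer greater than 20.0.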
Tasks 3 #
Fill in the blanks so that this program creates a new list containing zeroes where the original list’s values were negative and ones where the original list’s values were zero or positive.
original = [-1.5, 0.2, 0.4, 0.0, -1.3, 0.4]
result = ____ # FIXME
for value in original:
    if ____: # FIXME
        result.append(0)
    else:
        ____ # FIXME
print(result)
Output should look like this:
[0, 1, 1, 1, 0, 1]
Click here to see solution to Task 3.1
original = [-1.5, 0.2, 0.4, 0.0, -1.3, 0.4]
result = []
for value in original:
    if value < 0.0:
        result.append(0)
    else:
        result.append(1)
print(result)
Modify this program so that it finds the largest and smallest values in the list no matter what the range of values originally is.
What are the advantages and disadvantages of using this method to find the range of the data?
values = [...some test data...] # FIXME
smallest, largest = None, None
for v in values:
    if ____: # FIXME
        smallest, largest = v, v
    ____: # FIXME
        smallest = min(____, v) # FIXME
        largest = max(____, v) # FIXME
print(smallest, largest)
Click here to see solution to Task 3.2
values = [-2, 1, 65, 78, -54, -24, 100]
smallest, largest = None, None
for v in values:
    if smallest is None and largest is None:
        smallest, largest = v, v
    else:
        smallest = min(smallest, v)
        largest = max(largest, v)
print(smallest, largest)
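When thinking about the advantages and disadvantages of this method, it may help to compare it with Python’s built-in min and max applied to the whole list. The sketch below runs the loop and then checks that both approaches agree, using the same test data as the solution.

```python
values = [-2, 1, 65, 78, -54, -24, 100]

# The loop approach, using "is None" to detect the starting state
smallest, largest = None, None
for v in values:
    if smallest is None:
        smallest, largest = v, v
    else:
        smallest = min(smallest, v)
        largest = max(largest, v)

# The built-in functions give the same answer in one call each
assert smallest == min(values)
assert largest == max(values)
print(smallest, largest)
```

The loop makes every step explicit and works without assuming anything about the range of the data, while the built-in calls are shorter and harder to get wrong.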
Use if statements to control whether or not a block of code is executed.
Conditionals are often used inside loops.
Use else to execute a block of code when an if condition is not true.
Use elif to specify additional tests.
Conditions are tested once, in order.
Create a table showing variables’ values to trace a program’s execution.