Practice Reading Files and Plotting Data

Authors: Dr Antonia Mey and Dr James Cumby
Email: antonia.mey@ed.ac.uk, james.cumby@ed.ac.uk

Question 1

Read in data from extra/X_calibration.csv into a dataframe.
Inspect the dataframe you have read in.
Plot the absorbance vs. mass concentration data (in g dm$^{-3}$) of a compound X. Write a function that will plot this calibration curve (DataFrame taken as the function argument) as a scatter plot, with axes correctly labelled including units.
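As a starting point, here is a minimal sketch of the read, inspect and plot steps, assuming pandas and matplotlib are available. The column names `concentration` and `absorbance`, the in-memory CSV text and the function name `plot_calibration` are hypothetical stand-ins, because the layout of extra/X_calibration.csv is not shown here. Check `df.columns` after loading the real file and adjust the names to match.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed inside a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for extra/X_calibration.csv (made-up values and column names).
csv_text = """concentration,absorbance
0.0,0.00
0.5,0.11
1.0,0.21
1.5,0.33
"""

# With the real file this would be: df = pd.read_csv("extra/X_calibration.csv")
df = pd.read_csv(io.StringIO(csv_text))

# Inspect what was read: first rows and the data type of each column.
print(df.head())
print(df.dtypes)


def plot_calibration(data, x_col="concentration", y_col="absorbance"):
    """Scatter plot of absorbance against mass concentration (column names are assumptions)."""
    fig, ax = plt.subplots()
    ax.scatter(data[x_col], data[y_col])
    ax.set_xlabel(r"Mass concentration / g dm$^{-3}$")
    ax.set_ylabel("Absorbance")  # absorbance is dimensionless
    return ax


ax = plot_calibration(df)
```

Returning the axes object lets the caller add a title or a fitted line afterwards without changing the function.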

# Answer here

Question 2

The radial part, $R(r)$, of the hydrogen wavefunction for the quantum numbers $n$ and $l$ is given by the equation $$ R(r) = \left( \frac{2r}{na_0} \right)^l \sum_{k=0}^{n-l-1} a_k \left( \frac{2r}{na_0} \right)^k \exp \left( -\frac{r}{na_0} \right) $$ where the coefficients $a_k$ are determined from the recursion relationship $$ a_{k+1} = \frac{k+l+1-n}{(k+1)(k+2l+2)}a_k $$ and $a_0$ in the prefactor and the exponential is the Bohr radius.
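The recursion relationship can be checked by hand for small $n$, or with a short sketch like the one below. The function name `series_coefficients` is hypothetical, and the starting value of the $k = 0$ coefficient is not given in the text, so it is assumed to be 1 here (an unnormalised series). Exact fractions are used so the results are easy to compare with textbook polynomials.

```python
from fractions import Fraction


def series_coefficients(n, l, first=1):
    """Return the n - l series coefficients, starting from an assumed k = 0 value `first`."""
    assert 0 <= l < n, "quantum numbers must satisfy 0 <= l < n"
    coeffs = [Fraction(first)]
    # The sum runs over k = 0 ... n-l-1, so n-l-1 further coefficients are needed.
    for k in range(n - l - 1):
        ratio = Fraction(k + l + 1 - n, (k + 1) * (k + 2 * l + 2))
        coeffs.append(ratio * coeffs[-1])
    return coeffs


print(series_coefficients(3, 0))  # 3s: three terms in the sum
print(series_coefficients(3, 2))  # 3d: a single term
```

The series has $n - l$ terms, so the polynomial it defines has $n - l - 1$ roots, which is where the radial nodes come from.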

Generally more useful for visualisation is the atomic radial distribution function $$ \mathrm{RDF} = 4 \pi r^2 R(r)^2 $$ which readily shows the number of radial nodes for a given combination of $(n, l)$.
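To see the RDF definition in action, the sketch below evaluates it with NumPy for the 1s orbital. It uses the standard unnormalised form $R(r) \propto \exp(-r/a_0)$ with $r$ in units of $a_0$, which is an assumption made for illustration and is not the course's atomic_rdf function.

```python
import numpy as np

# Radii in units of the Bohr radius a_0.
r = np.linspace(0.0, 10.0, 1001)

# Unnormalised 1s radial function, R(r) = exp(-r); used for illustration only.
R_1s = np.exp(-r)

# Radial distribution function as defined above.
rdf = 4 * np.pi * r**2 * R_1s**2

# Locate the most probable radius on the grid.
r_peak = r[np.argmax(rdf)]
print(f"1s RDF peaks at r = {r_peak:.2f} a_0")
```

The $r^2$ factor forces the RDF to zero at the nucleus even though $R(r)$ itself is largest there, so the curve has a single maximum and no radial nodes.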

The function atomic_rdf takes as its input the quantum numbers $n$ and $l$ and a maximum radius out to which the RDF is calculated (in units of $a_0$, the Bohr radius). It returns two lists: the first contains the $r$ distances (in units of $a_0$) and the second the calculated RDF at each point.

Write a function that will take an integer $n_{\mathrm{max}}$ as input ($1 \leq n_{\mathrm{max}} \leq 4$) and produce a plot of the RDFs for $n = 1, 2, \ldots, n_{\mathrm{max}}$, containing the wavefunction with the fewest radial nodes for each $n$. For example, $n_{\mathrm{max}} = 3$ would produce a plot of the 1s, 2p and 3d RDFs.

Remember, the number of radial nodes for a given $n,l$ pair is given by $n - l - 1$.
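Since $l$ can be at most $n - 1$, the node count $n - l - 1$ is smallest (zero) when $l = n - 1$. A minimal sketch of that selection is shown below; the names `fewest_node_orbitals` and `SUBSHELL_LETTERS` are hypothetical.

```python
SUBSHELL_LETTERS = "spdf"  # labels for l = 0, 1, 2, 3


def fewest_node_orbitals(n_max):
    """Return (n, l, label) for the fewest-radial-node orbital of each n up to n_max."""
    orbitals = []
    for n in range(1, n_max + 1):
        l = n - 1  # largest allowed l, giving n - l - 1 = 0 radial nodes
        orbitals.append((n, l, f"{n}{SUBSHELL_LETTERS[l]}"))
    return orbitals


print(fewest_node_orbitals(3))  # [(1, 0, '1s'), (2, 1, '2p'), (3, 2, '3d')]
```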

Your function should incorporate assert statements to check that the input value is valid, and return the matplotlib axes object. Internally, call the atomic_rdf function to generate the necessary curves. Your plot should cover the range $0 \leq r < 50\,a_0$ and should have axes correctly labelled with the quantity plotted and any units.
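One way the input check might look is sketched below. The helper name `check_n_max` is hypothetical; in a full answer the same assert statements could sit at the top of the plotting function instead.

```python
def check_n_max(n_max):
    """Raise AssertionError unless n_max is an integer between 1 and 4 inclusive."""
    # bool is a subclass of int in Python, so True/False are excluded explicitly.
    assert isinstance(n_max, int) and not isinstance(n_max, bool), "n_max must be an integer"
    assert 1 <= n_max <= 4, "n_max must satisfy 1 <= n_max <= 4"
    return n_max


check_n_max(3)  # passes silently

try:
    check_n_max(5)
except AssertionError as err:
    print(f"Rejected: {err}")
```

Checking the type before the range means a float such as 2.5 is reported as the wrong type, not as an out-of-range value.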

Your plot should adopt the following formatting:

The overall figure should have a width:height ratio of 1.618 (the ‘golden’ ratio).
The colours and line formats should be as follows:

Principal quantum number $n$	Line Colour
1	black (‘k’)
2	red (‘r’)
3	Matplotlib C1
4	Matplotlib C2
5	Matplotlib C3
6 - 10	Matplotlib C4 - C8

Angular quantum number $l$	Line Style
0 (s)
1 (p)
2 (d)
3 (f)
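The figure shape and colour lookup could be set up along the following lines. This is a sketch only: the names `make_rdf_axes` and `N_COLOURS` and the 8 inch width are hypothetical choices, the plotted lines are flat placeholders where the curves from atomic_rdf would go, and line styles are not set here.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed inside a notebook
import matplotlib.pyplot as plt

GOLDEN_RATIO = 1.618

# Colours for n = 1 to 4, copied from the table above.
N_COLOURS = {1: "k", 2: "r", 3: "C1", 4: "C2"}


def make_rdf_axes(width=8.0):
    """Create axes on a figure whose width:height ratio is the golden ratio."""
    fig, ax = plt.subplots(figsize=(width, width / GOLDEN_RATIO))
    ax.set_xlabel(r"$r$ / $a_0$")
    ax.set_ylabel("Radial distribution function")  # add units once known from atomic_rdf
    ax.set_xlim(0, 50)
    return ax


ax = make_rdf_axes()
# Flat placeholder lines, only to demonstrate the colour lookup.
for n, colour in N_COLOURS.items():
    ax.plot([0, 50], [n, n], color=colour, label=f"n = {n}")
ax.legend()
```

Keeping the colours in a dictionary keyed by $n$ means the plotting loop does not need a chain of if statements.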

# Answer here

Question 3

The file data_sources/V_containing_ICSD.csv contains details of many of the reported crystal structures containing vanadium.

plot_icsd is intended to take a data file with the same format as V_containing_ICSD.csv and produce a histogram based on the CellVolume column values. In addition, it takes two arguments (groupby_col and group_ranges) which should cause it to create a stacked histogram based on data in another column. For example, if groupby_col = 'Temperature' then plot_icsd should produce a plot where histograms for different temperature ranges are stacked on top of each other. The ranges to use are defined by the group_ranges argument, which is a list containing the temperature divisions. These ranges should be inclusive of the lower value but exclusive of the upper value, i.e.

group_ranges = [0, 290, 300, 5000] should produce three sets of bars, with the data distributed as: $$ 0 \leq T < 290 \\ 290 \leq T < 300 \\ 300 \leq T < 5000 \\ $$
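The half-open ranges can be expressed with a pair of comparisons per group. The sketch below uses made-up temperatures that sit on and around the boundaries, so the handling of the edge values is visible; it assumes NumPy and is not tied to the real data file.

```python
import numpy as np

# Made-up temperatures chosen to sit on and around the range boundaries.
temperatures = np.array([0.0, 150.0, 290.0, 299.9, 300.0, 4999.0, 5000.0])
group_ranges = [0, 290, 300, 5000]

groups = {}
# Pair each division with the next one: (0, 290), (290, 300), (300, 5000).
for lower, upper in zip(group_ranges[:-1], group_ranges[1:]):
    mask = (temperatures >= lower) & (temperatures < upper)  # lower inclusive, upper exclusive
    groups[(lower, upper)] = temperatures[mask]

for (lower, upper), members in groups.items():
    print(f"{lower} <= T < {upper}: {members.tolist()}")
```

A value equal to the final division (5000 here) falls outside every group, which follows directly from the exclusive upper bound.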

By way of an example, plot_icsd('data_sources/V_containing_ICSD.csv', 'Temperature', [0, 290, 300, 5000]) should produce a plot similar to the following:

[Figure: Example CellVolume histogram]

Points to note

If groupby_col == '' or group_ranges is an empty list, plot_icsd should return a simple (non-stacked) histogram
You should use 50 histogram bins distributed across the full range of the CellVolume values.

Hint: to generate a certain number of bins across a range, use np.linspace(start, end, number_of_bins+1) (remember that computing a histogram requires both the left and right edges of each bin, but plotting only requires the left edge)
The width of the plotted bins should be the correct size for the data

Hint: np.linspace can also return the step size if asked…
Remember to stack the bars, and not just plot them starting from zero.
Remember to add a legend and meaningful axis labels, based on the data received.
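The binning, bar width and stacking hints above can be combined as in the sketch below. It runs on randomly generated stand-in volumes split into two hypothetical groups, not on the ICSD file, and the group names and y-axis label are assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed inside a notebook
import matplotlib.pyplot as plt
import numpy as np

# Made-up stand-in data: cell volumes split into two hypothetical groups.
rng = np.random.default_rng(0)
volumes_by_group = {
    "group A": rng.uniform(100.0, 2000.0, size=200),
    "group B": rng.uniform(100.0, 2000.0, size=100),
}
all_volumes = np.concatenate(list(volumes_by_group.values()))

# 50 bins need 51 edges; retstep=True also returns the spacing, i.e. the bin width.
n_bins = 50
edges, width = np.linspace(all_volumes.min(), all_volumes.max(), n_bins + 1, retstep=True)

fig, ax = plt.subplots()
bottom = np.zeros(n_bins)  # running height of the stack in each bin
for label, volumes in volumes_by_group.items():
    counts, _ = np.histogram(volumes, bins=edges)
    # Bars start at the left edge of each bin and sit on top of the earlier groups.
    ax.bar(edges[:-1], counts, width=width, bottom=bottom, align="edge", label=label)
    bottom = bottom + counts
ax.set_xlabel("CellVolume")
ax.set_ylabel("Number of structures")
ax.legend()
```

Sharing one set of bin edges across all groups is what makes the stack meaningful, because every group is then counted over identical intervals.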
