The goal of this assignment is to work with files, iterators, strings, and string formatting in Python.
You will be doing your work in a Jupyter notebook for this assignment. You may choose to work on this assignment on a hosted environment (e.g. tiger) or on your own local installation of Jupyter and Python. You should use Python 3.8 or higher for your work. To use tiger, use the credentials you received. If you work remotely, make sure to download the .ipynb file to turn in. If you choose to work locally, Anaconda is the easiest way to install and manage Python. If you work locally, you may launch Jupyter Lab either from the Navigator application or via the command line as jupyter-lab.
In this assignment, we will be working with data from NOAA’s Climate Data Online. This is a legacy application and uses an older text-based file format. I have downloaded monthly data for 1990 through 2019 for Illinois, available here. We will read the entire file in Part 1, but will output a subset of the fields related to the following:
YearMonth: the four-digit year concatenated with the two-digit month
PCP: the amount of precipitation for the month
TAVG: the average temperature in the month
PDSI: the Palmer Drought Severity Index value
A template notebook is provided with one cell to download the data. You will read this data, calculate yearly values, and write a new output file in a similar format.
The assignment is due at 11:59pm on Tuesday, February 23.
You should submit the completed notebook file required for this assignment on Blackboard. The filename of the notebook should be a4.ipynb. You should not turn in the illinois-climate-yearly.txt file, as your notebook should contain the code to create it.
Please make sure to follow instructions to receive full credit. Use a markdown cell to label each part of the assignment with the number of the section you are completing. You may put the code for each part into one or more cells.
The first cell of your notebook should be a markdown cell with a line for your name and a line for your Z-ID. If you wish to add other information (the assignment name, a description of the assignment), you may do so after these two lines.
Either download the file and upload it to Jupyter, or use the provided template notebook which contains a cell to download the file. Then, use an iterator to read the file illinois-climate-monthly.txt into a list of dictionaries named data (a format similar to what we used in Assignment 3). Remember that a file object will provide an iterator if you pass it to the iter function. Read the header first (the first line), and then the rest of the file. The header will serve as the keys for each dictionary while the other lines are values. Do not assign each key-value pair individually; this is tedious. You do not need to convert values (e.g. to floats) in this step (see Step 3). When you finish, data should look like:
[{'StateCode': '11',
'Division': '00',
'YearMonth': '199001',
'PCP': '2.05',
'TAVG': '36.4',
'PDSI': '-1.86',
'PHDI': '-1.86',
'ZNDX': '-.39',
'PMDI': '-1.86',
'CDD': '0',
'HDD': '971',
'SP01': '.18',
'SP02': '-.93',
'SP03': '-1.28',
'SP06': '-.75',
'SP09': '-.93',
'SP12': '-.92',
'SP24': '-1.91',
'TMIN': '27.2',
'TMAX': '45.5'},
... ]
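The reading approach described above can be sketched on a small in-memory sample. The sample text, its whitespace-separated layout, and the name `sample` are assumptions for illustration only; the real file has more columns, and its exact layout should be checked before relying on a plain `split()`.

```python
import io

# Hypothetical three-column sample standing in for the real file;
# whitespace-separated columns are an assumption.
sample = io.StringIO(
    "StateCode Division YearMonth\n"
    "11 00 199001\n"
    "11 00 199002\n"
)

lines = iter(sample)            # a file-like object provides an iterator over its lines
header = next(lines).split()    # the first line becomes the list of keys

# Pair each remaining line's values with the header names
data = [dict(zip(header, line.split())) for line in lines]
print(data[0])
# {'StateCode': '11', 'Division': '00', 'YearMonth': '199001'}
```

Because the iterator remembers its position, the loop after `next(lines)` starts at the second line, so the header is never turned into a data record.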
The iter() function obtains an iterator for an object, and next() retrieves an item. For example, next(iter([1,2,3])) is 1.
split will be useful.
Use zip to create pairs (tuples) that can be used to create a dictionary. dict(zip(['a','b'], [1,2])) produces the dictionary {'a': 1, 'b': 2}.
For each data item, create two new key-value pairs, Year and Month, from the YearMonth column. Add these pairs to the existing dictionaries in the list.
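One possible way to derive the two new keys is string slicing, sketched below on hypothetical records. Keeping Year and Month as strings here is an assumption; the step does not say whether they should be converted.

```python
# Hypothetical records; only the YearMonth key matters for this step.
data = [{'YearMonth': '199001'}, {'YearMonth': '201912'}]

for d in data:
    ym = d['YearMonth']
    d['Year'] = ym[:4]    # first four characters: the year
    d['Month'] = ym[4:]   # remaining two characters: the month

print(data[1])
# {'YearMonth': '201912', 'Year': '2019', 'Month': '12'}
```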
Create a new list of dictionaries named year_data that has one entry per year, where each entry contains four key-value pairs (Year, PCP, TAVG, PDSI). PCP should be the sum of all PCP values for that year; TAVG and PDSI should be the averages of the TAVG and PDSI values for the year. You will need to convert the individual values to floats before computing sums or averages. year_data should look like:
[{'Year': 1990,
'PCP': 49.94,
'TAVG': 53.55833333333333,
'PDSI': 1.9158333333333335},
{'Year': 1991,
'PCP': 36.120000000000005,
'TAVG': 53.74166666666667,
'PDSI': -1.0808333333333333},
... ]
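The grouping and averaging can be sketched as follows, assuming each monthly record already has a Year key. The sample values and the intermediate name `by_year` are hypothetical, and this is only one of several reasonable ways to group the records.

```python
# Hypothetical monthly records (two months per year, to keep the sample short).
data = [
    {'Year': '1990', 'PCP': '2.05', 'TAVG': '36.4', 'PDSI': '-1.86'},
    {'Year': '1990', 'PCP': '3.00', 'TAVG': '40.0', 'PDSI': '-1.00'},
    {'Year': '1991', 'PCP': '1.50', 'TAVG': '30.0', 'PDSI': '.50'},
    {'Year': '1991', 'PCP': '2.50', 'TAVG': '32.0', 'PDSI': '1.50'},
]

# Collect the float-converted values for each year
by_year = {}
for d in data:
    groups = by_year.setdefault(int(d['Year']), {'PCP': [], 'TAVG': [], 'PDSI': []})
    for key in groups:
        groups[key].append(float(d[key]))

# One dictionary per year: a sum for PCP, averages for TAVG and PDSI
year_data = [
    {'Year': year,
     'PCP': sum(g['PCP']),
     'TAVG': sum(g['TAVG']) / len(g['TAVG']),
     'PDSI': sum(g['PDSI']) / len(g['PDSI'])}
    for year, g in by_year.items()
]
print(year_data)
```

Dividing by the number of collected values rather than a fixed 12 keeps the average correct even if a year has missing months.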
Write the new year_data list of dictionaries to a file named illinois-climate-yearly.txt in a similar format to the original file. This means writing the header values as the first line. Second, write the floating point values for PCP with 2 digits after the decimal point, TAVG with 1 digit after the decimal point, and PDSI with 2 digits after the decimal point. All the numbers should be right-aligned with at least two spaces between each column. Finally, the PDSI averages should be written without a leading 0. In other words, values of 0.34 or -0.65 would be written as .34 and -.65. There is no format option for not writing the leading zero in Python, so you need to write code to do this transformation. The output file should look like:
Year    PCP  TAVG   PDSI
1990  49.94  53.6   1.92
1991  36.12  53.7  -1.08
1992  35.81  51.9    .08
1993  51.18  50.9   3.76
1994  35.51  52.0   -.35
1995  38.91  51.6   -.16
1996  38.82  50.1    .63
1997  35.10  51.0   -.18
1998  44.49  55.2   1.40
1999  35.71  53.5    .08
2000  39.77  52.2   -.01
2001  39.65  53.4    .67
2002  39.06  53.2    .35
2003  37.78  51.7   -.12
2004  39.97  52.6    .79
2005  31.43  53.7  -1.89
2006  40.64  54.0   -.32
2007  36.69  53.7   -.54
2008  50.17  50.9   2.98
2009  50.96  51.4   4.71
2010  40.68  52.9   3.78
2011  46.10  53.1   1.27
2012  30.11  55.8  -2.35
2013  42.49  51.1    .42
2014  41.29  49.5   1.09
2015  48.80  52.9   2.73
2016  39.75  54.7   2.75
2017  37.80  54.4    .46
2018  45.79  52.5   1.60
2019  49.87  52.1   4.34
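The leading-zero transformation and the right alignment can be sketched as below. The helper name `strip_leading_zero` is hypothetical, and the column widths (5, 4, 5) are an assumption chosen to fit values of the size shown; other widths that leave at least two spaces between columns would satisfy the requirement too.

```python
def strip_leading_zero(value, digits=2):
    """Format value with a fixed number of decimals, dropping a leading 0."""
    text = f"{value:.{digits}f}"
    if text.startswith("0."):
        return text[1:]            # 0.34 -> .34
    if text.startswith("-0."):
        return "-" + text[2:]      # -0.65 -> -.65
    return text

# Hypothetical yearly record
row = {'Year': 1994, 'PCP': 35.51, 'TAVG': 52.0, 'PDSI': -0.35}

# '>' right-aligns each value within its field width
line = (f"{row['Year']}  {row['PCP']:>5.2f}  {row['TAVG']:>4.1f}"
        f"  {strip_leading_zero(row['PDSI']):>5}")
print(line)
# 1994  35.51  52.0   -.35
```

The number is formatted to a string first and aligned afterwards, because the alignment has to apply to the text that remains once the zero is gone.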
Use print function calls to write to stdout first, then when you are satisfied the format is correct, write to a file.
The print function can also write to a file (via its file argument).
Use a with statement to make sure all data is written to the file (or make sure to call close).
Pass the w flag to open to be able to write to a file.
lstrip may be helpful for the leading-zero case.
You can check the output file with the !cat command: !cat illinois-climate-yearly.txt
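A minimal sketch of the writing pattern these hints describe follows. The two sample lines and the temporary file path are hypothetical, used so the sketch does not overwrite a real output file.

```python
import os
import tempfile

# Hypothetical pre-formatted lines and a throwaway output path
lines = ["Year    PCP", "1990  49.94"]
path = os.path.join(tempfile.mkdtemp(), "example-yearly.txt")

with open(path, "w") as f:      # "w" opens for writing; with closes the file on exit
    for line in lines:
        print(line, file=f)     # print adds the trailing newline

# Read the file back to confirm what was written
with open(path) as f:
    contents = f.read()
print(contents, end="")
```

Leaving out `file=f` sends the same lines to stdout, which makes it easy to check the format before switching to the file.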