CSCI 503 - Assignment 4

Goals

The goal of this assignment is to work with files, iterators, strings, and string formatting in Python.

Instructions

You will be doing your work in a Jupyter notebook for this assignment. You may choose to work on this assignment on a hosted environment (e.g. tiger) or on your own local installation of Jupyter and Python. You should use Python 3.8 or higher for your work. To use tiger, use the credentials you received. If you work on a hosted environment, make sure to download the .ipynb file to turn in. If you choose to work locally, Anaconda is the easiest way to install and manage Python. If you work locally, you may launch Jupyter Lab either from the Navigator application or from the command line as jupyter-lab.

In this assignment, we will be working with data from NOAA’s Climate Data Online. This is a legacy application and uses an older text-based file format. I have downloaded monthly data for 1990 through 2019 for Illinois, available here. We will read the entire file in Part 1, but will output only a subset of the fields:

YearMonth: the four-digit year concatenated with the two-digit month
PCP: the amount of precipitation for the month
TAVG: the average temperature in the month
PDSI: the Palmer Drought Severity Index value

A template notebook is provided with one cell to download the data. You will read this data, calculate yearly values, and write a new output file in a similar format.

Due Date

The assignment is due at 11:59pm on Tuesday, February 23.

Submission

You should submit the completed notebook file required for this assignment on Blackboard. The filename of the notebook should be a4.ipynb. You should not turn in the illinois-climate-yearly.txt file as your notebook should contain the code to create it.

Details

Please make sure to follow the instructions to receive full credit. Use a markdown cell to label each part of the assignment with the number of the section you are completing. You may put the code for each part into one or more cells.

0. Name & Z-ID (5 pts)

The first cell of your notebook should be a markdown cell with a line for your name and a line for your Z-ID. If you wish to add other information (the assignment name, a description of the assignment), you may do so after these two lines.

1. Read Data (15 pts)

Either download the file and upload it to Jupyter, or use the provided template notebook, which contains a cell to download the file. Then, use an iterator to read the file illinois-climate-monthly.txt into a list of dictionaries named data (a format similar to what we used in Assignment 3). Remember that a file object will provide an iterator if you pass it to the iter function. Read the header (the first line) first, and then the rest of the file. The header will serve as the keys for each dictionary, while the other lines supply the values. Do not assign each key-value pair individually; this is tedious. You do not need to convert values (e.g. to floats) in this step (see Part 3). When you finish, data should look like:

[{'StateCode': '11',
  'Division': '00',
  'YearMonth': '199001',
  'PCP': '2.05',
  'TAVG': '36.4',
  'PDSI': '-1.86',
  'PHDI': '-1.86',
  'ZNDX': '-.39',
  'PMDI': '-1.86',
  'CDD': '0',
  'HDD': '971',
  'SP01': '.18',
  'SP02': '-.93',
  'SP03': '-1.28',
  'SP06': '-.75',
  'SP09': '-.93',
  'SP12': '-.92',
  'SP24': '-1.91',
  'TMIN': '27.2',
  'TMAX': '45.5'},
  ...
]

Hints

Remember that the iter() function obtains an iterator for an object, and next() retrieves an item. For example, next(iter([1,2,3])) is 1.
If you pass an iterator to a for loop, it will loop through the remaining items.
split will be useful.
Use zip to pair each key with its value as tuples, which can then be passed to dict. dict(zip(['a','b'], [1,2])) produces the dictionary {'a': 1, 'b': 2}.
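The hints above can be combined into a short sketch. It assumes the columns are separated by whitespace (as the sample record suggests) and uses an io.StringIO object holding the header and first record from the example above, trimmed to six columns, as a stand-in for the real file. In the notebook you would instead open illinois-climate-monthly.txt with open() inside a with statement.

```python
import io

# Stand-in for open('illinois-climate-monthly.txt'): the header and the first
# record from the example above, trimmed to six columns for brevity.
sample = io.StringIO(
    "StateCode Division YearMonth PCP TAVG PDSI\n"
    "11 00 199001 2.05 36.4 -1.86\n"
)

lines = iter(sample)           # a file-like object yields one line per step
header = next(lines).split()   # the first line supplies the dictionary keys

# The comprehension continues where next() stopped, so only data lines remain.
data = [dict(zip(header, line.split())) for line in lines]

print(data[0]['YearMonth'], data[0]['PDSI'])  # → 199001 -1.86
```

Because zip pairs the keys and values positionally, no key-value pair has to be written out by hand, which is what the instructions ask for.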

2. Split Year and Month (5 pts)

For each data item, create two new key-value pairs, Year and Month, from the YearMonth column. Add these pairs to the existing dictionaries in the list.

Hints

You should be able to use slicing to extract the parts of the string for each data item.
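A minimal sketch of the slicing step, using the YearMonth value from the Part 1 example (other keys omitted). The instructions do not say whether Year should stay a string here; the year_data example in Part 3 shows integer years, so this sketch leaves the conversion to int for that step.

```python
# One record shaped like the Part 1 output, with most keys omitted.
data = [{'YearMonth': '199001', 'PCP': '2.05'}]

for item in data:
    item['Year'] = item['YearMonth'][:4]   # first four characters: the year
    item['Month'] = item['YearMonth'][4:]  # remaining two characters: the month

print(data[0]['Year'], data[0]['Month'])  # → 1990 01
```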

3. Compute Yearly Values (15 pts)

Create a new list of dictionaries named year_data that has one entry per year and contains four key-value pairs (Year, PCP, TAVG, PDSI). PCP should be the sum of all PCP values for that year; TAVG and PDSI should be the averages of the TAVG and PDSI values for the year. You will need to convert the individual values to floats before computing sums or averages. year_data should look like:

[{'Year': 1990,
  'PCP': 49.94,
  'TAVG': 53.55833333333333,
  'PDSI': 1.9158333333333335},
 {'Year': 1991,
  'PCP': 36.120000000000005,
  'TAVG': 53.74166666666667,
  'PDSI': -1.0808333333333333},
 ...
]

Hints

Consider using a set comprehension to obtain all the years in the dataset, and list comprehensions to filter the data for each individual year before computing the sum or average.
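A sketch of the set-comprehension approach, assuming Year was stored as a string in Part 2. The three records below are made-up values chosen so the arithmetic is easy to check; they are not taken from the NOAA file.

```python
# Made-up records (not NOAA data) with the keys that Parts 1 and 2 produce.
data = [
    {'Year': '1990', 'PCP': '2.00', 'TAVG': '30.0', 'PDSI': '-1.00'},
    {'Year': '1990', 'PCP': '3.50', 'TAVG': '40.0', 'PDSI': '.50'},
    {'Year': '1991', 'PCP': '1.25', 'TAVG': '50.0', 'PDSI': '2.00'},
]

year_data = []
# A set has no order, so sort the distinct years before looping.
for year in sorted({int(d['Year']) for d in data}):
    rows = [d for d in data if int(d['Year']) == year]  # this year's months
    pcp = [float(d['PCP']) for d in rows]
    tavg = [float(d['TAVG']) for d in rows]
    pdsi = [float(d['PDSI']) for d in rows]
    year_data.append({
        'Year': year,
        'PCP': sum(pcp),                 # precipitation is totaled
        'TAVG': sum(tavg) / len(tavg),   # temperature is averaged
        'PDSI': sum(pdsi) / len(pdsi),   # drought index is averaged
    })

print(year_data[0])  # → {'Year': 1990, 'PCP': 5.5, 'TAVG': 35.0, 'PDSI': -0.25}
```

Note that float accepts strings without a leading zero such as '.50', which matters because the original file stores values like '-.39'.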

4. Write Output (20 pts)

Write the new year_data list of dictionaries to a file named illinois-climate-yearly.txt in a format similar to the original file. First, write the header values as the first line. Second, write the floating point values for PCP with 2 digits after the decimal point, TAVG with 1 digit after the decimal point, and PDSI with 2 digits after the decimal point. All the numbers should be right-aligned with at least two spaces between adjacent columns. Finally, the PDSI averages should be written without a leading 0. In other words, values of 0.34 or -0.65 would be written as .34 and -.65. There is no format option for omitting the leading zero in Python, so you need to write code to do this transformation. The output file should look like:

Year    PCP   TAVG   PDSI 
1990  49.94   53.6   1.92
1991  36.12   53.7  -1.08
1992  35.81   51.9    .08
1993  51.18   50.9   3.76
1994  35.51   52.0   -.35
1995  38.91   51.6   -.16
1996  38.82   50.1    .63
1997  35.10   51.0   -.18
1998  44.49   55.2   1.40
1999  35.71   53.5    .08
2000  39.77   52.2   -.01
2001  39.65   53.4    .67
2002  39.06   53.2    .35
2003  37.78   51.7   -.12
2004  39.97   52.6    .79
2005  31.43   53.7  -1.89
2006  40.64   54.0   -.32
2007  36.69   53.7   -.54
2008  50.17   50.9   2.98
2009  50.96   51.4   4.71
2010  40.68   52.9   3.78
2011  46.10   53.1   1.27
2012  30.11   55.8  -2.35
2013  42.49   51.1    .42
2014  41.29   49.5   1.09
2015  48.80   52.9   2.73
2016  39.75   54.7   2.75
2017  37.80   54.4    .46
2018  45.79   52.5   1.60
2019  49.87   52.1   4.34
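The column layout above can be reproduced with f-string format specifications. In this sketch the widths are inferred from the sample output (Year is 4 characters, and each numeric column is right-aligned in a 7-character field); it uses the first year_data entry from Part 3. Numbers are right-aligned by default, while strings need an explicit `>`.

```python
# First entry of the year_data example shown in Part 3.
row = {'Year': 1990, 'PCP': 49.94, 'TAVG': 53.55833333333333, 'PDSI': 1.9158333333333335}

# Widths inferred from the sample output: 4 for Year, 7 for each numeric column.
header = f"{'Year':4}{'PCP':>7}{'TAVG':>7}{'PDSI':>7}"
line = f"{row['Year']:4d}{row['PCP']:7.2f}{row['TAVG']:7.1f}{row['PDSI']:7.2f}"

print(header)  # → Year    PCP   TAVG   PDSI
print(line)    # → 1990  49.94   53.6   1.92
```

Formatting PDSI directly with `.2f` only works here because 1.92 has no leading zero; values between -1 and 1 still need the leading-zero handling described in the hints.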

Hints

Use print function calls to write to stdout first, then when you are satisfied the format is correct, write to a file.
Remember the file keyword argument for the print function.
Use a with statement to make sure all data is written to the file (or make sure to call close).
Remember to pass 'w' as the mode argument to open to be able to write to a file.
Try converting the PDSI value to a string first, and then using string methods to deal with the leading zero.
You will need to deal with two cases for stripping the leading zero: values with a negative sign and values without one. Remember to address both, and make sure you don’t convert 10.43 to 1.43. lstrip, which removes characters only from the start of a string, may be helpful here.
You can see the contents of the file you wrote in the notebook using the !cat command: !cat illinois-climate-yearly.txt
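One way to combine these hints is sketched below. The helper name strip_leading_zero is hypothetical, and the sketch writes only the first two year_data entries from Part 3, reusing the column widths inferred from the sample output. It handles the negative case by removing the sign, stripping zeros, and restoring the sign.

```python
def strip_leading_zero(value):
    """Format value with two decimals, then drop a leading zero (0.34 -> '.34')."""
    text = f"{value:.2f}"
    if text.startswith('-'):
        # Negative case: strip zeros after the sign, then put the sign back.
        return '-' + text[1:].lstrip('0')
    # lstrip stops at the first non-'0' character, so '10.43' is unchanged.
    return text.lstrip('0')

# First two entries of the year_data example shown in Part 3.
year_data = [
    {'Year': 1990, 'PCP': 49.94, 'TAVG': 53.55833333333333, 'PDSI': 1.9158333333333335},
    {'Year': 1991, 'PCP': 36.120000000000005, 'TAVG': 53.74166666666667, 'PDSI': -1.0808333333333333},
]

# Mode 'w' opens the file for writing; the with block closes (and flushes) it.
with open('illinois-climate-yearly.txt', 'w') as f:
    print(f"{'Year':4}{'PCP':>7}{'TAVG':>7}{'PDSI':>7}", file=f)
    for row in year_data:
        print(f"{row['Year']:4d}{row['PCP']:7.2f}{row['TAVG']:7.1f}"
              f"{strip_leading_zero(row['PDSI']):>7}", file=f)

print(strip_leading_zero(0.34), strip_leading_zero(-0.65), strip_leading_zero(10.43))
# → .34 -.65 10.43
```

Because print adds the newline itself, each call writes exactly one line; running !cat illinois-climate-yearly.txt afterward should show the header followed by one row per year.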