The goal of this assignment is to work with files, iterators, strings, and string formatting in Python.
You will be doing your work in a Jupyter notebook for this assignment. You may choose to work on this assignment in a hosted environment (e.g. tiger) or on your own local installation of Jupyter and Python. You should use Python 3.8 or higher for your work. To use tiger, use the credentials you received. If you work remotely, make sure to download the .ipynb file to turn in. If you choose to work locally, Anaconda is the easiest way to install and manage Python. If you work locally, you may launch Jupyter Lab either from the Navigator application or via the command-line as jupyter-lab.
In this assignment, we will be working with data from the U.S. Department of Agriculture’s Economic Research Service about weekly food sales in 39 states (others are not available due to how the data is collected). I have downloaded and pre-processed the data here, converting it to a fixed-width file format. There are five fields, whose locations in each line are as follows:
Date: (0-10) the end of the week of the data collection in yyyy-mm-dd format
State: (12-26) the name of the state (contains spaces)
Category: (28-55) the food category (contains spaces)
Dollars: (57-66) the total value of sales
LastYear: (68-77) the total value of sales last year
A template notebook is provided with one cell to download the data. You will read this data, calculate monthly averages, and write a new output file in a similar format.
The assignment is due at 11:59pm on Thursday, October 7.
You should submit the completed notebook file required for this assignment on Blackboard. The filename of the notebook should be a4.ipynb. You should not turn in the food-prices-monthly.txt file as your notebook should contain the code to create it.
Please make sure to follow instructions to receive full credit. Use a markdown cell to label each part of the assignment with the number of the section you are completing. You may put the code for each part into one or more cells. In Parts 3 and 4, CSCI 503 students must compute and output two averages and a percent change while CSCI 490 students need only compute and output one average. Do not use external libraries for reading, parsing, or writing the data files.
The first cell of your notebook should be a markdown cell with a line for your name and a line for your Z-ID. If you wish to add other information (the assignment name, a description of the assignment), you may do so after these two lines.
Either download the file and upload it to Jupyter, or use the provided template notebook which contains a cell to download the file. Then, use an iterator to read the file food-prices-weekly.txt into a list of dictionaries named data (a format similar to what we used in Assignment 3). Remember that a file object will provide an iterator if you pass it to the iter function. Read the header first (the first line), and then the rest of the file. The header will serve as the keys for each dictionary while the other lines are values. To split each line into its column values, use slicing and remove leading and trailing whitespace. You do not need to convert values (e.g. to integers) in this step (see Step 3). When you finish, your data should look like:
[
 ...,
 {'Date': '2021-08-22',
  'State': 'Wyoming',
  'Category': 'Vegetables',
  'Dollars': '621511',
  'LastYear': '620174'}
]
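The reading step can be sketched as follows. This is only an illustration under stated assumptions: the field positions are treated as Python slice bounds with an exclusive end, a small in-memory sample stands in for food-prices-weekly.txt, and the names FIELDS, parse_line, and row_fmt are hypothetical helpers rather than anything the assignment prescribes. The first sample row uses made-up numbers.

```python
import io

# Assumed (start, end) slice bounds for the five fields, taken from the
# positions listed in the assignment; end is treated as exclusive here.
FIELDS = [(0, 10), (12, 26), (28, 55), (57, 66), (68, 77)]

def parse_line(line):
    """Slice one fixed-width line into column values, stripping whitespace."""
    return [line[start:end].strip() for start, end in FIELDS]

# Tiny stand-in for food-prices-weekly.txt, built to match the assumed layout.
row_fmt = "{:<10}  {:<14}  {:<27}  {:>9}  {:>9}\n"
sample = io.StringIO(
    row_fmt.format("Date", "State", "Category", "Dollars", "LastYear")
    + row_fmt.format("2021-08-15", "Wyoming", "Vegetables", "615000", "640000")  # made-up row
    + row_fmt.format("2021-08-22", "Wyoming", "Vegetables", "621511", "620174")
)

lines = iter(sample)             # a real file object can be passed to iter() the same way
keys = parse_line(next(lines))   # the first line is the header
data = [dict(zip(keys, parse_line(line))) for line in lines]

print(data[-1])
```

Because next() consumes the header before the comprehension runs, the remaining iteration only sees data lines, so no index bookkeeping is needed to skip the first row.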
The iter() function obtains an iterator for an object, and next() retrieves an item. For example, next(iter([1,2,3])) is 1.
strip will be useful to remove whitespace.
Use zip to create pairs (tuples) that can be used to create a dictionary. dict(zip(['a','b'], [1,2])) produces the dictionary {'a': 1, 'b': 2}.
For each data item, create a new key-value pair, Month, from the Date column. Given a date in yyyy-mm-dd format, the Month value should be yyyy-mm. Add these pairs to the existing dictionaries in the list.
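As a small illustration of deriving Month from Date, the sketch below slices the first seven characters of the date string. It assumes every Date value is already in yyyy-mm-dd form; the one-item data list is a stand-in for the list built earlier.

```python
# Stand-in for the list of dictionaries read from the weekly file.
data = [
    {'Date': '2021-08-22', 'State': 'Wyoming', 'Category': 'Vegetables',
     'Dollars': '621511', 'LastYear': '620174'},
]

for row in data:
    # The first seven characters of 'yyyy-mm-dd' are 'yyyy-mm'.
    row['Month'] = row['Date'][:7]

print(data[0]['Month'])
```

The dictionaries are modified in place, so no new list is needed for this step.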
CSCI 490 students should complete (a) and CSCI 503 students should complete (b).
(a) Create a new list of dictionaries named monthly_data that has one entry per month, state, and category and contains four key-value pairs (Month, State, Category, DollarsAvg). DollarsAvg should be the average of the Dollars values for the given month, state, and category. You will need to convert the individual values to integers before computing averages. monthly_data should look like:
[
 ...,
 {'Month': '2021-08',
  'State': 'Wyoming',
  'Category': 'Vegetables',
  'DollarsAvg': 619689.75}
]
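One possible way to group the weekly rows is to key a dictionary on the (Month, State, Category) tuple and keep a running sum and count for each key. The sketch below is illustrative only: the rows hold made-up Dollars values, and the totals dictionary and its [sum, count] layout are assumptions, not a required design.

```python
# Made-up rows that already carry the Month key; values are illustrative.
data = [
    {'Month': '2021-07', 'State': 'Wyoming', 'Category': 'Vegetables', 'Dollars': '300'},
    {'Month': '2021-08', 'State': 'Wyoming', 'Category': 'Vegetables', 'Dollars': '100'},
    {'Month': '2021-08', 'State': 'Wyoming', 'Category': 'Vegetables', 'Dollars': '201'},
]

totals = {}  # (Month, State, Category) -> [running sum, count]
for row in data:
    key = (row['Month'], row['State'], row['Category'])
    entry = totals.setdefault(key, [0, 0])
    entry[0] += int(row['Dollars'])   # convert the string before summing
    entry[1] += 1

monthly_data = [
    {'Month': month, 'State': state, 'Category': category,
     'DollarsAvg': total / count}
    for (month, state, category), (total, count) in totals.items()
]

print(monthly_data)
```

Since dictionaries preserve insertion order, the monthly entries come out in the order their keys were first seen, which keeps the output sorted whenever the input file is.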
(b) Create a new list of dictionaries named monthly_data that has one entry per month, state, and category and contains six key-value pairs (Month, State, Category, DollarsAvg, LastYearAvg, PctChange). DollarsAvg and LastYearAvg should be the average of the Dollars and LastYear values, respectively, for the given month, state, and category. Then, PctChange is computed by
\[
100 \cdot \frac{\texttt{DollarsAvg} - \texttt{LastYearAvg}}{\texttt{DollarsAvg}}
\]
You will need to convert the individual values to integers before computing averages. Do this efficiently. You shouldn’t loop through the data more than twice (once to read, once to write). At the end, monthly_data should look like:
[
 ...,
 {'Month': '2021-08',
  'State': 'Wyoming',
  'Category': 'Vegetables',
  'DollarsAvg': 619689.75,
  'LastYearAvg': 640338.5,
  'PctChange': -3.332110947453948}
]
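As a quick check of the PctChange formula, the sketch below plugs in the two averages from the sample entry. The variable names are illustrative; the point is that the formula as written divides by DollarsAvg, not by LastYearAvg.

```python
dollars_avg = 619689.75
last_year_avg = 640338.5

# The formula as given uses DollarsAvg in the denominator.
pct_change = 100 * (dollars_avg - last_year_avg) / dollars_avg

print(round(pct_change, 2))   # -3.33
```

Dividing by LastYearAvg instead would give roughly -3.22, which does not match the sample entry.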
Compute DollarsAvg and LastYearAvg at the same time, and create the dictionary for each monthly_data entry by accessing all data related to the (Month, State, Category) tuple at once.
Write the new monthly_data list of dictionaries to a file named food-prices-monthly.txt in a similar format to the original file. First, write the header values as the first line. Second, write the floating point values for DollarsAvg (and for CSCI 503, LastYearAvg) with only 1 digit after the decimal point. CSCI 503 students should also write PctChange with 2 digits after the decimal point. All the numbers should be right-aligned with at least two spaces between each column. Write a positive sign (+) for positive values of PctChange and a negative sign for negative values. The columns should be:
Month: (0-7) the month of the data collection in yyyy-mm format
State: (9-23) the name of the state (contains spaces)
Category: (25-52) the food category (contains spaces)
DollarsAvg: (54-65) the average value of sales
LastYearAvg: (67-78) the average value of sales last year
PctChange: (80-89) the percent change from the last year
The output file should look like (CSCI 490 students will not have the last two columns):
Month    State           Category                      DollarsAvg  LastYearAvg  PctChange
2019-10  Alabama         Alcohol                       22064525.5   21262833.8      +3.63
2019-11  Alabama         Alcohol                       21828656.0   20955901.0      +4.00
2019-12  Alabama         Alcohol                       22924922.8   19847999.6     +13.42
...      ...             ...                                  ...          ...        ...
2021-06  Wyoming         Vegetables                      619151.5     661495.0      -6.84
2021-07  Wyoming         Vegetables                      643430.0     693040.8      -7.71
2021-08  Wyoming         Vegetables                      619689.8     640338.5      -3.33
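The column layout can be produced with format specifications in an f-string. The sketch below is one way to do it, assuming the listed positions are slice bounds with an exclusive end (field widths 7, 14, 27, 11, 11, and 9 with two spaces between fields) and that text columns are left-aligned; the entry dictionary reuses the sample values.

```python
entry = {'Month': '2021-08', 'State': 'Wyoming', 'Category': 'Vegetables',
         'DollarsAvg': 619689.75, 'LastYearAvg': 640338.5,
         'PctChange': -3.332110947453948}

# <  left-aligns text, > right-aligns numbers, .1f / .2f set the decimals,
# and + forces a sign on both positive and negative values.
line = (f"{entry['Month']:<7}  {entry['State']:<14}  {entry['Category']:<27}  "
        f"{entry['DollarsAvg']:>11.1f}  {entry['LastYearAvg']:>11.1f}  "
        f"{entry['PctChange']:>+9.2f}")

print(line)
```

Under these width assumptions each value lands in the slice listed for its column, for example line[54:65] holds the DollarsAvg field.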
Use print function calls to write a few data items to stdout first, then when you are satisfied the format is correct, write all of the data items to a file.
The print function can also write to a file (via its file parameter).
Use a with statement to make sure all data is written to the file (or make sure to call close).
Pass the w flag to open to be able to write to a file.
You can check the contents of the file in the notebook with the !cat command: !cat food-prices-monthly.txt
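The writing hints above can be sketched as follows, using only the CSCI 490 columns. This is a minimal illustration, not a reference solution: the rows list is a stand-in for monthly_data, the column widths are the assumed ones (7, 14, 27, 11), the right-aligned numeric header is a guess, and the file is written to a temporary directory (out_path is a hypothetical name) so the sketch does not overwrite real work.

```python
import os
import tempfile

# Stand-in rows: (Month, State, Category, DollarsAvg).
rows = [
    ('2021-07', 'Wyoming', 'Vegetables', 643430.0),
    ('2021-08', 'Wyoming', 'Vegetables', 619689.75),
]

out_path = os.path.join(tempfile.mkdtemp(), 'food-prices-monthly.txt')

# 'w' opens the file for writing; the with statement closes it on exit,
# which ensures everything is flushed to disk.
with open(out_path, 'w') as f:
    print(f"{'Month':<7}  {'State':<14}  {'Category':<27}  {'DollarsAvg':>11}", file=f)
    for month, state, category, avg in rows:
        print(f"{month:<7}  {state:<14}  {category:<27}  {avg:>11.1f}", file=f)

# Read the file back to inspect it (similar to running !cat in a notebook).
with open(out_path) as f:
    print(f.read(), end='')
```

Dropping the file=f argument sends the same lines to stdout, which makes it easy to check the format on a few items before writing the whole file.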