The goal of this assignment is to work with lists and dictionaries in Python.
You will be doing your work in a Jupyter notebook for this assignment. You may choose to work on this assignment on a hosted environment (e.g. tiger) or on your own local installation of Jupyter and Python. You should use Python 3.10 or higher for your work, but versions 3.9+ should work for this assignment. To use tiger, use the credentials you received. If you work remotely, make sure to download the .ipynb file to turn in. If you choose to work locally, Anaconda is the easiest way to install and manage Python, and you may launch Jupyter Lab either from the Navigator application or from the command line as jupyter-lab.
In this assignment, we will be working with data from the United States Department of Transportation’s Border Crossing Entry Data. This dataset counts all traffic coming through the ports at the Canadian and Mexican borders. Rather than using this dataset directly, I have created a subset of this data, which can be read as a list of dictionaries. That data is located here, but you do not need to download it. I have created a template notebook, a3.ipynb, that contains a cell that will download and read that data. You can right-click and save-as the a3.ipynb file, and, if working on tiger, upload that file. Once loaded, the data is a list of dictionaries where each dictionary has nine key-value pairs. Those keys and a brief description are:
Port Name: a name for the port, often associated with its location
State: the state the port is located in
Port Code: a unique numeric identifier for the port
Border: the country whose border the port is on (Canada or Mexico)
Date: the month and year when the data was collected, as a string
Measure: the type of conveyance/container/person being counted
Value: the value of the measure for the specified month
You will be answering queries and writing functions to help analyze this data. You may not use libraries such as statistics, collections, datetime, or pandas for this assignment.
The assignment is due at 11:59pm on Friday, September 30.
You should submit the completed notebook file required for this assignment on Blackboard. The filename of the notebook should be a3.ipynb.
Please make sure to follow instructions to receive full credit. Use a markdown cell to label each part of the assignment with the number of the section you are completing. You may put the code for each part into one or more cells. Students in CSCI 490 need to complete parts 1-5, and students in CSCI 503 need to complete all parts.
The first cell of your notebook should be a markdown cell with a line for your name and a line for your Z-ID. If you wish to add other information (the assignment name, a description of the assignment), you may do so after these two lines.
Find all of the possible values for the measure in the dataset. List each type only once!
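A minimal sketch, assuming each element of the list is a dictionary with a 'Measure' key as described above. The helper make_record and the four sample records are hypothetical; the values are borrowed from the Pinecreek and San Ysidro examples later in this handout. A set comprehension keeps each measure only once.

```python
def make_record(name, state, code, measure, value):
    """Hypothetical helper: builds one record with the keys used here."""
    return {'Port Name': name, 'State': state, 'Port Code': code,
            'Measure': measure, 'Value': value}

# Hypothetical sample; note that 'Personal Vehicles' appears twice.
data = [
    make_record('Pinecreek', 'Minnesota', 3425, 'Trucks', 6),
    make_record('Pinecreek', 'Minnesota', 3425, 'Personal Vehicles', 155),
    make_record('San Ysidro', 'California', 2504, 'Personal Vehicles', 1328690),
    make_record('San Ysidro', 'California', 2504, 'Pedestrians', 627673),
]

# A set stores each value once, so repeated measures collapse together.
measures = {record['Measure'] for record in data}
print(sorted(measures))  # ['Pedestrians', 'Personal Vehicles', 'Trucks']
```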
Hint: get the Measure from each element (which is a dictionary).
Write code to find the port in the dataset with the largest monthly count for pedestrians. Output the Port Name and State of the port. Remember that you will need to iterate through each element of the list, and each element is a dictionary which has various keys including Measure and Value.
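One way to sketch this scan, assuming records shaped as described above: keep the best pedestrian record seen so far and replace it whenever a larger Value appears. The helper make_record and the port "Port B" are hypothetical; the other values come from this handout's examples.

```python
def make_record(name, state, code, measure, value):
    """Hypothetical helper: builds one record with the keys used here."""
    return {'Port Name': name, 'State': state, 'Port Code': code,
            'Measure': measure, 'Value': value}

# Hypothetical sample data.
data = [
    make_record('Pinecreek', 'Minnesota', 3425, 'Trucks', 6),
    make_record('Port B', 'State X', 1, 'Pedestrians', 1000),
    make_record('San Ysidro', 'California', 2504, 'Pedestrians', 627673),
]

best = None
for record in data:
    # Skip every record that is not a pedestrian count.
    if record['Measure'] != 'Pedestrians':
        continue
    # Strict '>' keeps the first record seen if two counts tie.
    if best is None or record['Value'] > best['Value']:
        best = record

print(best['Port Name'], best['State'])  # San Ysidro California
```

A single pass like this touches each record once; max() with a key function over a filtered generator would give the same result.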
There are two ports with the same Port Name. What is the name of the port, and what are the two states and port codes that share this name?
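A sketch of one approach, assuming records shaped as described above: map each port name to the set of distinct (State, Port Code) pairs it appears with, then keep names with more than one pair. All names, states, and codes in the sample are made up.

```python
# Hypothetical sample: every name, state, and code here is invented.
data = [
    {'Port Name': 'Port A', 'State': 'State X', 'Port Code': 1, 'Measure': 'Trucks'},
    {'Port Name': 'Port A', 'State': 'State X', 'Port Code': 1, 'Measure': 'Buses'},
    {'Port Name': 'Port A', 'State': 'State Y', 'Port Code': 2, 'Measure': 'Trucks'},
    {'Port Name': 'Port B', 'State': 'State X', 'Port Code': 3, 'Measure': 'Trucks'},
]

# Map each port name to the set of (state, port code) pairs it appears with.
locations = {}
for record in data:
    pair = (record['State'], record['Port Code'])
    locations.setdefault(record['Port Name'], set()).add(pair)

# A name shared by two ports has more than one distinct pair.
shared = {name: pairs for name, pairs in locations.items() if len(pairs) > 1}
print(shared)
```

Using a set means the many monthly records for the same port collapse to a single pair.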
Write code to find the number of ports per state. Note that this is not the same as counting the number of entries in the dataset for each state. Consider using a two-step process: first, find all of the unique ports with their states, and then use that result to count the number of ports in each state.
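The two-step process can be sketched as follows, assuming records shaped as described above. "Port B" and its code are hypothetical. Step 1 collects unique ports as (Port Name, State, Port Code) tuples, and step 2 counts them per state with a plain dictionary counter.

```python
# Hypothetical sample: 'Port B' is invented; the rest follows the handout.
data = [
    {'Port Name': 'Pinecreek', 'State': 'Minnesota', 'Port Code': 3425,
     'Measure': 'Trucks'},
    {'Port Name': 'Pinecreek', 'State': 'Minnesota', 'Port Code': 3425,
     'Measure': 'Personal Vehicles'},
    {'Port Name': 'San Ysidro', 'State': 'California', 'Port Code': 2504,
     'Measure': 'Pedestrians'},
    {'Port Name': 'Port B', 'State': 'California', 'Port Code': 1,
     'Measure': 'Trucks'},
]

# Step 1: unique ports. The tuple includes the state and code because
# a port name alone is not unique.
ports = set()
for record in data:
    ports.add((record['Port Name'], record['State'], record['Port Code']))

# Step 2: count ports per state with a dictionary counter.
ports_per_state = {}
for _name, state, _code in ports:
    ports_per_state[state] = ports_per_state.get(state, 0) + 1

print(ports_per_state)
```

Counting records per state directly would give Minnesota a count of 2 here, which is why the deduplication step comes first.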
Hint: the count_letters example from class may be useful for the second part.
The format of the data is not ideal for some types of questions, specifically those where we would like to understand all quantities for each port during a given month. Create a new list of dictionaries named new_data where each dictionary contains the five keys that are the same for each monthly entry (Port Name, State, Port Code, Border, and Date) along with their values, plus one key-value pair for each Measure-Value pair. For example, the entry for “Pinecreek” in “Jul 2022” should look like:
{'Port Name': 'Pinecreek',
 'State': 'Minnesota',
 'Port Code': 3425,
 'Border': 'Canada',
 'Date': 'Jul 2022',
 'Trucks': 6,
 'Personal Vehicles': 155,
 'Personal Vehicle Passengers': 273}
Remember that the port name is not unique (see Part 3a)! How can you keep track of the data associated with each port, month pair?
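A minimal sketch, assuming records shaped as described above: group records in a dictionary keyed by a tuple of the five fixed fields, so two ports that share a name stay separate, and convert the grouped values to a list at the end. The names FIXED_KEYS and build_new_data are hypothetical.

```python
FIXED_KEYS = ('Port Name', 'State', 'Port Code', 'Border', 'Date')

def build_new_data(data):
    """Sketch: one dictionary per port-month, with measures as keys."""
    groups = {}
    for record in data:
        # All five fixed fields together identify one port-month.
        key = tuple(record[k] for k in FIXED_KEYS)
        if key not in groups:
            groups[key] = {k: record[k] for k in FIXED_KEYS}
        # Assumes each measure appears once per port-month; a repeat
        # would overwrite the earlier value.
        groups[key][record['Measure']] = record['Value']
    # .values() returns a view, so convert it to a list at the end.
    return list(groups.values())

# Sample records built from the Pinecreek and San Ysidro examples.
pine = {'Port Name': 'Pinecreek', 'State': 'Minnesota', 'Port Code': 3425,
        'Border': 'Canada', 'Date': 'Jul 2022'}
ysidro = {'Port Name': 'San Ysidro', 'State': 'California', 'Port Code': 2504,
          'Border': 'Mexico', 'Date': 'Jul 2022'}
data = [
    {**pine, 'Measure': 'Trucks', 'Value': 6},
    {**ysidro, 'Measure': 'Buses', 'Value': 969},
    {**pine, 'Measure': 'Personal Vehicles', 'Value': 155},
    {**pine, 'Measure': 'Personal Vehicle Passengers', 'Value': 273},
]
new_data = build_new_data(data)
print(new_data[0])
```

Because Python dictionaries preserve insertion order, the fixed keys come first and the measures follow in the order they were encountered.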
Hint: new_data should be a list of dictionaries, but you do not need to create this until the end of the processing.
Hint: .values() may be useful, but remember that it returns a view, not a list.
Write code to update the new_data list of dictionaries created in Part 4 to add a key-value pair named Total People whose value is the sum of the values in that port-month that count people. The fields that count people are:
{'Bus Passengers', 'Pedestrians', 'Personal Vehicle Passengers', 'Train Passengers'}
Note that a port-month entry may not have some (or any) of these fields! For example, for the port “San Ysidro” in “Jul 2022”, we have 24_371 bus passengers, 2_206_115 personal vehicle passengers, and 627_673 pedestrians, so our updated data should look like:
{'Port Name': 'San Ysidro',
 'State': 'California',
 'Port Code': 2504,
 'Border': 'Mexico',
 'Date': 'Jul 2022',
 'Bus Passengers': 24371,
 'Personal Vehicles': 1328690,
 'Personal Vehicle Passengers': 2206115,
 'Buses': 969,
 'Pedestrians': 627673,
 'Total People': 2858159}
Make sure that if a port has zero people, there is still a total entry with a value of zero.
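A sketch of the update, assuming new_data entries shaped like the examples above; add_total_people is a hypothetical name. Using dict.get with a default of 0 treats a missing field as zero people, which also guarantees a total of 0 for an entry with no people-counting fields. The "Port B" entry is made up.

```python
PEOPLE_FIELDS = {'Bus Passengers', 'Pedestrians',
                 'Personal Vehicle Passengers', 'Train Passengers'}

def add_total_people(new_data):
    """Sketch: adds 'Total People' to every port-month dict in place."""
    for entry in new_data:
        # A missing field contributes 0 instead of raising KeyError.
        entry['Total People'] = sum(entry.get(f, 0) for f in PEOPLE_FIELDS)

new_data = [
    # The San Ysidro example above, before the total is added.
    {'Port Name': 'San Ysidro', 'State': 'California', 'Port Code': 2504,
     'Border': 'Mexico', 'Date': 'Jul 2022', 'Bus Passengers': 24371,
     'Personal Vehicles': 1328690, 'Personal Vehicle Passengers': 2206115,
     'Buses': 969, 'Pedestrians': 627673},
    # Hypothetical entry with no people-counting fields at all.
    {'Port Name': 'Port B', 'State': 'State X', 'Port Code': 1,
     'Border': 'Canada', 'Date': 'Jul 2022', 'Trucks': 12},
]
add_total_people(new_data)
print([entry['Total People'] for entry in new_data])  # [2858159, 0]
```

The dictionaries in the list are modified in place, so no new list needs to be built.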
Write a function count_by_mode that, given a string corresponding to a mode, returns the number of ports that have handled traffic corresponding to the specified mode of transportation. This means that a port should have an entry for the specified mode and its value should be greater than zero. For example, if we run count_by_mode('Trains'), the function should return 35, and count_by_mode('Personal Vehicles') should return 114.
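A minimal sketch, assuming a module-level new_data list built as in Part 4. It collects distinct ports (not port-months) whose entry for the mode is greater than zero, so a port with traffic in several months is counted once. The sample entries are trimmed; the "Jun 2022" Pinecreek entry and "Port B" are made up.

```python
def count_by_mode(mode):
    """Sketch: counts distinct ports with a positive value for `mode`."""
    ports = set()
    for entry in new_data:
        # A missing mode counts as 0, so it fails the > 0 test.
        if entry.get(mode, 0) > 0:
            ports.add((entry['Port Name'], entry['State'], entry['Port Code']))
    return len(ports)

# Hypothetical sample entries (trimmed to the keys used here).
new_data = [
    {'Port Name': 'Pinecreek', 'State': 'Minnesota', 'Port Code': 3425,
     'Date': 'Jul 2022', 'Trucks': 6, 'Personal Vehicles': 155},
    {'Port Name': 'Pinecreek', 'State': 'Minnesota', 'Port Code': 3425,
     'Date': 'Jun 2022', 'Personal Vehicles': 100},
    {'Port Name': 'San Ysidro', 'State': 'California', 'Port Code': 2504,
     'Date': 'Jul 2022', 'Personal Vehicles': 1328690, 'Buses': 969},
    {'Port Name': 'Port B', 'State': 'State X', 'Port Code': 1,
     'Date': 'Jul 2022', 'Buses': 0},
]
print(count_by_mode('Personal Vehicles'))  # 2
print(count_by_mode('Buses'))              # 1
print(count_by_mode('Trains'))             # 0
```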
{<Date>: {<State 1>: <Total 1>, <State 2>: <Total 2>, ...}}.
For more points, also add the total crossings for each country (i.e., add new entries for Canada and Mexico).
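The text above does not say which quantity is totaled, so this sketch takes the field name as a parameter. It builds the nested {<Date>: {<State>: <Total>}} dictionary from new_data-style entries, using setdefault to create each inner dictionary the first time a date appears. The function name totals_by_date_state, its field argument, and the include_borders flag are all hypothetical; with include_borders=True it also adds per-country entries keyed by each entry's Border value.

```python
def totals_by_date_state(entries, field, include_borders=False):
    """Sketch: {date: {state: total}} for one field of new_data entries."""
    result = {}
    for entry in entries:
        # Create the inner dictionary the first time a date is seen.
        by_state = result.setdefault(entry['Date'], {})
        amount = entry.get(field, 0)  # a missing field counts as 0
        by_state[entry['State']] = by_state.get(entry['State'], 0) + amount
        if include_borders:
            # Extra per-country entry keyed by 'Canada' or 'Mexico'.
            border = entry['Border']
            by_state[border] = by_state.get(border, 0) + amount
    return result

# Sample entries built from the Pinecreek and San Ysidro examples.
new_data = [
    {'Port Name': 'Pinecreek', 'State': 'Minnesota', 'Port Code': 3425,
     'Border': 'Canada', 'Date': 'Jul 2022', 'Personal Vehicles': 155},
    {'Port Name': 'San Ysidro', 'State': 'California', 'Port Code': 2504,
     'Border': 'Mexico', 'Date': 'Jul 2022', 'Personal Vehicles': 1328690},
]
plain = totals_by_date_state(new_data, 'Personal Vehicles')
with_borders = totals_by_date_state(new_data, 'Personal Vehicles',
                                    include_borders=True)
print(with_borders)
```

Putting the country totals in the same inner dictionary as the states follows the wording "add new entries for Canada and Mexico"; a separate dictionary per date would avoid mixing state and country keys.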