The goal of this assignment is to work with lists and dictionaries in Python.
You will be doing your work in a Jupyter notebook for this
assignment. You may choose to work on this assignment in a hosted
environment (e.g. tiger) or on your own local installation of Jupyter
and Python. You should use
Python 3.12 for your work. (Older versions may work, but your code will
be checked with Python 3.12.) To use tiger, use the credentials you
received. If you work remotely, make sure to download the .ipynb file to
turn in. If you choose to work locally, Anaconda or miniforge are
probably the easiest ways to install and manage Python. If you work
locally, you may launch Jupyter Lab either from the Navigator
application (Anaconda) or from the command line as jupyter-lab or
jupyter lab.
In this assignment, we will be working with data from the United States Department of Transportation’s Border Crossing Entry Data. This dataset counts all traffic coming through the ports at the Canadian and Mexican borders. Rather than using this dataset directly, I have created a subset of this data, which can be read as a list of dictionaries. That data is located here (backup link), but you do not need to download it. I have created a template notebook, a3.ipynb, that contains a cell that will download and read that data. You can right-click and save-as the a3.ipynb file, and, if working on tiger, upload that file. Once loaded, the data is a list of dictionaries where each dictionary has nine key-value pairs. Those keys and a brief description are:
Port Name: a name for the port, often associated with its location
State: the state the port is located in
Port Code: a unique numeric identifier for the port
Border: the country whose border the port is for (Canada or Mexico)
Date: the month and year when the data was collected, as a string
Measure: the type of conveyance/container/person being counted
Value: the value of the measure for the specified month
You will be answering queries and writing functions to help analyze this data. You may not use external libraries including statistics, collections, datetime, or pandas for this assignment.
The assignment is due at 11:59pm on Friday, Feb. 14.
You should submit the completed notebook file required for this
assignment on Blackboard. The filename of the notebook should be
a3.ipynb.
Please make sure to follow instructions to receive full credit. Use a markdown cell to label each part of the assignment with the number of the section you are completing. You may put the code for each part into one or more cells. Students in CSCI 490 need to complete parts 1-5, and students in CSCI 503 need to complete all parts.
The first cell of your notebook should be a markdown cell with a line for your name and a line for your Z-ID. If you wish to add other information (the assignment name, a description of the assignment), you may do so after these two lines.
Find all of the possible values for the measure in the dataset. List each type only once!
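As a rough illustration of the "list each type only once" idea, the sketch below collects values into a set, which discards duplicates. The records in sample are made up for illustration and only mimic the shape of the assignment data.

```python
# Toy records shaped like the assignment data (values are invented).
sample = [
    {"Port Name": "Alpha", "Measure": "Trucks", "Value": 10},
    {"Port Name": "Alpha", "Measure": "Pedestrians", "Value": 4},
    {"Port Name": "Beta", "Measure": "Trucks", "Value": 7},
]

# A set keeps a single copy of each value, so repeats collapse automatically.
measures = {record["Measure"] for record in sample}

# Sorting gives a stable, readable listing.
print(sorted(measures))
```

A set is unordered, so sort it (or convert it to a list) if you need a predictable display order.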
Hint: extract the Measure from each element (which is a dictionary).
Write code to find the port in the dataset with the largest monthly
count for pedestrians. Output the Port Name and State of the port.
Remember that you will need to iterate through each element of the
list, and each element is a dictionary which has various keys including
Measure and Value.
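One possible way to track a running maximum while iterating is sketched below. The sample list is invented toy data, and the variable name best is only illustrative.

```python
# Toy records shaped like the assignment data (values are invented).
sample = [
    {"Port Name": "Alpha", "State": "Maine", "Measure": "Pedestrians", "Value": 40},
    {"Port Name": "Beta", "State": "Texas", "Measure": "Pedestrians", "Value": 95},
    {"Port Name": "Beta", "State": "Texas", "Measure": "Trucks", "Value": 300},
]

best = None  # record with the largest pedestrian count seen so far
for record in sample:
    # Skip entries that count something other than pedestrians.
    if record["Measure"] != "Pedestrians":
        continue
    if best is None or record["Value"] > best["Value"]:
        best = record

if best is not None:
    print(best["Port Name"], best["State"])
```

Note that the larger Trucks value is ignored because the filter on Measure runs before the comparison.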
There are two ports with the same Port Name. What is the name of the
port, and what are the two states and port codes that share this name?
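A minimal sketch of one approach, assuming a port is identified by its state and port code: map each name to the set of distinct (state, code) pairs seen under it, then keep the names with more than one pair. The sample data and the names ports_by_name and shared are hypothetical.

```python
# Toy records shaped like the assignment data (values are invented).
sample = [
    {"Port Name": "Alpha", "State": "Maine", "Port Code": 101},
    {"Port Name": "Alpha", "State": "Maine", "Port Code": 101},
    {"Port Name": "Gamma", "State": "Montana", "Port Code": 202},
    {"Port Name": "Gamma", "State": "Vermont", "Port Code": 303},
]

# Map each port name to the distinct (state, code) pairs that use it.
ports_by_name = {}
for record in sample:
    name = record["Port Name"]
    if name not in ports_by_name:
        ports_by_name[name] = set()
    ports_by_name[name].add((record["State"], record["Port Code"]))

# Keep only the names that refer to more than one distinct port.
shared = {name: sorted(ports) for name, ports in ports_by_name.items() if len(ports) > 1}
print(shared)
```

Repeated monthly entries for the same port collapse inside the set, so "Alpha" is not reported as a duplicate.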
Write code to find the number of ports per state. Note that this is not the same as counting the number of entries in the dataset for each state. Consider using a two-step process: first, find all of the unique ports with their states, and then use that result to count the number of ports in each state.
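The two-step process described above might look like the following sketch. It assumes the port code is what makes a port unique, and it uses invented toy data.

```python
# Toy records shaped like the assignment data (values are invented).
sample = [
    {"Port Name": "Alpha", "State": "Maine", "Port Code": 101},
    {"Port Name": "Alpha", "State": "Maine", "Port Code": 101},
    {"Port Name": "Beta", "State": "Maine", "Port Code": 102},
    {"Port Name": "Gamma", "State": "Texas", "Port Code": 201},
]

# Step 1: collect each unique port together with its state.
unique_ports = {(record["Port Code"], record["State"]) for record in sample}

# Step 2: count how many unique ports fall in each state.
ports_per_state = {}
for _code, state in unique_ports:
    ports_per_state[state] = ports_per_state.get(state, 0) + 1

print(ports_per_state)
```

Counting over unique_ports rather than over sample is what keeps the repeated "Alpha" entries from inflating the total for Maine.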
Hint: the count_letters example from class may be useful for the second
part.
The format of the data is not ideal for some types of questions,
specifically those where we would like to understand all quantities for
each port during a given month. Create a new list of dictionaries named
new_data where each dictionary contains the five keys that are the same
for each monthly entry (Port Name, State, Port Code, Border, and Date)
along with their values, and then adds each Measure-Value pair as a
key-value pair. For example, the entry for “Porthill” in “Apr 2024”
should look like:
{'Port Name': 'Porthill',
 'State': 'Idaho',
 'Port Code': 3308,
 'Border': 'Canada',
 'Date': 'Apr 2024',
 'Trucks': 98,
 'Truck Containers Loaded': 46,
 'Truck Containers Empty': 75,
 'Pedestrians': 46,
 'Personal Vehicle Passengers': 11852,
 'Personal Vehicles': 7338}
Remember that the port name is not unique (see Part 3a)! How can you keep track of the data associated with each port, month pair?
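One way to keep track of each port-month pair is to use a tuple as a dictionary key. The sketch below assumes the pair (Port Code, Date) identifies an entry. The sample records and the names grouped and reshaped are made up for illustration.

```python
# Toy records: two different ports share the name "Alpha" (values are invented).
sample = [
    {"Port Name": "Alpha", "State": "Maine", "Port Code": 101, "Border": "Canada",
     "Date": "Apr 2024", "Measure": "Trucks", "Value": 10},
    {"Port Name": "Alpha", "State": "Maine", "Port Code": 101, "Border": "Canada",
     "Date": "Apr 2024", "Measure": "Pedestrians", "Value": 4},
    {"Port Name": "Alpha", "State": "Idaho", "Port Code": 999, "Border": "Canada",
     "Date": "Apr 2024", "Measure": "Trucks", "Value": 7},
]

shared_keys = ["Port Name", "State", "Port Code", "Border", "Date"]

grouped = {}
for record in sample:
    # The port code (not the name) plus the date identifies one port-month.
    key = (record["Port Code"], record["Date"])
    if key not in grouped:
        # First time we see this port-month: copy the shared fields.
        grouped[key] = {k: record[k] for k in shared_keys}
    # Every record contributes one Measure-Value pair.
    grouped[key][record["Measure"]] = record["Value"]

# .values() returns a view, so wrap it in list() to get a real list.
reshaped = list(grouped.values())
print(reshaped)
```

Keying on the port name instead would merge the two "Alpha" ports into one entry, which is the pitfall the reminder above points at.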
Hints:
new_data should be a list of dictionaries, but you do not need to
create this until the end of the processing.
.values() may be useful, but remember it returns a view, not a list.
Write code to update the new_data list of dictionaries created in
Part 4 to add a key-value pair named Total People whose value is the
sum of the values in that port-month that count people. The fields that
count people are:
{'Bus Passengers', 'Pedestrians', 'Personal Vehicle Passengers', 'Train Passengers'}
Note that a port may be missing some (or all) of these fields! For example, for the port “Laredo” in “Jul 2023”, our updated data should look like:
{'Port Name': 'Laredo',
 'State': 'Texas',
 'Port Code': 2304,
 'Border': 'Mexico',
 'Date': 'Jul 2023',
 'Trains': 350,
 'Buses': 3067,
 'Bus Passengers': 97877,
 'Pedestrians': 230890,
 'Personal Vehicles': 407185,
 'Rail Containers Empty': 17432,
 'Rail Containers Loaded': 27557,
 'Trucks': 239167,
 'Truck Containers Loaded': 193929,
 'Personal Vehicle Passengers': 816540,
 'Truck Containers Empty': 59600,
 'Total People': 1145307}
Make sure that if a port has zero people, there is still a total entry with a value of zero.
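A minimal sketch of the summation, using dict.get with a default of 0 so that missing fields contribute nothing. The first entry reuses a few of the Laredo values shown above. The second entry and the names PEOPLE_FIELDS and entries are hypothetical.

```python
# The four fields that count people, as listed in the assignment.
PEOPLE_FIELDS = {"Bus Passengers", "Pedestrians",
                 "Personal Vehicle Passengers", "Train Passengers"}

entries = [
    # Subset of the Laredo, Jul 2023 entry from the example above.
    {"Port Name": "Laredo", "Bus Passengers": 97877, "Pedestrians": 230890,
     "Personal Vehicle Passengers": 816540, "Trucks": 239167},
    # Invented entry with no people-counting fields at all.
    {"Port Name": "Beta", "Trucks": 7},
]

for entry in entries:
    # .get(field, 0) treats an absent field as zero instead of raising KeyError.
    entry["Total People"] = sum(entry.get(field, 0) for field in PEOPLE_FIELDS)

print(entries[0]["Total People"], entries[1]["Total People"])
```

Because the sum of an empty set of values is 0, an entry with no people fields still receives a Total People key.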
Write a function count_by_mode that, given a string corresponding to a
mode, returns a count of the number of ports that have handled traffic
corresponding to the specified mode of transportation. This means that
a port should have an entry for the specified mode and its value should
be greater than zero. For example, if we run count_by_mode('Trains'),
the function should return 33, and count_by_mode('Personal Vehicles')
should return 110.
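The following is only a sketch of the counting logic on invented toy data, so it will not reproduce the 33 or 110 quoted above. It takes the records as a second parameter, which is an assumption made to keep the example self-contained; the assignment's function takes only the mode string.

```python
def count_by_mode(mode, records):
    """Count distinct ports with at least one positive entry for mode.

    The records parameter is an assumption for this sketch; in the
    notebook the data would come from the loaded dataset instead.
    """
    ports = set()
    for record in records:
        if record["Measure"] == mode and record["Value"] > 0:
            # A set ensures a port with many qualifying months counts once.
            ports.add(record["Port Code"])
    return len(ports)


# Toy records shaped like the assignment data (values are invented).
sample = [
    {"Port Code": 101, "Measure": "Trains", "Value": 5},
    {"Port Code": 101, "Measure": "Trains", "Value": 3},
    {"Port Code": 102, "Measure": "Trains", "Value": 0},
    {"Port Code": 103, "Measure": "Trucks", "Value": 9},
]

print(count_by_mode("Trains", sample))
```

Port 102 is excluded because its only Trains entry is zero, and port 101 is counted once even though it appears in two months.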
{<Date>: {<State 1>: <Total 1>, <State 2>: <Total 2>, ...}
.
For more points, add the total crossings into each country (i.e., add new
entries for Canada and Mexico).
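A nested dictionary of the shape shown above could be built roughly as follows. This sketch assumes that a "total" means the sum of Value over all entries for that date and state; that definition is a guess for illustration, not something the text here confirms. The sample data is invented.

```python
# Toy records shaped like the assignment data (values are invented).
sample = [
    {"State": "Maine", "Border": "Canada", "Date": "Apr 2024", "Value": 10},
    {"State": "Maine", "Border": "Canada", "Date": "Apr 2024", "Value": 5},
    {"State": "Texas", "Border": "Mexico", "Date": "Apr 2024", "Value": 7},
    {"State": "Idaho", "Border": "Canada", "Date": "May 2024", "Value": 3},
]

totals = {}
for record in sample:
    # One inner dictionary per date, created on first use.
    by_place = totals.setdefault(record["Date"], {})
    # Assumed definition of a total: sum of Value for the state...
    state = record["State"]
    by_place[state] = by_place.get(state, 0) + record["Value"]
    # ...and the same running sum for the country (Canada or Mexico).
    border = record["Border"]
    by_place[border] = by_place.get(border, 0) + record["Value"]

print(totals)
```

Storing the country totals in the same inner dictionary works here only because no state shares a name with Canada or Mexico.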