Assignment 3

Goals

The goal of this assignment is to work with lists and dictionaries in Python.

Instructions

You will be doing your work in a Jupyter notebook for this assignment. You may choose to work on this assignment in a hosted environment (e.g. tiger) or on your own local installation of Jupyter and Python. Python 3.10 or higher is recommended, although version 3.9 should also work for this assignment. To use tiger, use the credentials you received. If you work remotely, make sure to download the .ipynb file to turn in. If you choose to work locally, Anaconda is the easiest way to install and manage Python. If you work locally, you may launch Jupyter Lab either from the Navigator application or from the command line with jupyter-lab.

In this assignment, we will be working with data from the United States Department of Transportation’s Border Crossing Entry Data. This dataset counts all traffic coming through the ports at the Canadian and Mexican borders. Rather than using this dataset directly, I have created a subset of this data, which can be read as a list of dictionaries. That data is located here, but you do not need to download it. I have created a template notebook, a3.ipynb, that contains a cell that will download and read that data. You can right-click and save-as the a3.ipynb file, and, if working on tiger, upload that file. Once loaded, the data is a list of dictionaries where each dictionary has seven key-value pairs. Those keys and a brief description are:

Port Name: a name for the port, often associated with its location
State: the state the port is located in
Port Code: a unique numeric identifier for the port
Border: the country whose border the port is for (Canada or Mexico)
Date: the month and year when the data was collected, as a string
Measure: the type of conveyance/container/person being counted
Value: the value of the measure for the specified month

You will be answering queries and writing functions to help analyze this data. You may not use external libraries including statistics, collections, datetime, or pandas for this assignment.
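To make the structure concrete, here is a small hand-built sample in the same shape as the loaded data. It is only a sketch: the name sample is hypothetical (the template notebook defines its own variable), and the three entries reuse the Pinecreek values shown in Part 4, using the seven keys listed above.

```python
# Hypothetical sample; the real data comes from the cell in a3.ipynb.
# Values are taken from the Pinecreek example in Part 4.
sample = [
    {'Port Name': 'Pinecreek', 'State': 'Minnesota', 'Port Code': 3425,
     'Border': 'Canada', 'Date': 'Jul 2022',
     'Measure': 'Trucks', 'Value': 6},
    {'Port Name': 'Pinecreek', 'State': 'Minnesota', 'Port Code': 3425,
     'Border': 'Canada', 'Date': 'Jul 2022',
     'Measure': 'Personal Vehicles', 'Value': 155},
    {'Port Name': 'Pinecreek', 'State': 'Minnesota', 'Port Code': 3425,
     'Border': 'Canada', 'Date': 'Jul 2022',
     'Measure': 'Personal Vehicle Passengers', 'Value': 273},
]

# Each element is a dictionary, so values are looked up by key.
for entry in sample:
    print(entry['Port Name'], entry['Date'], entry['Measure'], entry['Value'])
```

Note that one port-month appears as several entries, one per Measure.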

Due Date

The assignment is due at 11:59pm on ~~Wednesday, September 28~~ Friday, September 30.

Submission

You should submit the completed notebook file required for this assignment on Blackboard. The filename of the notebook should be a3.ipynb.

Details

Please make sure to follow instructions to receive full credit. Use a markdown cell to label each part of the assignment with the number of the section you are completing. You may put the code for each part into one or more cells. Students in CSCI 490 need to complete parts 1-5, and students in CSCI 503 need to complete all parts.

0. Name & Z-ID (5 pts)

The first cell of your notebook should be a markdown cell with a line for your name and a line for your Z-ID. If you wish to add other information (the assignment name, a description of the assignment), you may do so after these two lines.

1. Measure Types (5 pts)

Find all of the possible values for the measure in the dataset. List each type only once!

Hints

Iterate through all of the list elements, and extract the Measure from each element (which is a dictionary)
Consider using a set
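As a reminder of how a set removes duplicates, here is a minimal sketch on toy records. The records list and its Color key are made up for illustration and are not part of the assignment data.

```python
# Toy records (hypothetical); note that 'red' appears twice.
records = [{'Color': 'red'}, {'Color': 'blue'}, {'Color': 'red'}]

unique_colors = set()
for record in records:
    # Adding a value that is already present leaves the set unchanged.
    unique_colors.add(record['Color'])

# Sets are unordered, so sort before printing for a stable display.
print(sorted(unique_colors))
```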

2. Largest Pedestrian Crossing (10 pts)

Write code to find the port in the dataset with the largest monthly count for pedestrians. Output the Port Name and State of the port. Remember that you will need to iterate through each element of the list, and each element is a dictionary which has various keys including Measure and Value.
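One common pattern for this kind of query is to keep the best matching element seen so far while looping. The sketch below uses made-up quiz and exam records (the keys name, kind, and points are hypothetical), not the border data.

```python
# Toy records (hypothetical keys and values).
scores = [
    {'name': 'a', 'kind': 'quiz', 'points': 7},
    {'name': 'b', 'kind': 'exam', 'points': 90},
    {'name': 'c', 'kind': 'quiz', 'points': 9},
]

best = None
for score in scores:
    # Skip records of the wrong kind before comparing values.
    if score['kind'] != 'quiz':
        continue
    if best is None or score['points'] > best['points']:
        best = score

print(best['name'], best['points'])
```

Starting from None rather than zero avoids assuming anything about the range of the values.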

3. Ports Per State

a. Duplicate Port Name (10 pts)

There are two ports with the same Port Name. What is the name of the port, and what are the two states and port codes that share this name?

Hints

Check which attribute (or combination of attributes) is unique.
How can you find a name that is tied to more than one of these unique attributes?

b. Number of Ports Per State (15 pts)

Write code to find the number of ports per state. Note that this is not the same as counting the number of entries in the dataset for each state. Consider using a two-step process: first, find all of the unique ports with their states, and then use that result to count the number of ports in each state.

Hints

For the first part, see Part 1, but now we need to do this for each state.
The count_letters example from class may be useful for the second part
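The two-step idea can be sketched on toy data as follows. This is only an illustration under assumptions: the observations list is made up, and the counting loop is a guess at the style of the count_letters example, which may differ from what was shown in class.

```python
# Toy (item, group) observations with a repeated pair (hypothetical data).
observations = [
    ('apple', 'fruit'),
    ('apple', 'fruit'),
    ('kale', 'vegetable'),
    ('pear', 'fruit'),
]

# Step 1: tuples are hashable, so a set of tuples keeps each pair once.
unique_pairs = set(observations)

# Step 2: count how many unique items fall in each group.
counts = {}
for item, group in unique_pairs:
    counts[group] = counts.get(group, 0) + 1

print(counts)
```

Counting the raw observations instead of unique_pairs would give 3 for fruit, which is the mistake the note above warns about.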

4. Reformat the Data (15 pts)

The format of the data is not ideal for some types of questions, specifically those where we would like to understand all quantities for each port during a given month. Create a new list of dictionaries named new_data where each dictionary contains the five keys that are the same for each monthly entry (Port Name, State, Port Code, Border, and Date) along with their values, and then adds each Measure-Value pair as a key-value pair. For example, the entry for “Pinecreek” in “Jul 2022” should look like:

{'Port Name': 'Pinecreek',
 'State': 'Minnesota',
 'Port Code': 3425,
 'Border': 'Canada',
 'Date': 'Jul 2022',
 'Trucks': 6,
 'Personal Vehicles': 155,
 'Personal Vehicle Passengers': 273}

Remember that the port name is not unique (see Part 3a)! How can you keep track of the data associated with each (port, month) pair?

Hints

You only want one dictionary per port-month. How can you keep these so you can update them as necessary?
The merge operator may be useful
Remember your new_data should be a list of dictionaries, but you do not need to create this until the end of the processing.
.values() may be useful, but remember it returns a view, not a list.
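For reference, here is a short sketch of the dictionary merge operator (available in Python 3.9+) and of turning a .values() view into a list. All names and values are made up for illustration.

```python
base = {'id': 1, 'name': 'a'}
extra = {'height': 10}

# | builds a new dictionary and leaves both operands unchanged.
merged = base | extra

# |= updates the left-hand dictionary in place.
merged |= {'width': 4}

# A dictionary keyed by a tuple; .values() returns a view, not a list.
by_key = {('a', 1): merged}
view = by_key.values()
as_list = list(view)  # convert when an actual list is required

print(merged)
print(type(view).__name__, type(as_list).__name__)
```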

5. Compute and Add Total Number of Persons (15 pts)

Write code to update the new_data list of dictionaries created in Part 4 to add a key-value pair named Total People whose value is the sum of the values in that port-month that count people. The fields that count people are:

{'Bus Passengers', 'Pedestrians', 'Personal Vehicle Passengers', 'Train Passengers'}

Note that a port may be missing some (or all) of these fields! For example, for the port “San Ysidro” in “Jul 2022”, we have 24_371 bus passengers, 2_206_115 personal vehicle passengers, and 627_673 pedestrians, so our updated data should look like:

{'Port Name': 'San Ysidro',
 'State': 'California',
 'Port Code': 2504,
 'Border': 'Mexico',
 'Date': 'Jul 2022',
 'Bus Passengers': 24371,
 'Personal Vehicles': 1328690,
 'Personal Vehicle Passengers': 2206115,
 'Buses': 969,
 'Pedestrians': 627673,
 'Total People': 2858159}

Make sure that if a port has zero people, there is still a total entry with a value of zero.
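One way to handle missing fields is dict.get with a default of zero. The sketch below checks the arithmetic for the San Ysidro example on a single dictionary. The names people_fields, record, and no_people are hypothetical, and this is not a complete solution for updating new_data.

```python
people_fields = {'Bus Passengers', 'Pedestrians',
                 'Personal Vehicle Passengers', 'Train Passengers'}

# Measures from the San Ysidro, Jul 2022 example; no 'Train Passengers' key.
record = {
    'Bus Passengers': 24371,
    'Personal Vehicles': 1328690,
    'Personal Vehicle Passengers': 2206115,
    'Buses': 969,
    'Pedestrians': 627673,
}

# .get(field, 0) contributes 0 for any people field that is absent.
total = sum(record.get(field, 0) for field in people_fields)
print(total)  # 2858159

# A record with no people fields at all still yields a total of 0.
no_people = {'Trucks': 6}
print(sum(no_people.get(field, 0) for field in people_fields))  # 0
```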

6. [CSCI 503 Only] Filter by Transportation Mode (15 pts)

Write a function count_by_mode that, given a string corresponding to a mode, returns the number of ports that have handled traffic for the specified mode of transportation. A port counts only if it has an entry for the specified mode whose value is greater than zero. For example, count_by_mode('Trains') should return 35 and count_by_mode('Personal Vehicles') should return 114.

Hints

Similar to Part 3, we only want to count the transportation mode once per port. For example, if Point Roberts had a non-zero number of buses in 61 of the months, it still only counts once in our final total. Think about which data type would work well for this.
Remember to keep track of each port using a value that uniquely identifies it (port names are not unique; see Part 3a)
Think about using a nested data structure.
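As an illustration of a nested data structure, the sketch below builds a dictionary whose values are sets. The events list, its categories, and its identifiers are made up, and this is not the count_by_mode function itself.

```python
# Toy (category, identifier, amount) events (hypothetical data).
events = [
    ('bus', 101, 3),
    ('bus', 101, 5),    # same identifier again: counted once
    ('bus', 202, 0),    # zero amount: not counted
    ('train', 101, 1),
]

ids_by_category = {}
for category, identifier, amount in events:
    if amount > 0:
        # setdefault creates the empty set the first time a category is seen.
        ids_by_category.setdefault(category, set()).add(identifier)

# The size of each set is the number of distinct identifiers.
print(len(ids_by_category.get('bus', set())))    # 1
print(len(ids_by_category.get('plane', set())))  # 0
```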

Extra Credit

CSCI 490 students may complete Part 6 for extra credit.
Write code to compute the number of people crossing the border in each state in each month. Produce a dictionary of the form {<Date>: {<State 1>: <Total 1>, <State 2>: <Total 2>, ...}, ...}. For more points, add the totals crossing into each country (i.e. add new entries for Canada and Mexico).
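The nested output shape can be built incrementally with setdefault and get, as in this sketch. The rows below are made-up (date, state, people) tuples for illustration only; they are not real totals from the dataset.

```python
# Toy (date, state, people) rows; the numbers are invented.
rows = [
    ('Jul 2022', 'Minnesota', 273),
    ('Jul 2022', 'Minnesota', 10),
    ('Jul 2022', 'California', 5),
    ('Aug 2022', 'Minnesota', 1),
]

totals = {}
for date, state, people in rows:
    # Fetch (or create) the inner dictionary for this date.
    by_state = totals.setdefault(date, {})
    # Add to the running total for this state, starting from 0.
    by_state[state] = by_state.get(state, 0) + people

print(totals)
```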