Assignment 3

Goals

The goal of this assignment is to work with lists and dictionaries in Python.

Instructions

You will be doing your work in a Jupyter notebook for this assignment. You may choose to work on this assignment on a hosted environment (e.g. tiger) or on your own local installation of Jupyter and Python. You should use Python 3.12 for your work. (Older versions may work, but your code will be checked with Python 3.12.) To use tiger, use the credentials you received. If you work remotely, make sure to download the .ipynb file to turn in. If you choose to work locally, Anaconda or miniforge are probably the easiest ways to install and manage Python. If you work locally, you may launch Jupyter Lab either from the Navigator application (anaconda) or via the command-line as jupyter-lab or jupyter lab.

In this assignment, we will be working with data from the United States Department of Transportation’s Border Crossing Entry Data. This dataset counts all traffic coming through the ports at the Canadian and Mexican borders. Rather than using this dataset directly, I have created a subset of this data, which can be read as a list of dictionaries. That data is located here (backup link), but you do not need to download it. I have created a template notebook, a3.ipynb, that contains a cell that will download and read that data. You can right-click and save-as the a3.ipynb file, and, if working on tiger, upload that file. Once loaded, the data is a list of dictionaries where each dictionary has seven key-value pairs. Those keys and a brief description are:

  • Port Name: a name for the port, often associated with its location
  • State: the state the port is located in
  • Port Code: a unique numeric identifier for the port
  • Border: the country whose border the port is for (Canada or Mexico)
  • Date: the month and year when the data was collected, as a string
  • Measure: the type of conveyance/container/person being counted
  • Value: the value of the measure for the specified month
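To make the structure concrete, here is a minimal sketch of what one element of the list might look like. The values are borrowed from the Porthill example in Part 4, and the variable name record is only for illustration; the template notebook may use a different name for the full list.

```python
# One hypothetical element of the list (values borrowed from the Part 4 example)
record = {
    'Port Name': 'Porthill',
    'State': 'Idaho',
    'Port Code': 3308,
    'Border': 'Canada',
    'Date': 'Apr 2024',
    'Measure': 'Trucks',
    'Value': 98,
}

# Each field is read by its key
print(record['Measure'], record['Value'])  # Trucks 98
```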

You will be answering queries and writing functions to help analyze this data. You may not use external libraries including statistics, collections, datetime, or pandas for this assignment.

Due Date

The assignment is due at 11:59pm on Friday, Feb. 14.

Submission

You should submit the completed notebook file required for this assignment on Blackboard. The filename of the notebook should be a3.ipynb.

Details

Please make sure to follow instructions to receive full credit. Use a markdown cell to label each part of the assignment with the number of the section you are completing. You may put the code for each part into one or more cells. Students in CSCI 490 need to complete parts 1-5, and students in CSCI 503 need to complete all parts.

0. Name & Z-ID (5 pts)

The first cell of your notebook should be a markdown cell with a line for your name and a line for your Z-ID. If you wish to add other information (the assignment name, a description of the assignment), you may do so after these two lines.

1. Measure Types (5 pts)

Find all of the possible values for the measure in the dataset. List each type only once!

Hints
  • Iterate through all of the list elements, and extract the Measure from each element (which is a dictionary)
  • Consider using a set to store the values
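As a reminder of how a set discards duplicates, here is a small sketch on toy data (the pets list and its keys are made up for illustration and are not part of the assignment dataset):

```python
# Toy data, not the assignment dataset
pets = [
    {'name': 'Rex', 'species': 'dog'},
    {'name': 'Tom', 'species': 'cat'},
    {'name': 'Fido', 'species': 'dog'},
]

species = set()
for pet in pets:
    species.add(pet['species'])  # adding a duplicate has no effect

print(len(species))  # 2
```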

2. Largest Pedestrian Crossing (10 pts)

Write code to find the port in the dataset with the largest monthly count for pedestrians. Output the Port Name and State of the port. Remember that you will need to iterate through each element of the list, and each element is a dictionary which has various keys including Measure and Value.
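One common pattern for this kind of query is to keep track of the best element seen so far while skipping elements that do not match a condition. The sketch below shows the idea on toy data; the sales list and its keys are hypothetical.

```python
# Toy data, not the assignment dataset
sales = [
    {'item': 'apple', 'kind': 'fruit', 'count': 12},
    {'item': 'carrot', 'kind': 'vegetable', 'count': 40},
    {'item': 'pear', 'kind': 'fruit', 'count': 31},
]

best = None
for sale in sales:
    if sale['kind'] != 'fruit':
        continue  # skip elements of the wrong kind
    if best is None or sale['count'] > best['count']:
        best = sale  # remember the whole dictionary, not just the count

print(best['item'], best['count'])  # pear 31
```

Keeping the whole dictionary (rather than only the largest count) makes it easy to output other fields afterward.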

3. Ports Per State

a. Duplicate Port Name (10 pts)

There are two ports with the same Port Name. What is the name of the port, and what are the two states and port codes that share this name?

Hints
  • Check which attribute (or combination of attributes) is unique.
  • How can you find a name that is tied to more than one of these unique attributes?
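One possible way to think about the second hint is to map each name to the set of unique identifiers seen with it, and then look for names with more than one identifier. This is only a sketch on made-up data (the students list and its keys are hypothetical):

```python
# Toy data, not the assignment dataset
students = [
    {'name': 'Ana', 'id': 101},
    {'name': 'Ben', 'id': 102},
    {'name': 'Ana', 'id': 103},
    {'name': 'Ben', 'id': 102},  # repeated row for the same student
]

ids_by_name = {}
for s in students:
    # create an empty set the first time a name is seen, then add the id
    ids_by_name.setdefault(s['name'], set()).add(s['id'])

# names that are tied to more than one id
shared = {name: ids for name, ids in ids_by_name.items() if len(ids) > 1}
print(shared)
```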

b. Number of Ports Per State (15 pts)

Write code to find the number of ports per state. Note that this is not the same as counting the number of entries in the dataset for each state. Consider using a two-step process: first, find all of the unique ports with their states, and then use that result to count the number of ports in each state.

Hints
  • For the first part, see Part 1, but now we need to do this for each state.
  • The count_letters example from class may be useful for the second part
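The count_letters example from class is not reproduced here. As an assumption about its general shape, a dictionary-based counting pattern might look like the following sketch, where count_items is a hypothetical name:

```python
def count_items(items):
    """Return a dictionary mapping each item to the number of times it appears."""
    counts = {}
    for item in items:
        # .get supplies 0 the first time an item is seen
        counts[item] = counts.get(item, 0) + 1
    return counts

print(count_items(['red', 'blue', 'red', 'green', 'red']))
# {'red': 3, 'blue': 1, 'green': 1}
```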

4. Reformat the Data (15 pts)

The format of the data is not ideal for some types of questions, specifically those where we would like to understand all quantities for each port during a given month. Create a new list of dictionaries named new_data where each dictionary contains the five keys that are the same for each monthly entry (Port Name, State, Port Code, Border, and Date) along with their values, and then adds each Measure-Value pair as a key-value pair. For example, the entry for “Porthill” in “Apr 2024” should look like:

{'Port Name': 'Porthill',
 'State': 'Idaho',
 'Port Code': 3308,
 'Border': 'Canada',
 'Date': 'Apr 2024',
 'Trucks': 98,
 'Truck Containers Loaded': 46,
 'Truck Containers Empty': 75,
 'Pedestrians': 46,
 'Personal Vehicle Passengers': 11852,
 'Personal Vehicles': 7338}

Remember that the port name is not unique (see Part 3a)! How can you keep track of the data associated with each (port, month) pair?

Hints
  • You only want one dictionary per port-month. How can you keep these so you can update them as necessary?
  • The merge operator may be useful
  • Remember your new_data should be a list of dictionaries, but you do not need to create this until the end of the processing.
  • .values() may be useful, but remember it returns a view, not a list.
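The hints above can be illustrated on toy data. The sketch below uses a tuple as a dictionary key to keep one dictionary per group, the merge operator | (Python 3.9+) to add a new key-value pair, and list() to turn the .values() view into a list at the end. The rows list and its keys are made up; this is one possible approach, not the required one.

```python
# Toy data, not the assignment dataset
rows = [
    {'city': 'Oslo', 'year': 2023, 'field': 'rain', 'amount': 5},
    {'city': 'Oslo', 'year': 2023, 'field': 'snow', 'amount': 2},
    {'city': 'Rome', 'year': 2023, 'field': 'rain', 'amount': 7},
]

combined = {}
for row in rows:
    key = (row['city'], row['year'])  # tuples can be dictionary keys
    base = {'city': row['city'], 'year': row['year']}
    # start from what is already stored for this key (or the base fields),
    # then merge in the new field; | returns a new dictionary
    combined[key] = combined.get(key, base) | {row['field']: row['amount']}

# .values() returns a view, so convert it to get a real list
result = list(combined.values())
print(result)
```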

5. Compute and Add Total Number of Persons (15 pts)

Write code to update the new_data list of dictionaries created in Part 4 to add a key-value pair named Total People whose value is the sum of the values in that port-month that count people. The fields that count people are:

{'Bus Passengers', 'Pedestrians', 'Personal Vehicle Passengers', 'Train Passengers'}

Note that a port may be missing some (or all) of these fields! For example, for the port “Laredo” in “Jul 2023”, our updated data should look like:

{'Port Name': 'Laredo',
 'State': 'Texas',
 'Port Code': 2304,
 'Border': 'Mexico',
 'Date': 'Jul 2023',
 'Trains': 350,
 'Buses': 3067,
 'Bus Passengers': 97877,
 'Pedestrians': 230890,
 'Personal Vehicles': 407185,
 'Rail Containers Empty': 17432,
 'Rail Containers Loaded': 27557,
 'Trucks': 239167,
 'Truck Containers Loaded': 193929,
 'Personal Vehicle Passengers': 816540,
 'Truck Containers Empty': 59600,
 'Total People': 1145307}

Make sure that if a port has zero people, there is still a Total People entry with a value of zero.
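When some keys may be absent, dict.get with a default of 0 lets a sum work without special cases, and it naturally yields zero when none of the keys are present. A minimal sketch on toy data (the grades list and the scored_fields set are hypothetical):

```python
# Toy data, not the assignment dataset
scored_fields = {'quiz', 'exam', 'project'}
grades = [
    {'name': 'Ana', 'quiz': 8, 'exam': 40},
    {'name': 'Ben'},  # none of the scored fields are present
]

for g in grades:
    # missing fields contribute 0 to the sum
    g['total'] = sum(g.get(field, 0) for field in scored_fields)

print(grades[0]['total'], grades[1]['total'])  # 48 0
```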

6. [CSCI 503 Only] Filter by Transportation Mode (15 pts)

Write a function count_by_mode that, given a string corresponding to a mode, returns a count of the number of ports that have handled traffic corresponding to the specified mode of transportation. This means that a port should have an entry for the specified mode and its value should be greater than zero. For example, if we run count_by_mode('Trains'), the function should return 33 and count_by_mode('Personal Vehicles') should return 110.

Hints
  • Similar to Part 3, we only want to count the transportation mode once per port. For example, if one port had a non-zero number of buses in 48 of the months, it still only counts once in our final total. Think about which data type would work well for this.
  • Remember to use a unique value (such as the Port Code, not the Port Name) for keeping track of the port
  • Think about using a nested data structure.
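As an illustration of a nested data structure for this kind of question, the sketch below builds a dictionary whose values are sets, so that each identifier is counted at most once per category. The orders list, its keys, and the function name count_stores are all hypothetical.

```python
# Toy data, not the assignment dataset
orders = [
    {'store_id': 1, 'product': 'tea', 'qty': 3},
    {'store_id': 1, 'product': 'tea', 'qty': 5},   # same store again
    {'store_id': 2, 'product': 'tea', 'qty': 0},   # zero does not count
    {'store_id': 2, 'product': 'coffee', 'qty': 4},
]

# nested structure: product -> set of store ids with a non-zero quantity
stores_by_product = {}
for o in orders:
    if o['qty'] > 0:
        stores_by_product.setdefault(o['product'], set()).add(o['store_id'])

def count_stores(product):
    """Return how many distinct stores sold a non-zero quantity of product."""
    return len(stores_by_product.get(product, set()))

print(count_stores('tea'), count_stores('coffee'), count_stores('milk'))  # 1 1 0
```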

Extra Credit

  • CSCI 490 Students may complete Part 6 for extra credit.
  • Write code to compute the number of people crossing the border in each state in each month. Produce a dictionary of the form {<Date>: {<State 1>: <Total 1>, <State 2>: <Total 2>, ...}, ...}. For more points, add the total crossing at each border (i.e. add new entries for Canada and Mexico).
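A dictionary of dictionaries like the one described above can be built incrementally. Here is a minimal sketch of that pattern on toy data (the readings list and its keys are made up for illustration):

```python
# Toy data, not the assignment dataset
readings = [
    {'day': 'Mon', 'room': 'A', 'count': 2},
    {'day': 'Mon', 'room': 'B', 'count': 3},
    {'day': 'Mon', 'room': 'A', 'count': 4},
    {'day': 'Tue', 'room': 'A', 'count': 1},
]

totals = {}
for r in readings:
    # get (or create) the inner dictionary for this day
    per_day = totals.setdefault(r['day'], {})
    # add to the running total for this room
    per_day[r['room']] = per_day.get(r['room'], 0) + r['count']

print(totals)  # {'Mon': {'A': 6, 'B': 3}, 'Tue': {'A': 1}}
```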