Assignment 5

Goals

The goal of this assignment is to work with scripts and packages in Python.

Instructions

You will be doing your work in Python for this assignment. You may choose to work on this assignment on a hosted environment (e.g. tiger) or on your own local installation of Jupyter and Python. You should use Python 3.12 for your work, but versions 3.9+ should work for this assignment. To use tiger, use the credentials you received. If you work remotely, make sure to download the .py files to turn in. If you choose to work locally, Anaconda is the easiest way to install and manage Python. If you work locally, you may launch Jupyter Lab either from the Navigator application or via the command-line as jupyter-lab.

In this assignment, we will again be working with data from the Senate Stock Watcher, built by Timothy Carambat, that we first used for Assignment 3. The data is available at the URL in the code below, which you may use to download it:

from pathlib import Path
import json
from urllib.request import urlretrieve

# download the data if we don't have it locally
url = "https://faculty.cs.niu.edu/~dakoop/cs503-2024sp/a3/senate-stock-trades.json"
local_fname = "senate-stock-trades.json"
if not Path(local_fname).exists():
    urlretrieve(url, local_fname)

# parse the JSON file into a list of dictionaries
with open(local_fname) as f:
    data = json.load(f)

Once loaded, the data is a list of dictionaries where each dictionary has ten key-value pairs. The keys used in this assignment, with a brief description of each, are:

transaction_date: the date of the transaction as a string in mm/dd/yyyy format
owner: the owner of the asset (the senator or a family member)
ticker: the stock ticker symbol (e.g. AAPL)
asset_type: whether the asset is a stock, bond, cryptocurrency, etc.
type: the type of transaction (purchase or sale)
amount_range: the amount of the transaction (a range specified by a tuple (min_amount, max_amount)).
senator: the name of the senator involved in the transaction
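To illustrate the record format, the minimal sketch below uses a made-up record (all values are hypothetical, not taken from the dataset) and shows one way to handle the mm/dd/yyyy date and the amount range. JSON has no tuple type, so if amount_range is stored as a JSON array, json.load returns it as a list; converting it with tuple() is one option.

```python
from datetime import datetime

# A hypothetical record in the shape described above (values are made up)
record = {
    "transaction_date": "03/15/2019",
    "owner": "Self",
    "ticker": "AAPL",
    "asset_type": "Stock",
    "type": "Purchase",
    "amount_range": [1001, 15000],  # json.load yields a list, not a tuple
    "senator": "Jane Doe",
}

# Split the mm/dd/yyyy string into integer parts (month, day, year order)
month, day, year = (int(part) for part in record["transaction_date"].split("/"))

# Alternatively, parse it into a datetime.date object
txn_date = datetime.strptime(record["transaction_date"], "%m/%d/%Y").date()

# Convert the amount range to a tuple so it matches the (min, max) form
amount = tuple(record["amount_range"])

print((year, month, day), txn_date, amount)  # (2019, 3, 15) 2019-03-15 (1001, 15000)
```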

You will be writing Python modules, putting them in a package, and writing a script to help analyze this data. While not required, you may find it useful to create a notebook where you can test the modules and programs. You may use other standard Python modules (e.g. collections) in this assignment.

Due Date

The assignment is due at 11:59pm on Monday, March 25.

Submission

You should submit the completed Python files required for this assignment on Blackboard. Zip the files together; the filename of the zipfile should be a5.zip. You can create the archive on tiger using the following code in a notebook, assuming your current working directory is an a5 directory that contains the package and the script:

import shutil
shutil.make_archive('../a5', 'zip', '..', 'a5')

Then, download the a5.zip file to turn in via Blackboard. Make sure your archive contains both the senate_stock_trades package and the analyze_trades.py program.

Details

Please make sure to follow instructions to receive full credit. To test your code, you may use the %run magic command in the notebook. For example,

%run analyze_trades.py

You may also use the Terminal in Jupyter on tiger, but you should activate the correct environment first:

$ /opt/miniforge3/bin/conda init
$ conda activate py3.12
$ python analyze_trades.py

or run python from the correct environment:

$ /opt/miniforge3/envs/py3.12/bin/python analyze_trades.py

0. Name & Z-ID (5 pts)

Since we are using Python (.py) files for this assignment, add the identifying information to the beginning of your analyze_trades.py program and the __init__.py file of your package. Minimally, you should have a line for your name and a line for your Z-ID. If you wish to add other information (the assignment name, a description of the assignment), you may do so after these two lines.

1. Senate Stock Trades Package (50 pts)

Create three new Python modules, one for reading the dataset, one for analyzing trades by ticker symbol, and one for comparing two senators. Put the three modules (util.py, ticker.py, and compare.py) into a package named senate_stock_trades.

1a. Data Utilities (15 pts)

Create a util.py module that has three methods: get_data, add_amount_ranges, and sub_amount_ranges.

The get_data method should read and parse the senate-stock-trades.json data file and store the result in a module variable. Assume that the data file resides in the same directory as util.py. You can then build its absolute path from the module's __file__ variable:

import os
fname = os.path.join(os.path.dirname(__file__),'senate-stock-trades.json')

Use the json module to load the data from the file. Your get_data method should only read and parse the file from disk once, otherwise returning the pre-loaded data.
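A minimal sketch of the read-once behavior, assuming None is used as the sentinel: the parsed data is cached in a module-level variable and returned directly on later calls. So that the sketch runs on its own, it writes a tiny hypothetical JSON file to a temporary directory; in util.py you would instead build the path from __file__ as shown above. The names _data, _fname, and read_count are illustrative only.

```python
import json
import os
import tempfile

# Hypothetical setup so the sketch is self-contained: write a tiny JSON file
_fname = os.path.join(tempfile.mkdtemp(), "senate-stock-trades.json")
with open(_fname, "w") as f:
    json.dump([{"senator": "Jane Doe", "amount_range": [1001, 15000]}], f)

_data = None     # sentinel: None means the file has not been read yet
read_count = 0   # only here to show how many times the file is parsed

def get_data():
    """Return the parsed data, reading the file from disk only on the first call."""
    global _data, read_count
    if _data is None:
        with open(_fname) as f:
            _data = json.load(f)
        read_count += 1
    return _data

first = get_data()
second = get_data()
print(read_count, first is second)  # 1 True
```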

The add_amount_ranges and sub_amount_ranges methods should add and subtract two amount ranges, respectively. Recall that an amount range is a tuple (min, max). Ranges combine element-wise: (a, b) + (c, d) = (a + c, b + d) and (a, b) - (c, d) = (a - c, b - d).
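A direct translation of these formulas might look like the sketch below. Because it indexes the two elements, it also accepts a range that arrives as a two-element list from json.load, and it always returns a tuple. The sample values are arbitrary.

```python
def add_amount_ranges(r1, r2):
    """Add two (min, max) ranges element-wise: (a, b) + (c, d) = (a + c, b + d)."""
    return (r1[0] + r2[0], r1[1] + r2[1])

def sub_amount_ranges(r1, r2):
    """Subtract two (min, max) ranges element-wise: (a, b) - (c, d) = (a - c, b - d)."""
    return (r1[0] - r2[0], r1[1] - r2[1])

print(add_amount_ranges((1001, 15000), (15001, 50000)))  # (16002, 65000)
print(sub_amount_ranges((1001, 15000), (15001, 50000)))  # (-14000, -35000)
```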

Hints

Initialize the module variable to a sentinel value to indicate when the data has not been read.
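You can use the autoreload extension (%load_ext autoreload followed by %autoreload 2) to reload modules automatically as you edit them. Note, however, that reloading a module re-runs its code and resets its module variable, which masks whether get_data is actually avoiding repeated reads of the data file. You can also use importlib.reload to reload a module manually.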

1b. Ticker Analysis (15 pts)

Create a ticker.py module that has two methods that both take one parameter, the ticker symbol. Use the get_data method from the util module to obtain the data. The first method, count_trades, should return a dictionary of the form {<senator>: <count>} with the number of trades of that ticker for each senator. The second method, sum_trades, should return a dictionary of the form {<senator>: (<min_value>, <max_value>)} with the range of possible total trade values. Use the util.add_amount_ranges method from Part 1a to add the amount ranges.

Hints

Make sure to import the util module! You might consider using relative imports to do this from a sibling module.
You may use collections.Counter for count_trades.
You could consider using collections.defaultdict to help with sum_trades. You can use a lambda function as the argument to defaultdict to initialize the key-value pairs with a tuple.
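The hints above can be sketched as follows on a few hypothetical records (made-up names, tickers, and amounts). To keep the example self-contained it adds ranges inline; in ticker.py the instructions call for util.add_amount_ranges and util.get_data instead. The function names count_for_ticker and sum_for_ticker are illustrative. Counter is a dict subclass, and Counter.most_common() returns (key, count) pairs sorted by count, which may help order output like the Part 2 sample.

```python
from collections import Counter, defaultdict

# Hypothetical records (made-up names and values) in the shape of the dataset
trades = [
    {"senator": "Jane Doe", "ticker": "XYZ", "amount_range": [1001, 15000]},
    {"senator": "Jane Doe", "ticker": "XYZ", "amount_range": [15001, 50000]},
    {"senator": "John Roe", "ticker": "XYZ", "amount_range": [1001, 15000]},
    {"senator": "John Roe", "ticker": "ABC", "amount_range": [50001, 100000]},
]

def count_for_ticker(records, ticker):
    """Count matching trades per senator with a Counter."""
    return Counter(r["senator"] for r in records if r["ticker"] == ticker)

def sum_for_ticker(records, ticker):
    """Sum (min, max) ranges per senator; a missing senator starts at (0, 0)."""
    totals = defaultdict(lambda: (0, 0))
    for r in records:
        if r["ticker"] == ticker:
            lo, hi = totals[r["senator"]]
            totals[r["senator"]] = (lo + r["amount_range"][0], hi + r["amount_range"][1])
    return dict(totals)

print(count_for_ticker(trades, "XYZ").most_common())  # [('Jane Doe', 2), ('John Roe', 1)]
print(sum_for_ticker(trades, "XYZ"))  # {'Jane Doe': (16002, 65000), 'John Roe': (1001, 15000)}
```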

1c. Comparison (15 pts)

Create a compare.py module that calculates comparative information between two senators. Given two senators’ names as parameters, the count_diff method should return the difference in the number of transactions between the two senators (the first senator’s count minus the second’s), and the amount_diff method should return the difference between the summed amount ranges of all of their trades. This difference should be computed using the util.sub_amount_ranges method from Part 1a.

Hints

Consider testing the functions via code in a notebook. You may also put tests in the modules themselves, but make sure that test code only runs when the module is run as a script.
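The standard way to keep test code from running on import is an if __name__ == "__main__": guard. In this minimal sketch, count_diff_demo is a hypothetical stand-in for a real module function.

```python
def count_diff_demo(a, b):
    """Hypothetical stand-in for a module function: returns a - b."""
    return a - b

if __name__ == "__main__":
    # Runs only when this file is executed as a script, not when it is
    # imported by another module, by analyze_trades.py, or in a notebook.
    print(count_diff_demo(5, 3))  # 2
```

If the module uses a relative import such as from . import util, running it directly with python compare.py fails with an ImportError (attempted relative import with no known parent package). Running it as a module from the directory above the package, e.g. python -m senate_stock_trades.compare, avoids this.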

1d. Package (5 pts)

Make sure all three modules live in a single senate_stock_trades package. Add an __init__.py file for completeness. Apart from the identifying lines required in Part 0, it may contain just documentation and the pass keyword.
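Assuming the data file sits next to util.py, the layout would be roughly a5/analyze_trades.py plus an a5/senate_stock_trades/ directory holding __init__.py, util.py, ticker.py, compare.py, and senate-stock-trades.json. The sketch below builds a throwaway package with a hypothetical name (demo_pkg) in a temporary directory to show that a sibling module can reach util with a relative import once the package is importable; the module contents are illustrative only.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Build a throwaway package named demo_pkg (hypothetical) in a temp directory
root = Path(tempfile.mkdtemp())
pkg = root / "demo_pkg"
pkg.mkdir()
(pkg / "__init__.py").write_text('"""Demo package: documentation goes here."""\npass\n')
(pkg / "util.py").write_text("def add(r1, r2):\n    return (r1[0] + r2[0], r1[1] + r2[1])\n")
# A sibling module that imports util with a relative import
(pkg / "ticker.py").write_text(
    "from . import util\n\n"
    "def total(ranges):\n"
    "    result = (0, 0)\n"
    "    for r in ranges:\n"
    "        result = util.add(result, r)\n"
    "    return result\n"
)

sys.path.insert(0, str(root))   # make the directory above the package importable
importlib.invalidate_caches()   # ensure the finders see the freshly written files
ticker = importlib.import_module("demo_pkg.ticker")
print(ticker.total([(1, 2), (3, 4)]))  # (4, 6)
```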

2. Stock Analysis Program (25 pts)

Create an analyze_trades.py program that uses the package from Part 1 to identify trades of interest and compare senators. The script should process two subcommands, “ticker” and “compare”. The ticker subcommand takes a ticker symbol as its argument and prints the results from the count_trades and sum_trades methods; the compare subcommand takes the names of two senators as arguments and prints the results from the count_diff and amount_diff methods. You can test your script via the IPython magic command %run analyze_trades.py ... or via the shell command !/opt/miniforge3/envs/py3.12/bin/python analyze_trades.py ... (you will need to adjust the path if not using tiger). Make sure to print a usage message if the user omits arguments or provides incorrect ones. Some sample output:

%run analyze_trades.py
Usage: python analyze_trades.py [ticker <ticker> | compare <senator1> <senator2>]

%run analyze_trades.py ticker
Usage: python analyze_trades.py [ticker <ticker> | compare <senator1> <senator2>]

%run analyze_trades.py ticker NVDA
Number of trades:
  Pat Roberts: 23
  Ron Wyden: 13
  Sheldon Whitehouse: 10
  Tommy Tuberville: 6
  Dan Sullivan: 3
  Thomas R. Carper: 3
  Kelly Loeffler: 2
  Susan M. Collins: 1
  John W. Hickenlooper: 1
Sum of trade values:
  Pat Roberts: (464023, 1260000)
  John W. Hickenlooper: (500001, 1000000)
  Ron Wyden: (378013, 945000)
  Tommy Tuberville: (48006, 195000)
  Sheldon Whitehouse: (10010, 150000)
  Kelly Loeffler: (30002, 100000)
  Susan M. Collins: (15001, 50000)
  Dan Sullivan: (3003, 45000)
  Thomas R. Carper: (3003, 45000)

%run analyze_trades.py compare "Pat Roberts" "Sheldon Whitehouse"
Pat Roberts has -213 trades with value +(3030787, 3140000) than Sheldon Whitehouse

%run analyze_trades.py compare "Pat Roberts" "Patty Murray"
Pat Roberts has +256 trades with value +(5551256, 15265000) than Patty Murray
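One way to structure the argument handling is sketched below, using sys.argv (which %run also populates). The helper names parse_command and main are hypothetical, and the dispatch step only prints a placeholder where the real program would call the package functions.

```python
import sys

USAGE = "Usage: python analyze_trades.py [ticker <ticker> | compare <senator1> <senator2>]"

def parse_command(args):
    """Return (subcommand, arguments) for a valid command line, else None.

    args excludes the program name, i.e. sys.argv[1:].
    """
    if len(args) == 2 and args[0] == "ticker":
        return "ticker", args[1:]
    if len(args) == 3 and args[0] == "compare":
        return "compare", args[1:]
    return None

def main(argv):
    parsed = parse_command(argv[1:])
    if parsed is None:
        print(USAGE)
        return 1
    subcommand, params = parsed
    # Placeholder: the real program would call count_trades/sum_trades or
    # count_diff/amount_diff here and print their results
    print(subcommand, params)
    return 0

if __name__ == "__main__":
    main(sys.argv)
```

Checking the argument count per subcommand in one place keeps every error path funneled through the same usage message.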

Hints

Create a usage method that can be called whenever the arguments are missing or invalid
You will need to pass the names in quotes because they contain spaces, which the shell would otherwise split into separate arguments.
You will need a different number of arguments depending on which subcommand is called
Note that showing the +/- sign for the range will require figuring out whether the amounts are less than or greater than zero
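A minimal sketch of the sign formatting: the +d format spec always prints a sign for integers, and the range helper checks both ends. How a range with mixed signs should be shown is not specified above, so printing it unsigned is an assumption here; the function names are hypothetical.

```python
def format_signed_count(n):
    """Format an integer difference with an explicit sign, e.g. +256 or -213."""
    return f"{n:+d}"

def format_signed_range(rng):
    """Format a (min, max) difference with a leading sign.

    Both ends non-negative -> +(a, b); both non-positive -> -(|a|, |b|).
    Assumption: a range with mixed signs is printed unsigned, as-is.
    """
    lo, hi = rng
    if lo >= 0 and hi >= 0:
        return f"+({lo}, {hi})"
    if lo <= 0 and hi <= 0:
        return f"-({-lo}, {-hi})"
    return f"({lo}, {hi})"

print(f"A has {format_signed_count(-213)} trades with value "
      f"{format_signed_range((3030787, 3140000))} than B")
# A has -213 trades with value +(3030787, 3140000) than B
```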

3. [CSCI 503 Only] Add Date Filtering (20 pts)

For this part, you will add date filtering to the package and program you wrote in Parts 1 and 2. This should not require significant changes to the overall logic, and your final package and program should work both for unfiltered data and for filtered data. Turn in this final package and program with the additional filtering added.

3a. Add Filtering to the Package (10 pts)

Add the ability to restrict the calculations by date to the compare.py module. Thus, count_diff and amount_diff should take two optional parameters that set the start date and end date, respectively. If they are not set, the start date is the earliest date in the data, and the end date is the latest date. The methods should now return the differences for the senators for only the trades between those dates, inclusive.

You may choose to parse the transaction_date in the data to a tuple (as in Assignment 3) or to a date object using the datetime.date class. You may also choose which format the count_diff and amount_diff methods take (e.g. a tuple (year, month, day) or a date object), but you need to document this as a docstring in those methods. You will be passing arguments that adhere to your format in Part 3b.

Hints

Use a sentinel value to indicate when the start or end date is not set. For each bound, you can use an or expression that is true if the date is unset or the criterion is satisfied.
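A minimal sketch of this approach, assuming datetime.date objects for the dates and None as the sentinel for an unset bound. The helper names parse_txn_date and in_range, and the records, are hypothetical.

```python
from datetime import date, datetime

def parse_txn_date(s):
    """Parse an mm/dd/yyyy string (the transaction_date format) into a date."""
    return datetime.strptime(s, "%m/%d/%Y").date()

def in_range(d, start=None, end=None):
    """True if start <= d <= end; None for either bound means unbounded."""
    return (start is None or d >= start) and (end is None or d <= end)

# Hypothetical records (made-up values)
records = [
    {"senator": "Jane Doe", "transaction_date": "12/31/2017"},
    {"senator": "Jane Doe", "transaction_date": "01/01/2018"},
    {"senator": "Jane Doe", "transaction_date": "06/15/2019"},
]

start = date(2018, 1, 1)
kept = [r for r in records if in_range(parse_txn_date(r["transaction_date"]), start)]
print(len(kept))  # 2 (the start date itself is included)
```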

3b. Add Filtering to the Program (10 pts)

Add support for the date filtering in your script. To do so, allow the user to specify an optional date-range argument after the senators’ names, in the format YYYY-mm-dd:YYYY-mm-dd; either date may be omitted (e.g. 2018-01-01:). If the date range is specified, pass the parsed individual dates to the corresponding parameters of count_diff and amount_diff. You will need to split the string into the two dates and convert each to the form required by Part 3a. Some sample output:

%run analyze_trades.py compare
Usage: python analyze_trades.py [ticker <ticker> | compare <senator1> <senator2> [start-date:end-date]]

%run analyze_trades.py compare "Pat Roberts" "Sheldon Whitehouse" 2018-01-01:
Pat Roberts has +94 trades with value +(3455094, 8400000) than Sheldon Whitehouse

%run analyze_trades.py compare "Pat Roberts" "Sheldon Whitehouse" 2010-01-01:2017-12-31
Pat Roberts has -307 trades with value -(424307, 5260000) than Sheldon Whitehouse

Hints

Use the split method multiple times to parse the date string
Make sure to parse the extracted strings to integers
Make sure your code still runs if the entire date range is not specified
Make sure to detect when either the start or end date is not specified
Remember to update the usage string
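A minimal sketch of the parsing, assuming date objects with None for an omitted side (matching the Part 3a sketch's convention); parse_date_range is a hypothetical name. Malformed input, such as a missing colon, a non-numeric part, or an impossible date, raises ValueError, which the script could catch in order to print the usage string.

```python
from datetime import date

def parse_date_range(spec):
    """Parse 'YYYY-mm-dd:YYYY-mm-dd' into (start, end) date objects.

    Either side may be empty (e.g. '2018-01-01:'), in which case it is None.
    Raises ValueError on malformed input.
    """
    start_str, end_str = spec.split(":")  # ValueError unless exactly one ':'

    def to_date(s):
        if not s:
            return None  # sentinel: this bound was not specified
        year, month, day = (int(part) for part in s.split("-"))
        return date(year, month, day)

    return to_date(start_str), to_date(end_str)

print(parse_date_range("2018-01-01:"))  # (datetime.date(2018, 1, 1), None)
print(parse_date_range("2010-01-01:2017-12-31"))
```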

Extra Credit

[20 pts] CSCI 490 students may complete Part 3 for extra credit
[5 pts] Use the operator package to refactor the add_amount_ranges and sub_amount_ranges methods to use a common shared method.
[5 pts] Support downloading the senate-stock-trades.json file from the course web site instead of bundling it with the python package.
[5 pts] Add support for matching senators’ names even if they are only partially specified (e.g. just a last name). Make sure that ambiguous cases generate an error.
[5 pts] Extend Part 3b to allow the date range to be specified as a -d argument that can be before or after the senators’ names. For example,

$ python analyze_trades.py compare -d 2018-01-01: "Pat Roberts" "Sheldon Whitehouse"

[5 pts] Add support for date ranges to specify only the year or year and month. In these cases, the program should assume that the start date begins on the first day of the month/year, and the end date ends on the last day of the month/year.
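For the operator-package item above, one possible shape is a shared helper that takes the operator as a parameter; _combine_ranges is a hypothetical name. map with two iterables applies the operator pairwise, so (a, b) and (c, d) give (op(a, c), op(b, d)).

```python
import operator

def _combine_ranges(r1, r2, op):
    """Apply a binary operator element-wise to two (min, max) ranges."""
    return tuple(map(op, r1, r2))

def add_amount_ranges(r1, r2):
    return _combine_ranges(r1, r2, operator.add)

def sub_amount_ranges(r1, r2):
    return _combine_ranges(r1, r2, operator.sub)

print(add_amount_ranges((1001, 15000), (15001, 50000)))  # (16002, 65000)
print(sub_amount_ranges((1001, 15000), (15001, 50000)))  # (-14000, -35000)
```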