The goal of this assignment is to work with scripts and packages in Python.
You will be doing your work in a Jupyter notebook for this
assignment. You may choose to work on this assignment on a hosted
environment (e.g. tiger)
or on your own local installation of Jupyter and Python. You should use
Python 3.14 for your work. To use tiger, use the credentials you
received. If you work remotely, make sure to download the .ipynb file to
turn in. If you choose to work locally, miniforge, Anaconda, pixi or uv are probably the easiest ways
to install and manage Python. If you work locally, you may launch
Jupyter Lab either from the Navigator application (anaconda) or via the
command-line as jupyter-lab or
jupyter lab.
In this assignment, we will again be working with data about US Senators’ stock trading practices using publicly available data collated by InsiderFinance, first used for Assignment 3. That data is located here, but remember it is compressed. You may use the following code to download this data:
from pathlib import Path
import json
from urllib.request import urlretrieve
import gzip

# download the data if we don't have it locally
url = "https://faculty.cs.niu.edu/~dakoop/cs503-2026sp/a3/senate-stock-trades.json.gz"
local_fname = "senate-stock-trades.json.gz"
if not Path(local_fname).exists():
    urlretrieve(url, local_fname)

Once loaded, the data is a list of dictionaries where each dictionary has ten key-value pairs. Those keys and a brief description are:
office: the name of the senator reporting the transaction
owner: the owner of the asset (the senator or a family member)
transaction_date: the date of the transaction as a string in mm/dd/yyyy format
type: the type of transaction (purchase or sale)
asset_type: whether the asset is a stock, bond, cryptocurrency, etc.
symbol: the stock ticker symbol if it exists (e.g. AAPL)
amount_range: the amount of the transaction (a range specified by a list (min_amount, max_amount)).
party: the political party of the senator reporting the transaction
link: a link to the report

You will be writing Python modules, putting them in a package, and writing a script to help analyze this data. While not required, you may find it useful to create a notebook where you can test the modules and programs. You may use other standard Python modules (e.g. collections) in this assignment.
The assignment is due at 11:59pm on Monday, March 23.
You should submit the completed Python files required for this
assignment on Blackboard. Zip
the files together; the filename of the zipfile should be
a5.zip. You can create an archive on tiger (assuming at a
directory above the package) using the following code:
import shutil
shutil.make_archive('a5', 'zip', '.', 'senate_stock_trades')
Then, download the a5.zip to turn in via Blackboard. Make sure your
archive contains both the
senate_stock_trades package and the
analyze_trades.py program.
Please make sure to follow instructions to receive full credit. To
test your code, you may use the %run magic command in the
notebook. For example,
%run analyze_trades.py
You may also use the Terminal in Jupyter on tiger, but you should activate the correct environment first:
$ /opt/miniforge3/bin/conda init
$ conda activate py3.14
$ python analyze_trades.py
or run python from the correct environment:
$ /opt/miniforge3/envs/py3.14/bin/python analyze_trades.py
Since we are using Python (.py) files for this assignment, add
the identifying information as comments to the beginning of your
analyze_trades.py program and the
__init__.py file of your package. Minimally, you should
have a line for your name and a line for your Z-ID. If you wish to add
other information (the assignment name, a description of the
assignment), you may do so after these two lines.
Create three new Python modules, one for reading the dataset, one for
analyzing trades by ticker symbol, and one for comparing two senators.
Put the three modules (util.py, ticker.py, and
compare.py) into a package named
senate_stock_trades.
Create a util.py module that has three methods:
get_data, add_amount_ranges, and
sub_amount_ranges.
The get_data method should read and parse the senate-stock-trades.json.gz
datafile and store it in a module variable. Assume that the data file
resides in the same directory as util.py. You can then get
its absolute path via the __file__ variable of the module
via:
from pathlib import Path
local_path = Path(__file__).parent / "senate-stock-trades.json.gz"
Use the json and gzip modules to load the data from the file. Remember we did something similar in Assignment 3:
data = json.load(gzip.open(local_path))
but this needs to be integrated into the get_data
method. However, your get_data method should only read and
parse the file from disk once, otherwise returning the
pre-loaded data.
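One way to meet the read-once requirement is to cache the parsed list in a module-level variable. The sketch below is a minimal illustration, not the required solution: the variable name _data and the path parameter are assumptions, introduced so the example can run against a small made-up file instead of the real dataset.

```python
import gzip
import json
import tempfile
from pathlib import Path

_data = None  # module-level cache (hypothetical name); None means "not loaded yet"


def get_data(path):
    """Return the parsed records, reading the gzip file only on the first call."""
    global _data
    if _data is None:
        # Text mode ("rt") yields str, which json.load accepts.
        with gzip.open(path, "rt", encoding="utf-8") as f:
            _data = json.load(f)
    return _data


# Demonstration with a tiny made-up file rather than the real dataset.
demo_path = Path(tempfile.mkdtemp()) / "demo.json.gz"
with gzip.open(demo_path, "wt", encoding="utf-8") as f:
    json.dump([{"office": "Example Senator", "symbol": "XYZ"}], f)

first = get_data(demo_path)
demo_path.unlink()            # delete the file; the second call must not need it
second = get_data(demo_path)
print(first is second)        # True: the cached list is returned
```

In util.py itself, the path would come from Path(__file__).parent as described above rather than from a parameter.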
The add_amount_ranges and sub_amount_ranges
should add and subtract two amount ranges, respectively. Recall that an
amount range is a tuple (min, max). Two ranges
(a,b) +/- (c,d) = (a+/-c, b+/-d).
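As a worked example of this rule, element-wise addition and subtraction can be written with plain indexing. This is only a sketch, and the sample ranges are made up for illustration. Because it indexes rather than unpacks a specific type, it accepts either tuples or the two-element lists stored in the data, and always returns a tuple.

```python
def add_amount_ranges(r1, r2):
    """(a, b) + (c, d) = (a + c, b + d)"""
    return (r1[0] + r2[0], r1[1] + r2[1])


def sub_amount_ranges(r1, r2):
    """(a, b) - (c, d) = (a - c, b - d)"""
    return (r1[0] - r2[0], r1[1] - r2[1])


print(add_amount_ranges((1001, 15000), (15001, 50000)))  # (16002, 65000)
print(sub_amount_ranges((15001, 50000), (1001, 15000)))  # (14000, 35000)
```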
Hint: You can use the IPython magic command %autoreload
to automatically reload modules as you edit them. Do note, however, that
this will mask the effects of your effort to avoid reloading the data! You
can also use importlib.reload to do this manually.

Create a ticker.py module that has two methods that both
take one parameter, the ticker symbol. Use the get_data
method from the util module to obtain the data. The first method,
count_trades, should return a dictionary of the form
{<office>: <count>} with the counts of trades
for each senator. The second method, sum_trades, should
return a dictionary of the form
{<office>: (<min_value>, <max_value>)} with
the range of possible trade values. Use the
util.add_amount_ranges method from Part 1a to add the
amount ranges.
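The two aggregations described above can be sketched with collections.Counter and collections.defaultdict. This is a hedged illustration under stated assumptions: the records are made up, the extra data parameter exists only so the example runs without the real file (the assignment's methods take just the ticker symbol and call get_data), and the range addition is written inline where the assignment asks for util.add_amount_ranges.

```python
from collections import Counter, defaultdict

# Made-up records for illustration; the real ones come from util.get_data().
sample = [
    {"office": "Senator A", "symbol": "XYZ", "amount_range": [1001, 15000]},
    {"office": "Senator A", "symbol": "XYZ", "amount_range": [15001, 50000]},
    {"office": "Senator B", "symbol": "XYZ", "amount_range": [1001, 15000]},
    {"office": "Senator B", "symbol": "ABC", "amount_range": [1001, 15000]},
]


def count_trades(symbol, data):
    """Count trades of `symbol` per office."""
    return dict(Counter(t["office"] for t in data if t["symbol"] == symbol))


def sum_trades(symbol, data):
    """Sum the (min, max) amount ranges of trades of `symbol` per office."""
    totals = defaultdict(lambda: (0, 0))  # every new office starts at (0, 0)
    for t in data:
        if t["symbol"] == symbol:
            lo, hi = totals[t["office"]]
            a, b = t["amount_range"]
            totals[t["office"]] = (lo + a, hi + b)
    return dict(totals)


print(count_trades("XYZ", sample))  # {'Senator A': 2, 'Senator B': 1}
print(sum_trades("XYZ", sample))    # {'Senator A': (16002, 65000), 'Senator B': (1001, 15000)}
```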
Hints: Consider collections.Counter
for count_trades. Consider collections.defaultdict
to help with sum_trades. You can use a lambda function as
the argument to defaultdict to initialize the key-value pairs with a
tuple.

Create a compare.py module that calculates comparative
information between two senators. Given two offices’ names as
parameters, the count_diff method should return the
difference in the number of transactions between
the two senators, and the amount_diff method should return
the ranged difference between the amounts of all trades. This
difference should be computed by using the
util.sub_amount_ranges method from Part 1a.
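A minimal sketch of the two comparisons follows, using made-up records. Several things here are assumptions rather than part of the assignment: the data parameter (the real methods would call get_data), the helper name total_range, the inline subtraction (the assignment asks for util.sub_amount_ranges), and the direction of the difference, taken as first senator minus second to match the signs in the sample output.

```python
# Made-up records for illustration only.
sample = [
    {"office": "Senator A", "amount_range": [1001, 15000]},
    {"office": "Senator A", "amount_range": [15001, 50000]},
    {"office": "Senator B", "amount_range": [1001, 15000]},
]


def total_range(office, data):
    """Hypothetical helper: sum the (min, max) ranges of one office's trades."""
    lo = sum(t["amount_range"][0] for t in data if t["office"] == office)
    hi = sum(t["amount_range"][1] for t in data if t["office"] == office)
    return (lo, hi)


def count_diff(office1, office2, data):
    """Number of trades by office1 minus number of trades by office2."""
    n1 = sum(1 for t in data if t["office"] == office1)
    n2 = sum(1 for t in data if t["office"] == office2)
    return n1 - n2


def amount_diff(office1, office2, data):
    """Element-wise difference of the two offices' total amount ranges."""
    a = total_range(office1, data)
    b = total_range(office2, data)
    return (a[0] - b[0], a[1] - b[1])


print(count_diff("Senator A", "Senator B", sample))   # 1
print(amount_diff("Senator A", "Senator B", sample))  # (15001, 50000)
```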
Make sure all three analysis modules live in a single
senate_stock_trades package. Add an
__init__.py file for completeness. It may contain
documentation and the pass keyword.
Create an analyze_trades.py program that uses the package
from Part 1 to identify trades of interest and compare senators’ trades.
The script should process two subcommands; the first is
“ticker” and the second is “compare”. The first subcommand prints the
results from the count_trades and sum_trades
methods, and the second subcommand takes the names of two offices as
arguments and prints the results from the count_diff and
amount_diff methods. You can test your script via the
IPython magic command %run analyze_trades.py ... or via the
shell command
!/opt/miniforge3/envs/py3.14/bin/python analyze_trades.py ...
(you will need to adjust the path if not using tiger). Make sure to
print a usage message if the user omits or provides incorrect arguments.
Some sample output:
%run analyze_trades.py
Usage: python analyze_trades.py [ticker <ticker> | compare <senator1> <senator2>]
%run analyze_trades.py ticker
Usage: python analyze_trades.py [ticker <ticker> | compare <senator1> <senator2>]
%run analyze_trades.py ticker NVDA
Number of trades:
Sheldon Whitehouse: 11
John Boozman: 8
Katie Britt: 6
Markwayne Mullin: 6
Ashley Moody: 5
Tommy Tuberville: 4
Angus King: 2
Thomas R. Carper: 2
Sum of trade values:
Markwayne Mullin: (246006, 665000)
Sheldon Whitehouse: (137011, 480000)
Ashley Moody: (118005, 345000)
John Boozman: (8008, 120000)
Tommy Tuberville: (18004, 95000)
Katie Britt: (6006, 90000)
Angus King: (2002, 30000)
Thomas R. Carper: (2002, 30000)
%run analyze_trades.py compare "Tommy Tuberville" "Sheldon Whitehouse"
Tommy Tuberville has +560 trades with value +(7800560, 24300000) than Sheldon Whitehouse
%run analyze_trades.py compare "Tina Smith" "Katie Britt"
Tina Smith has -34 trades with value +(1087966, 1995000) than Katie Britt
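The subcommand handling shown in this sample output could be organized roughly as below. This is a sketch under assumptions: the function names parse_args and main are hypothetical, and the calls into the senate_stock_trades package are replaced by a placeholder print so that the example runs on its own.

```python
import sys

USAGE = "Usage: python analyze_trades.py [ticker <ticker> | compare <senator1> <senator2>]"


def parse_args(argv):
    """Return ("ticker", symbol), ("compare", s1, s2), or None if argv is invalid."""
    if len(argv) == 2 and argv[0] == "ticker":
        return ("ticker", argv[1])
    if len(argv) == 3 and argv[0] == "compare":
        return ("compare", argv[1], argv[2])
    return None


def main(argv=None):
    """Dispatch on the subcommand; returns 0 on success, 1 on a usage error."""
    if argv is None:
        argv = sys.argv[1:]  # drop the script name
    parsed = parse_args(argv)
    if parsed is None:
        print(USAGE)
        return 1
    # Placeholder: a real script would call the package methods and print results.
    print("parsed:", parsed)
    return 0


main(["ticker"])            # prints the usage line (the ticker symbol is missing)
main(["ticker", "NVDA"])    # prints: parsed: ('ticker', 'NVDA')
```

Keeping the argument checks in a separate function makes them easy to try from a notebook without running the whole script.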
For this part, you will add date filtering to the package and program you wrote in Parts 1 and 2. This should not require significant changes to the overall logic, and your final package and program should work both for unfiltered data and for filtered data. Turn in this final package and program with the additional filtering added.
Add the ability to restrict the calculations by date to the
compare.py module. Thus, count_diff and
amount_diff should take two optional
parameters that set the start date and end date, respectively. If they
are not set, the start date is the earliest date in the data, and the
end date is the latest date. The range is inclusive. The methods should
now return the differences for the senators for only the trades between
those dates, inclusive.
You may choose to parse the transaction_date in the data
to a tuple (as in Assignment 3) or to a date object using the
datetime.date library. You may also choose which format the
count_diff and amount_diff methods take
(e.g. a tuple (year, month, day) or a date object), but you
need to document this as a docstring in those methods. You will be
passing arguments that adhere to your format in Part 3b.
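If you choose date objects, the parsing and the inclusive range check might look like the sketch below. The function names are hypothetical, and treating an unset bound as None ("no limit") is one assumed way to get the same result as defaulting to the earliest and latest dates in the data.

```python
from datetime import date, datetime


def parse_transaction_date(s):
    """Convert an mm/dd/yyyy string to a datetime.date."""
    return datetime.strptime(s, "%m/%d/%Y").date()


def in_range(d, start=None, end=None):
    """True if d lies in [start, end]; a bound of None means 'no limit'."""
    return (start is None or d >= start) and (end is None or d <= end)


d = parse_transaction_date("06/01/2025")
print(d)                                     # 2025-06-01
print(in_range(d, start=date(2025, 6, 1)))   # True (the lower bound is inclusive)
print(in_range(d, end=date(2025, 5, 31)))    # False
print(in_range(d))                           # True (no bounds set)
```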
Hint: Use an or expression to check if either date
is unset or the criterion is satisfied.

Add support for the date filtering in your script. To do so, we will
require that the user specify the argument after the
senators’ names and specify the date range in the format
YYYY-mm-dd:YYYY-mm-dd. If the date range is specified, pass
the parsed individual dates to the correct parameters
of count_diff and amount_diff. You will need
to parse the string to split the two dates and convert them to the form
required by Part 3a. Some sample output:
%run analyze_trades.py compare
Usage: python analyze_trades.py [ticker <ticker> | compare <senator1> <senator2> [start-date:end-date]]
%run analyze_trades.py compare "Tina Smith" "Katie Britt" 2025-06-01:
Tina Smith has -9 trades with value +(435991, 890000) than Katie Britt
%run analyze_trades.py compare "Tommy Tuberville" "Sheldon Whitehouse" 2025-01-01:2025-06-01
Tommy Tuberville has +21 trades with value +(392021, 1205000) than Sheldon Whitehouse
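Splitting the date-range argument can be sketched as follows, assuming the date-object format from Part 3a. The function name parse_date_range is hypothetical. The sample output shows an empty end date (2025-06-01:); accepting an empty start date in the same way is an assumption.

```python
from datetime import date


def parse_date_range(arg):
    """Split 'YYYY-mm-dd:YYYY-mm-dd' into (start, end); an empty side becomes None."""
    start_str, sep, end_str = arg.partition(":")
    if not sep:
        raise ValueError(f"expected start-date:end-date, got {arg!r}")
    start = date.fromisoformat(start_str) if start_str else None
    end = date.fromisoformat(end_str) if end_str else None
    return start, end


print(parse_date_range("2025-01-01:2025-06-01"))  # (datetime.date(2025, 1, 1), datetime.date(2025, 6, 1))
print(parse_date_range("2025-06-01:"))            # (datetime.date(2025, 6, 1), None)
```

The two returned values can then be passed as the optional start and end parameters of count_diff and amount_diff.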
Use the operator module to refactor the add_amount_ranges and
sub_amount_ranges methods to use a common shared
method.
Add a -d argument that can be before or after the senators' names.
For example,
$ python analyze_trades.py compare -d 2025-06-01: "Tina Smith" "Katie Britt"
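One possible shape for the operator-based refactoring is sketched below. The shared helper name combine_amount_ranges is hypothetical; operator.add and operator.sub are the standard-library function forms of + and -.

```python
import operator


def combine_amount_ranges(op, r1, r2):
    """Apply a binary operator element-wise to two (min, max) ranges."""
    # map with two iterables pairs r1[0] with r2[0] and r1[1] with r2[1].
    return tuple(map(op, r1, r2))


def add_amount_ranges(r1, r2):
    return combine_amount_ranges(operator.add, r1, r2)


def sub_amount_ranges(r1, r2):
    return combine_amount_ranges(operator.sub, r1, r2)


print(add_amount_ranges((1001, 15000), (15001, 50000)))  # (16002, 65000)
print(sub_amount_ranges((15001, 50000), (1001, 15000)))  # (14000, 35000)
```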