The goal of this assignment is to work with scripts and packages in Python.
You will be doing your work in Python for this assignment. You may
choose to work on this assignment on a hosted environment (e.g. tiger) or on your own local
installation of Jupyter and Python. You should use Python 3.12 for your
work, but versions 3.9+ should work for this assignment. To use tiger,
use the credentials you received. If you work remotely, make sure to
download the .py files to turn in. If you choose to work locally, Anaconda is the easiest way
to install and manage Python. If you work locally, you may launch
Jupyter Lab either from the Navigator application or via the
command-line as jupyter-lab.
In this assignment, we will again be working with data from the Senate Stock Watcher, built by Timothy Carambat, that we first used for Assignment 3. That data is located here. You may use the following code to download this data:
from pathlib import Path
import json
from urllib.request import urlretrieve

# download the data if we don't have it locally
url = "https://faculty.cs.niu.edu/~dakoop/cs503-2024sp/a3/senate-stock-trades.json"
local_fname = "senate-stock-trades.json"
if not Path(local_fname).exists():
    urlretrieve(url, local_fname)
Once loaded, the data is a list of dictionaries where each dictionary has ten key-value pairs. Those keys and a brief description are:
transaction_date: the date of the transaction as a string in mm/dd/yyyy format
owner: the owner of the asset (the senator or a family member)
ticker: the stock ticker symbol (e.g. AAPL)
asset_type: whether the asset is a stock, bond, cryptocurrency, etc.
type: the type of transaction (purchase or sale)
amount_range: the amount of the transaction (a range specified by a tuple (min_amount, max_amount))
senator: the name of the senator involved in the transaction
You will be writing Python modules, putting them in a package, and writing a script to help analyze this data. While not required, you may find it useful to create a notebook where you can test the modules and programs. You may use other standard Python modules (e.g. collections) in this assignment.
The assignment is due at 11:59pm on Monday, March 25.
You should submit the completed Python files required for this
assignment on Blackboard. Zip the files together; the filename of the
zipfile should be a5.zip. You can create an archive on tiger (assuming
your current working directory is an a5 directory that contains the
package and script) using the following code in a notebook:
import shutil
shutil.make_archive('../a5', 'zip', '..', 'a5')
Then, download the a5.zip file to turn in via Blackboard. Make sure
your archive contains both the senate_stock_trades package and the
analyze_trades.py program.
Please make sure to follow instructions to receive full credit. To
test your code, you may use the %run magic command in the notebook.
For example,
%run analyze_trades.py
You may also use the Terminal in Jupyter on tiger, but you should activate the correct environment first:
$ /opt/miniforge3/bin/conda init
$ conda activate py3.12
$ python analyze_trades.py
or run python from the correct environment:
$ /opt/miniforge3/envs/py3.12/bin/python analyze_trades.py
Since we are using Python (.py) files for this assignment, add the
identifying information to the beginning of your analyze_trades.py
program and the __init__.py file of your package. Minimally, you should
have a line for your name and a line for your Z-ID. If you wish to add
other information (the assignment name, a description of the
assignment), you may do so after these two lines.
Create three new Python modules, one for reading the dataset, one for
analyzing trades by ticker symbol, and one for comparing two senators.
Put the three modules (util.py, ticker.py, and compare.py) into a
package named senate_stock_trades.
Create a util.py module that has three methods: get_data,
add_amount_ranges, and sub_amount_ranges.
The get_data method should read and parse the senate-stock-trades.json
datafile and store it in a module variable. Assume that the data file
resides in the same directory as util.py. You can then get its absolute
path via the __file__ variable of the module via:
import os
fname = os.path.join(os.path.dirname(__file__), 'senate-stock-trades.json')
Use the json module to load the data from the file. Your get_data
method should only read and parse the file from disk once, otherwise
returning the pre-loaded data.
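One way to get the load-once behavior is a module-level variable that starts out as None. The following is a minimal sketch under assumptions, not the required solution: the _data name and the optional fname parameter are hypothetical, added here so the example can run outside the package.

```python
import json
import os

# Module-level cache; None means the file has not been read yet.
_data = None


def get_data(fname=None):
    """Return the parsed trades, reading the file only on the first call.

    The fname parameter is an assumption for this sketch; by default the
    file is looked up next to this module, as described above.
    """
    global _data
    if _data is None:
        if fname is None:
            fname = os.path.join(os.path.dirname(__file__),
                                 "senate-stock-trades.json")
        with open(fname) as f:
            _data = json.load(f)
    # Later calls skip the file entirely and return the cached object.
    return _data
```

Because the second call returns the same object, a quick check such as `get_data() is get_data()` can confirm that the file is not being re-read.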
The add_amount_ranges and sub_amount_ranges methods should add and
subtract two amount ranges, respectively. Recall that an amount range
is a tuple (min, max). Two ranges are combined element-wise:
(a,b) +/- (c,d) = (a+/-c, b+/-d).
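As a small worked illustration of the element-wise rule above, here is one possible sketch. The sample ranges are made up for the example.

```python
def add_amount_ranges(r1, r2):
    """Add two (min, max) ranges element-wise."""
    return (r1[0] + r2[0], r1[1] + r2[1])


def sub_amount_ranges(r1, r2):
    """Subtract the second (min, max) range from the first, element-wise."""
    return (r1[0] - r2[0], r1[1] - r2[1])


print(add_amount_ranges((1001, 15000), (15001, 50000)))  # (16002, 65000)
print(sub_amount_ranges((15001, 50000), (1001, 15000)))  # (14000, 35000)
```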
Hint: in a notebook, you can use the IPython magic %autoreload to
automatically reload modules as you edit them. Do note, however, that
this will mask the effects of trying to not keep reloading the data! You
can also use importlib.reload to do this manually.
Create a ticker.py
module that has two methods that both take one parameter, the ticker
symbol. Use the get_data method from the util module to obtain the
data. The first method, count_trades, should return a dictionary of the
form {<senator>: <count>} with the counts of trades for each senator.
The second method, sum_trades, should return a dictionary of the form
{<senator>: (<min_value>, <max_value>)} with the range of possible trade
values. Use the util.add_amount_ranges method from Part 1a to add the
amount ranges.
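The two dictionary shapes described above can be sketched with collections.Counter and collections.defaultdict. This is an illustration under assumptions: the records are made up (not real trades), and the extra data parameter is hypothetical, added only so the example runs without the package; the assignment's methods take just the ticker symbol.

```python
from collections import Counter, defaultdict

# Made-up records that mimic the dataset's shape (not real trades).
trades = [
    {"senator": "A. Example", "ticker": "XYZ", "amount_range": (1001, 15000)},
    {"senator": "A. Example", "ticker": "XYZ", "amount_range": (15001, 50000)},
    {"senator": "B. Sample", "ticker": "XYZ", "amount_range": (1001, 15000)},
    {"senator": "B. Sample", "ticker": "ABC", "amount_range": (1001, 15000)},
]


def add_amount_ranges(r1, r2):
    return (r1[0] + r2[0], r1[1] + r2[1])


def count_trades(ticker, data=trades):
    """Map each senator to the number of trades in the given ticker."""
    return dict(Counter(t["senator"] for t in data if t["ticker"] == ticker))


def sum_trades(ticker, data=trades):
    """Map each senator to the summed (min, max) range for the given ticker."""
    # The lambda gives every new key a starting range of (0, 0).
    totals = defaultdict(lambda: (0, 0))
    for t in data:
        if t["ticker"] == ticker:
            totals[t["senator"]] = add_amount_ranges(
                totals[t["senator"]], t["amount_range"])
    return dict(totals)


print(count_trades("XYZ"))  # {'A. Example': 2, 'B. Sample': 1}
print(sum_trades("XYZ"))    # {'A. Example': (16002, 65000), 'B. Sample': (1001, 15000)}
```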
Hints: consider collections.Counter for count_trades, and
collections.defaultdict to help with sum_trades. You can use a lambda
function as the argument to defaultdict to initialize the key-value
pairs with a tuple.
Create a compare.py
module that calculates comparative information between two senators.
Given two senators’ names as parameters, the count_diff method should
return the difference in the number of transactions between the two
senators, and the amount_diff method should return the ranged
difference between the amounts of all trades. This difference should be
computed by using the util.sub_amount_ranges method from Part 1a.
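A possible shape for the two comparison methods is sketched below. It assumes the difference is taken as first senator minus second senator, which matches the signs in the sample output of this assignment; the records and the data parameter are made up so the example is self-contained.

```python
# Made-up records that mimic the dataset's shape (not real trades).
trades = [
    {"senator": "A. Example", "amount_range": (1001, 15000)},
    {"senator": "A. Example", "amount_range": (15001, 50000)},
    {"senator": "B. Sample", "amount_range": (1001, 15000)},
]


def sub_amount_ranges(r1, r2):
    return (r1[0] - r2[0], r1[1] - r2[1])


def _total_range(senator, data):
    """Sum the (min, max) ranges of all trades by one senator."""
    low = sum(t["amount_range"][0] for t in data if t["senator"] == senator)
    high = sum(t["amount_range"][1] for t in data if t["senator"] == senator)
    return (low, high)


def count_diff(senator1, senator2, data=trades):
    """Number of trades by senator1 minus number of trades by senator2."""
    count1 = sum(1 for t in data if t["senator"] == senator1)
    count2 = sum(1 for t in data if t["senator"] == senator2)
    return count1 - count2


def amount_diff(senator1, senator2, data=trades):
    """Total range for senator1 minus total range for senator2."""
    return sub_amount_ranges(_total_range(senator1, data),
                             _total_range(senator2, data))


print(count_diff("A. Example", "B. Sample"))   # 1
print(amount_diff("A. Example", "B. Sample"))  # (15001, 50000)
```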
Make sure all three analysis modules live in a single
senate_stock_trades package. Add an __init__.py file for completeness.
It may contain documentation and the pass keyword.
Create an analyze_trades.py program that uses the package from Part 1
to identify trades of interest and compare senators. The script should
process two subcommands; the first is “ticker” and the second is
“compare”. The first subcommand takes a ticker symbol as its argument
and prints the results from the count_trades and sum_trades methods,
and the second subcommand takes the names of two senators as arguments
and prints the results from the count_diff and amount_diff methods. You
can test your script via the IPython magic command
%run analyze_trades.py ... or via the shell command
!/opt/conda/envs/py3.12/bin/python analyze_trades.py ...
(you will need to adjust the path if not using tiger). Make sure to
print a usage message if the user omits arguments or provides incorrect
ones.
Some sample output:
%run analyze_trades.py
Usage: python analyze_trades.py [ticker <ticker> | compare <senator1> <senator2>]
%run analyze_trades.py ticker
Usage: python analyze_trades.py [ticker <ticker> | compare <senator1> <senator2>]
%run analyze_trades.py ticker NVDA
Number of trades:
Pat Roberts: 23
Ron Wyden: 13
Sheldon Whitehouse: 10
Tommy Tuberville: 6
Dan Sullivan: 3
Thomas R. Carper: 3
Kelly Loeffler: 2
Susan M. Collins: 1
John W. Hickenlooper: 1
Sum of trade values:
Pat Roberts: (464023, 1260000)
John W. Hickenlooper: (500001, 1000000)
Ron Wyden: (378013, 945000)
Tommy Tuberville: (48006, 195000)
Sheldon Whitehouse: (10010, 150000)
Kelly Loeffler: (30002, 100000)
Susan M. Collins: (15001, 50000)
Dan Sullivan: (3003, 45000)
Thomas R. Carper: (3003, 45000)
%run analyze_trades.py compare "Pat Roberts" "Sheldon Whitehouse"
Pat Roberts has -213 trades with value +(3030787, 3140000) than Sheldon Whitehouse
%run analyze_trades.py compare "Pat Roberts" "Patty Murray"
Pat Roberts has +256 trades with value +(5551256, 15265000) than Patty Murray
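The subcommand handling shown in the sample output above could be structured along the following lines. This is only a sketch under assumptions: parse_args and main are hypothetical names, and the placeholder print stands in for the calls into the senate_stock_trades package.

```python
USAGE = ("Usage: python analyze_trades.py "
         "[ticker <ticker> | compare <senator1> <senator2>]")


def parse_args(argv):
    """Return (subcommand, args) for valid input, or None if it is invalid."""
    if len(argv) == 2 and argv[0] == "ticker":
        return ("ticker", argv[1:])
    if len(argv) == 3 and argv[0] == "compare":
        return ("compare", argv[1:])
    return None


def main(argv):
    parsed = parse_args(argv)
    if parsed is None:
        # Missing or incorrect arguments: show the usage message.
        print(USAGE)
        return 1
    command, args = parsed
    # Placeholder: a real script would call the package functions here.
    print(f"{command}: {args}")
    return 0


# In a script, the arguments would come from sys.argv[1:].
main(["ticker"])          # prints the usage message
main(["ticker", "NVDA"])  # prints: ticker: ['NVDA']
```

Keeping the argument checks in a separate function makes them easy to try from a notebook without running the whole script.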
For this part, you will add date filtering to the package and program you wrote in Parts 1 and 2. This should not require significant changes to the overall logic, and your final package and program should work both for unfiltered data and for filtered data. Turn in this final package and program with the additional filtering added.
Add the ability to restrict the calculations by date to the compare.py
module. Thus, count_diff and amount_diff should take two optional
parameters that set the start date and end date, respectively. If they
are not set, the start date is the earliest date in the data, and the
end date is the latest date. The range is inclusive. The methods should
now return the differences for the senators for only the trades between
those dates, inclusive.
You may choose to parse the transaction_date in the data to a tuple (as
in Assignment 3) or to a date object using the date class from the
datetime module. You may also choose which format the count_diff and
amount_diff methods take (e.g. a tuple (year, month, day) or a date
object), but you need to document this in a docstring in those methods.
You will be passing arguments that adhere to your format in Part 3b.
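If you go with date objects, the parsing and the inclusive range check might look like the sketch below. The function names are hypothetical, and treating None as "no bound" is one assumed way to represent an unset start or end date.

```python
from datetime import date, datetime


def parse_transaction_date(text):
    """Convert an mm/dd/yyyy string to a datetime.date."""
    return datetime.strptime(text, "%m/%d/%Y").date()


def in_range(d, start=None, end=None):
    """True if d lies in [start, end]; a bound of None means unbounded."""
    return (start is None or d >= start) and (end is None or d <= end)


d = parse_transaction_date("03/25/2021")
print(d)                                    # 2021-03-25
print(in_range(d, start=date(2021, 1, 1)))  # True
print(in_range(d, end=date(2020, 12, 31)))  # False
```

Tuples of the form (year, month, day) compare the same way with <= and >=, so the range check would carry over unchanged if you choose tuples instead.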
Hint: use an or expression to check if either date is unset or the
criterion is satisfied.
Add support for the date filtering in your script. To do so, we will
require that the user specify the argument after the senators’ names
and specify the date range in the format YYYY-mm-dd:YYYY-mm-dd. If the
date range is specified, pass the parsed individual dates to the correct
parameters of count_diff and amount_diff. You will need to parse the
string to split the two dates and convert them to the form required by
Part 3a. Some sample output:
%run analyze_trades.py compare
Usage: python analyze_trades.py [ticker <ticker> | compare <senator1> <senator2> [start-date:end-date]]
%run analyze_trades.py compare "Pat Roberts" "Sheldon Whitehouse" 2018-01-01:
Pat Roberts has +94 trades with value +(3455094, 8400000) than Sheldon Whitehouse
%run analyze_trades.py compare "Pat Roberts" "Sheldon Whitehouse" 2010-01-01:2017-12-31
Pat Roberts has -307 trades with value -(424307, 5260000) than Sheldon Whitehouse
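Splitting the date range argument can be sketched as follows. The parse_date_range name is hypothetical, and the sketch assumes date objects are the format chosen in Part 3a and that an empty side of the colon (as in 2018-01-01:) means that bound is unset.

```python
from datetime import date


def parse_date_range(text):
    """Split 'YYYY-mm-dd:YYYY-mm-dd' into (start, end).

    An empty side becomes None, i.e. that bound is unset.
    """
    start_text, sep, end_text = text.partition(":")
    if not sep:
        raise ValueError(f"expected start-date:end-date, got {text!r}")
    start = date.fromisoformat(start_text) if start_text else None
    end = date.fromisoformat(end_text) if end_text else None
    return start, end


print(parse_date_range("2018-01-01:"))
# (datetime.date(2018, 1, 1), None)
print(parse_date_range("2010-01-01:2017-12-31"))
# (datetime.date(2010, 1, 1), datetime.date(2017, 12, 31))
```

A malformed date raises ValueError from date.fromisoformat, which the script could catch in order to print the usage message.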
Optionally, use the operator module to refactor the add_amount_ranges
and sub_amount_ranges methods to use a common shared method.
Optionally, add a -d argument that can be before or after the senators’
names. For example,
$ python analyze_trades.py compare -d 2018-01-01: "Pat Roberts" "Sheldon Whitehouse"