The goal of this assignment is to work with lists and dictionaries in Python.
You will be doing your work in a Jupyter notebook for this assignment. You may choose to work on this assignment on a hosted environment (e.g. tiger) or on your own local installation of Jupyter and Python. You should use Python 3.12 for your work. (Older versions may work, but your code will be checked with Python 3.12.) To use tiger, use the credentials you received. If you work remotely, make sure to download the .ipynb file to turn in. If you choose to work locally, Anaconda or miniforge are probably the easiest ways to install and manage Python. If you work locally, you may launch Jupyter Lab either from the Navigator application (Anaconda) or from the command line as jupyter-lab or jupyter lab.
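To confirm which Python version a notebook kernel (on tiger or locally) is running, a cell like the one below can help. It is only a setup check, not an answer cell, and kernel_is_python_312 is a hypothetical helper name. Leave it out of the submitted notebook if you want to stay clear of the assignment’s library restrictions.

```python
import sys

def kernel_is_python_312():
    """Return True if the running interpreter is Python 3.12.x."""
    return sys.version_info[:2] == (3, 12)

print(sys.version)              # full version string of the kernel
print(kernel_is_python_312())   # True when the kernel runs Python 3.12
```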
In this assignment, we will be working with data from the United States Department of Agriculture’s FoodData Central. Rather than using this dataset directly, I have created a subset of this data, which can be read as a list of dictionaries. That data is located here, and I have created a template notebook, a3.ipynb, that contains a cell that will download and read that data. You can right-click and save-as the a3.ipynb file, and, if working on tiger, upload that file. Once loaded, the data is a list of dictionaries where each dictionary has nine key-value pairs. Those keys and a brief description are:
fdc_id: a unique identifier assigned by FoodData Central
brand_owner: the company that makes the product
brand_name: a brand name, if different from the company
description: the product’s name or description
branded_food_category: the category for the food product
ingredients: a comma-separated string of ingredients in the product
serving_size: the serving size of the product in the units specified by serving_size_unit
serving_size_unit: the units for the serving size value
nutrition: a list of dictionaries containing nutrition information; each dictionary contains the keys name, amount, and unit_type and their associated values

You will be answering queries and writing functions to help analyze this data. You may not use additional libraries for this assignment; this includes standard-library modules such as statistics, collections, and datetime as well as third-party packages such as pandas.
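Because each item is a dictionary and its nutrition value is itself a list of dictionaries, top-level fields are direct lookups while a nutrient has to be found by searching the list. The sketch below illustrates both on a small hypothetical sample; foods and get_amount are illustrative names, not part of the template notebook, and the sample nutrition entries keep only the name and amount keys.

```python
# Hypothetical two-item sample shaped like the assignment data; in the
# notebook, use the list loaded by the template cell instead.
foods = [
    {'fdc_id': 1, 'brand_owner': 'Acme Foods', 'brand_name': None,
     'description': 'OAT CEREAL', 'branded_food_category': 'Cereal',
     'ingredients': 'WHOLE GRAIN OATS, SUGAR, SALT',
     'serving_size': 40.0, 'serving_size_unit': 'g',
     'nutrition': [{'name': 'Protein', 'amount': 12.5},
                   {'name': 'Fiber', 'amount': 10.0}]},
    {'fdc_id': 2, 'brand_owner': 'Acme Foods', 'brand_name': 'Acme Fresh',
     'description': 'APPLE JUICE', 'branded_food_category': 'Juice',
     'ingredients': 'APPLE JUICE, ASCORBIC ACID',
     'serving_size': 240.0, 'serving_size_unit': 'ml',
     'nutrition': [{'name': 'Sugar', 'amount': 10.4}]},
]

def get_amount(item, nutrient_name):
    """Return the amount for nutrient_name in item['nutrition'], or None if absent.

    Searches the list by name instead of assuming a fixed position.
    """
    for entry in item['nutrition']:
        if entry['name'] == nutrient_name:
            return entry['amount']
    return None

for item in foods:
    # Top-level fields are plain dictionary lookups.
    print(item['fdc_id'], item['description'], item['serving_size'], item['serving_size_unit'])
    # Nested nutrition data requires a search through a list of dictionaries.
    print('  Fiber:', get_amount(item, 'Fiber'))
```

For the second sample item, get_amount returns None because that item has no Fiber entry.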
The assignment is due at 11:59pm on Monday, September 30.
You should submit the completed notebook file required for this assignment on Blackboard. The filename of the notebook should be a3.ipynb.
Please make sure to follow instructions to receive full credit. Use a markdown cell to label each part of the assignment with the number of the section you are completing. You may put the code for each part into one or more cells.
The first cell of your notebook should be a markdown cell with a line for your name and a line for your Z-ID. If you wish to add other information (the assignment name, a description of the assignment), you may do so after these two lines.
Find all of the possible values for serving size units. List each type only once!
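One way to list each unit only once is to keep a list and append a value only the first time it appears. The sketch below assumes a hypothetical sample list named foods in place of the loaded data.

```python
# Hypothetical sample; only the serving_size_unit key matters here.
foods = [
    {'description': 'TITA CRACKERS', 'serving_size_unit': 'g'},
    {'description': 'APPLE JUICE', 'serving_size_unit': 'ml'},
    {'description': 'OAT CEREAL', 'serving_size_unit': 'g'},
]

unique_units = []
for item in foods:
    unit = item['serving_size_unit']
    # Append only values not seen yet, so each unit appears once.
    if unit not in unique_units:
        unique_units.append(unit)

print(unique_units)   # ['g', 'ml'] for this sample
```

This keeps the units in order of first appearance. A set comprehension such as {item['serving_size_unit'] for item in foods} gives the same unique values without a guaranteed order.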
Write code to find the food items in the dataset with the largest serving size among those measured in milliliters (serving_size_unit is ‘ml’). There may be multiple items with the same maximum serving size. Output the description and brand_owner of each such food item. Remember that you will need to iterate through each element of the list, and each element is a dictionary which has various keys including serving_size_unit, serving_size, description, and brand_owner.
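A two-pass approach handles ties cleanly: first find the maximum serving size among milliliter items, then keep every milliliter item with that size. The sample list foods below is hypothetical, and the sketch assumes at least one item uses ‘ml’ (max() raises ValueError on an empty sequence).

```python
# Hypothetical sample standing in for the loaded dataset.
foods = [
    {'description': 'APPLE JUICE', 'brand_owner': 'Acme Foods',
     'serving_size': 240.0, 'serving_size_unit': 'ml'},
    {'description': 'ORANGE JUICE', 'brand_owner': 'Sunny Groves',
     'serving_size': 240.0, 'serving_size_unit': 'ml'},
    {'description': 'ESPRESSO SHOT', 'brand_owner': 'Bean Co',
     'serving_size': 30.0, 'serving_size_unit': 'ml'},
    {'description': 'TITA CRACKERS', 'brand_owner': 'Rovira Biscuit Corporation',
     'serving_size': 300.0, 'serving_size_unit': 'g'},   # ignored: not ml
]

# Pass 1: find the largest serving size among milliliter items.
ml_items = [item for item in foods if item['serving_size_unit'] == 'ml']
max_size = max(item['serving_size'] for item in ml_items)

# Pass 2: collect every ml item that ties for that maximum.
largest = [item for item in ml_items if item['serving_size'] == max_size]

for item in largest:
    print(item['description'], '|', item['brand_owner'])
```

Both juices are reported here because they tie at 240.0 ml, while the 300.0 g crackers are skipped since they are not measured in milliliters.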
Write code to create a dictionary, category_counts, that keeps track of how many items each food category (branded_food_category) has listed in our sample dataset. Next, use this dictionary to find and display the name of the category that has the largest number of items.
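A minimal sketch of the counting pattern, assuming a hypothetical sample list named foods: each category starts at 0 via dict.get and is incremented once per item, then max() with key=category_counts.get picks the category with the highest count.

```python
# Hypothetical sample; only branded_food_category matters here.
foods = [
    {'description': 'TITA CRACKERS', 'branded_food_category': 'Crackers & Biscotti'},
    {'description': 'OAT CEREAL', 'branded_food_category': 'Cereal'},
    {'description': 'APPLE GRANOLA', 'branded_food_category': 'Cereal'},
]

category_counts = {}
for item in foods:
    category = item['branded_food_category']
    # Start a new category at 0, then add one for this item.
    category_counts[category] = category_counts.get(category, 0) + 1

# The key with the largest count; on a tie, max() keeps the first one it saw.
top_category = max(category_counts, key=category_counts.get)
print(category_counts)
print(top_category, category_counts[top_category])
```

If several categories tie for the top count, this returns only the first; a comprehension comparing each count to max(category_counts.values()) would list all of them.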
The count_letters example from class may be useful.

Update the list of each food item’s nutrition information to include the amount of unsaturated fat. This can be computed by subtracting the amount of saturated fat from the amount of total fat. You will need to add a new dictionary to the list of nutrition information. The values for its name and unit_type keys should be “Unsaturated Fat” and “G”, respectively. The amount is what you are computing via the subtraction. After computing this for all items, an item would, for example, now look like this:
{'fdc_id': 1106099,
 'brand_owner': 'Rovira Biscuit Corporation',
 'brand_name': None,
 'description': 'TITA CRACKERS',
 'branded_food_category': 'Crackers & Biscotti',
 'ingredients': 'ENRICHED WHEAT FLOUR (NIACIN, IRON, THIAMINE MONONITRATE (VITAMIN B1), RIBOFLAVIN (VITAMIN B2), FOLIC ACID), SUGAR, VEGETABLE SHORTENING (CONTAINS PARTIALLY HYDROGENATED SOYBEAN OIL, AND/OR COTTONSEED OIL, AND/OR CANOLA OIL) *ADDS A DIETARILY INSIGNIFICANT AMOUNT OF SATURATED FAT, GLUCOSE, MALT, AMMONIUM BICARBONATE, SALT, GINGER, SODIUM BICARBONATE, SODIUM SULFITE, ARTIFICIAL FLAVOR, ARTIFICIAL COLORS (YELLOW #5, YELLOW #6, RED #40).',
 'serving_size': 15.0,
 'serving_size_unit': 'g',
 'nutrition': [{'name': 'Carbohydrates', 'amount': 80.0, 'unit_name': 'G'},
               {'name': 'Saturated Fat', 'amount': 0.0, 'unit_name': 'G'},
               {'name': 'Calories', 'amount': 400.0, 'unit_name': 'KCAL'},
               {'name': 'Protein', 'amount': 6.67, 'unit_name': 'G'},
               {'name': 'Sugar', 'amount': 20.0, 'unit_name': 'G'},
               {'name': 'Fiber', 'amount': 0.0, 'unit_name': 'G'},
               {'name': 'Sodium', 'amount': 267.0, 'unit_name': 'MG'},
               {'name': 'Total Fat', 'amount': 6.67, 'unit_name': 'G'},
               {'name': 'Unsaturated Fat', 'amount': 6.67, 'unit_name': 'G'}]}
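A minimal sketch of the update, assuming a hypothetical one-item list named foods and a hypothetical helper get_amount that searches the nutrition list by name. The text above names the unit key unit_type, but the example item shows unit_name; this sketch follows the example item, so use whichever key the loaded data actually contains. It also assumes items missing either fat value should be skipped rather than given a guessed amount.

```python
# Hypothetical one-item sample using the nutrition layout of the example item.
foods = [
    {'description': 'TITA CRACKERS',
     'nutrition': [{'name': 'Saturated Fat', 'amount': 0.0, 'unit_name': 'G'},
                   {'name': 'Total Fat', 'amount': 6.67, 'unit_name': 'G'}]},
]

def get_amount(item, nutrient_name):
    """Return the amount for nutrient_name, or None if the item has no such entry."""
    for entry in item['nutrition']:
        if entry['name'] == nutrient_name:
            return entry['amount']
    return None

for item in foods:
    total = get_amount(item, 'Total Fat')
    saturated = get_amount(item, 'Saturated Fat')
    # Assumption: skip items missing either value instead of inventing one.
    if total is None or saturated is None:
        continue
    item['nutrition'].append({'name': 'Unsaturated Fat',
                              'amount': total - saturated,
                              'unit_name': 'G'})

print(foods[0]['nutrition'][-1])
```

Because this mutates the data in place, re-running the cell appends a second Unsaturated Fat entry; reload the data first, or check get_amount(item, 'Unsaturated Fat') is None before appending.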
Write a function filter_by_fiber that takes two arguments, min_fiber and max_fiber, and returns a list of food items whose amount of fiber is in the specified range, inclusive. For each item, you will need to find the Fiber entry in the nutrition list. Do not assume that this entry will be at a particular index of the list! Then, test whether the item’s amount of fiber is in the specified range, only including the item in the returned list if it satisfies the condition.
For example, the list comprehension
[d['description'] for d in filter_by_fiber(7.3,7.35)]
should evaluate to:
['ATHLETE FUEL ORGANIC MUESLI',
 "WILBUR'S OF MAINE, ALL NATURAL DARK CHOCOLATE CRANBERRIES",
 'MULTIGRAIN PIZZA DOUGH',
 'APPLE CINNAMON GRANOLA',
 'VANILLA ICED LATTE CHILLED COFFEE DRINK, VANILLA',
 'KODIAK CAKES, GRANOLA UNLEASHED, VERMONT MAPLE PECAN',
 'PROTEIN CEREAL, OATS & HONEY',
 'PEACE CEREAL, SUPERGRAINS CEREAL, MAPLE BUCKWHEAT HEMP',
 'ARTISAN BLEND GRANOLA',
 'HONEY ALMOND GRANOLA, HONEY ALMOND',
 'HONEY ALMOND CEREAL, HONEY ALMOND']
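A minimal sketch of filter_by_fiber, assuming the dataset lives in a module-level list (named foods here as a hypothetical stand-in) since the function takes only the two range arguments. It searches each nutrition list by name and uses a chained comparison for the inclusive range.

```python
# Hypothetical sample standing in for the loaded dataset.
foods = [
    {'description': 'OAT CEREAL',
     'nutrition': [{'name': 'Protein', 'amount': 12.5},
                   {'name': 'Fiber', 'amount': 7.3}]},
    {'description': 'APPLE JUICE',
     'nutrition': [{'name': 'Sugar', 'amount': 10.4}]},   # no Fiber entry
    {'description': 'BRAN MUFFIN',
     'nutrition': [{'name': 'Fiber', 'amount': 9.0}]},
]

def filter_by_fiber(min_fiber, max_fiber):
    """Return the items whose Fiber amount lies in [min_fiber, max_fiber]."""
    result = []
    for item in foods:
        for entry in item['nutrition']:
            # Search by name; the Fiber entry can be at any index.
            if entry['name'] == 'Fiber':
                if min_fiber <= entry['amount'] <= max_fiber:
                    result.append(item)
                break   # stop after the Fiber entry whether or not it matched
    return result

print([d['description'] for d in filter_by_fiber(7.3, 7.35)])
```

Items with no Fiber entry, like the juice in this sample, are simply left out of the result.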
Only CSCI 503 students need to complete this part. CSCI 490 students may complete it for extra credit.
Write a function filter_by_ingredients that will filter the food items by their ingredients. Specifically, given an ingredient (e.g. “Apple”), return the food items that have that ingredient. Note that you should do a case-insensitive comparison, so the ingredient “apple” should return food items that list “APPLE”, “Apple”, “apple”, etc. Do not worry about “apple” also matching “pineapple” (handling that is extra credit). For example,
len(filter_by_ingredients('apple'))
should evaluate to
605
639
and the list comprehension
[d['description'] for d in filter_by_ingredients('saffron')]
should evaluate to:
['SHISH KABOB SEASONING',
 'MANITOU TRADING COMPANY, ALL NATURAL PAELLA RICE',
 'SEASONED YELLOW RICE',
 'SEASONED YELLOWRICE']
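A minimal sketch of a case-insensitive substring filter, assuming a hypothetical module-level list named foods. Lowercasing both the search term and the ingredients string makes “apple”, “APPLE”, and “Apple” compare equal. Treating a missing ingredients value as an empty string is a defensive assumption, not something the assignment specifies.

```python
# Hypothetical sample standing in for the loaded dataset.
foods = [
    {'description': 'APPLE JUICE', 'ingredients': 'APPLE JUICE, ASCORBIC ACID'},
    {'description': 'FRUIT CUP', 'ingredients': 'Pineapple, Peaches, Pear Juice'},
    {'description': 'SEASONED YELLOW RICE', 'ingredients': 'RICE, SALT, SAFFRON'},
    {'description': 'PLAIN WATER', 'ingredients': None},   # hypothetical missing value
]

def filter_by_ingredients(ingredient):
    """Return items whose ingredients string contains ingredient, ignoring case."""
    target = ingredient.lower()
    result = []
    for item in foods:
        # Assumption: treat a missing ingredients value as an empty string.
        text = (item['ingredients'] or '').lower()
        if target in text:
            result.append(item)
    return result

print(len(filter_by_ingredients('apple')))
print([d['description'] for d in filter_by_ingredients('saffron')])
```

Because this is a plain substring test, ‘apple’ also matches “Pineapple” in the sample, which the assignment allows outside the extra-credit version.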