The goal of this assignment is to get acquainted with Python using Jupyter Notebooks.
You may choose to work on this assignment in a hosted environment (e.g. Google Colab) or on your own local installation of Jupyter and Python. You should use Python 3.8 or higher for your work. If you choose to work locally, Anaconda is the easiest way to install and manage Python, and you may launch Jupyter Lab either from the Navigator application or from the command line as jupyter-lab.
In this assignment, we will analyze the Metropolitan Museum of Art Dataset, which tracks information about artworks in the museum's collection. This dataset is provided by The Metropolitan Museum of Art (aka The Met), and information about the columns is documented here. We will be working on a subset of this data, available here. You will analyze this data to answer several questions about it. I have provided code to organize this data, but feel free to improve this rudimentary organization. You may organize the cells as you wish, but you must clearly label each problem's code and its solution. Use a Markdown cell to add a header denoting your work for each problem. Make sure to document your answer to each question, and make sure your code actually computes that answer. As the goal of this assignment is to become acquainted with core Python, do not use any libraries other than csv and collections.
You may start with the provided Jupyter Notebook, a1.ipynb. Download this notebook (right-click to save the link) and upload it to your Jupyter workspace (either locally or in a hosted environment). Make sure to execute the first cell in the notebook (Shift+Enter). This cell will download the data and define two variables, field_names and records. The field_names variable is a string with the names of each data attribute, separated by commas. The records variable is a list of comma-delimited strings with the values of each field for each data item. The field names are:
- Object Number: accession number
- Object ID: identifying number
- Department: curatorial department responsible for the artwork
- AccessionYear: year the artwork was acquired
- Object Name: describes the physical type of the object
- Title: title given to a work of art
- Culture: information about the culture, or people from which an object was created
- Period: time or time period when an object was created
- Object Date: year or a span of years describing the time when an artwork was designed or created
- Object Begin Date: machine-readable date indicating the year the artwork was started or created
- Object End Date: machine-readable date indicating the year the artwork was completed
- Artist Display Name: artist name in the correct order for display

Important: Accessing the data in this way requires parsing the CSV data according to the rules governing double-quoted fields (fields that can contain commas). It is not recommended to use the rudimentary method of splitting strings on commas to access fields; use the csv library to parse them correctly!
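To see why splitting on commas fails, here is a small sketch using hypothetical stand-ins for field_names and records (the real values come from the notebook's first cell). Because csv.reader and csv.DictReader accept any iterable of strings, they can parse the records list directly, without writing it to a file first.

```python
import csv

# Hypothetical stand-ins for the notebook's field_names and records variables.
field_names = 'Object ID,Title,Artist Display Name'
records = [
    '1,"Vase, with lid",Unknown',
    '2,Portrait,"Smith, John|Doe, Jane"',
]

# Naive splitting breaks the quoted title into two pieces.
naive = records[0].split(',')
print(len(naive))  # 4 pieces for only 3 fields

# csv honors the double quotes, so every row maps cleanly onto the 3 field names.
header = next(csv.reader([field_names]))
rows = list(csv.DictReader(records, fieldnames=header))
print(rows[0]['Title'])  # Vase, with lid
```

Passing fieldnames explicitly tells DictReader not to treat the first record as a header row.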
The assignment is due at 11:59pm on Monday, February 7. You should submit the completed notebook file required for this assignment on Blackboard. The filename of the notebook should be a1.ipynb.
Read in the data file and store it in appropriate data structures. Then, write code that computes the date range of the oldest objects. You can use the “Object Begin Date” and “Object End Date” attributes to make the necessary comparisons, but report the more human-readable “Object Date” attribute.
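One possible approach, sketched on a few hypothetical rows: parse each record with DictReader, compare the numeric begin and end dates as a tuple, and report the Object Date of every row that ties for the minimum. This assumes every row has integer-valued begin and end dates and that years before the common era are stored as negative numbers; check both against the real data before relying on them.

```python
import csv

# Hypothetical sample rows with a subset of the real column names.
field_names = 'Title,Object Date,Object Begin Date,Object End Date'
records = [
    'Bowl,ca. 3000 B.C.,-3000,-2900',
    'Figurine,3000-2900 B.C.,-3000,-2900',
    'Chair,1780,1780,1780',
]

header = next(csv.reader([field_names]))
rows = list(csv.DictReader(records, fieldnames=header))

def date_key(row):
    # Assumption: B.C. years are negative integers, so smaller means older.
    return (int(row['Object Begin Date']), int(row['Object End Date']))

oldest_key = min(date_key(row) for row in rows)
oldest = [row for row in rows if date_key(row) == oldest_key]

# Report the human-readable date, as the task asks.
for row in oldest:
    print(row['Title'], '|', row['Object Date'])
```

Collecting every row that matches the minimum key, rather than calling min once, keeps all objects that tie for oldest.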
Hints: Use the csv library and its DictReader class. int("81") returns an integer value of 81.

Write code that computes the number of unique artist names contained in the dataset. This is stored in the “Artist Display Name” attribute. Note that some objects have multiple artists, delimited by the | character, and you need to consider each of them as an individual name. For this part, any anonymous or unknown names should be counted.
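A minimal sketch on hypothetical rows, wrapping the split-and-strip logic in a helper (artist_names, a name chosen here for illustration) and collecting the results in a set. Skipping empty fields is an assumption about how objects with no recorded artist should be treated; it is not stated in the assignment.

```python
import csv

# Hypothetical sample rows; only the artist column matters here.
field_names = 'Object ID,Artist Display Name'
records = [
    '1,"Smith, John|Doe, Jane"',
    '2,"Doe, Jane "',
    '3,Anonymous',
    '4,',
]

header = next(csv.reader([field_names]))
rows = list(csv.DictReader(records, fieldnames=header))

def artist_names(row):
    """Split one row's artist field on '|' and trim whitespace around each name.

    Assumption: empty pieces (no artist recorded) are skipped.
    """
    return [name.strip() for name in row['Artist Display Name'].split('|')
            if name.strip()]

unique_names = set()
for row in rows:
    unique_names.update(artist_names(row))
print(len(unique_names))  # 3
```

Stripping matters: without it, 'Doe, Jane' and 'Doe, Jane ' would count as two different names.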
Hints: The split function for strings will be useful. The strip function may be useful to trim whitespace. Use a set to keep track of all the names.

Now, write code that computes the number of unique, non-anonymous artist names. This means eliminating the names from Part 1 that start with “Anonymous” or “Unknown”. Note that there are artist names like “Anonymous, German, 17th century” that are also anonymous and should be excluded from the count.
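A sketch of the filtering step, using a hypothetical set of names standing in for the result of the previous part. str.startswith accepts a tuple of prefixes, so one call checks both “Anonymous” and “Unknown”; the match is case-sensitive, which is an assumption worth checking against the real data.

```python
# Hypothetical set of names standing in for the unique-name result.
unique_names = {
    'Smith, John',
    'Anonymous',
    'Anonymous, German, 17th century',
    'Unknown',
    'Unknown artist',
    'Doe, Jane',
}

def is_anonymous(name):
    # A tuple argument makes startswith test each prefix in turn.
    return name.startswith(('Anonymous', 'Unknown'))

named = {name for name in unique_names if not is_anonymous(name)}
print(len(named))  # 2
```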
Write code that computes the most frequent artist name. Use a function so that your effort in Part 1 is reused for this part.
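A sketch using collections.Counter on hypothetical rows. The helper artist_names (an illustrative name) is the kind of reusable function the task asks for; ties are handled by keeping every name whose count equals the maximum. The task does not say whether anonymous names should compete for “most frequent”, so you may want to filter them out with the prefix test from the previous part.

```python
import csv
from collections import Counter

# Hypothetical sample rows; only the artist column matters here.
field_names = 'Object ID,Artist Display Name'
records = [
    '1,"Smith, John|Doe, Jane"',
    '2,"Doe, Jane"',
    '3,Anonymous',
    '4,Anonymous',
    '5,Anonymous',
]

header = next(csv.reader([field_names]))
rows = list(csv.DictReader(records, fieldnames=header))

def artist_names(row):
    """Split one row's artist field on '|' and trim whitespace around each name."""
    return [name.strip() for name in row['Artist Display Name'].split('|')
            if name.strip()]

counts = Counter()
for row in rows:
    counts.update(artist_names(row))  # adds 1 per name in the list

# most_common(1) returns one top pair; scan again to keep any ties.
top_count = counts.most_common(1)[0][1]
most_frequent = [name for name, n in counts.items() if n == top_count]
print(most_frequent, top_count)  # ['Anonymous'] 3
```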
Hint: collections.Counter() is a good structure to help with counting.

Write code that computes the object(s) that were acquired first, based on the “AccessionYear” attribute. Note that some accession years are not years (i.e. not a four-digit number); for this assignment, ignore those entries. Print the Title attribute for the earliest-acquired object(s).
Hint: int("81") returns an integer value of 81.
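A sketch on hypothetical rows that treats an accession year as valid only when it is exactly four decimal digits, then keeps every title that shares the earliest year. The helper name accession_year is illustrative, and the sample values (an empty field, a range like 1925-26) are made up to show the kinds of entries the task says to ignore.

```python
import csv

# Hypothetical sample rows with made-up accession years.
field_names = 'Title,AccessionYear'
records = [
    'Bowl,1874',
    'Chair,1874',
    'Vase,',
    'Mask,1925-26',
    'Coin,1901',
]

header = next(csv.reader([field_names]))
rows = list(csv.DictReader(records, fieldnames=header))

def accession_year(row):
    """Return the year as an int, or None when it is not a four-digit number."""
    value = row['AccessionYear'].strip()
    # isdecimal() guarantees int() will accept the string.
    if len(value) == 4 and value.isdecimal():
        return int(value)
    return None

dated = []
for row in rows:
    year = accession_year(row)
    if year is not None:
        dated.append((year, row))

first_year = min(year for year, _ in dated)
first_titles = [row['Title'] for year, row in dated if year == first_year]
print(first_year, first_titles)  # 1874 ['Bowl', 'Chair']
```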