Assignment 1

Goals

The goal of this assignment is to get acquainted with Python using Jupyter Notebooks.

Instructions

You may choose to work on this assignment in a hosted environment (e.g., Google Colab) or on your own local installation of Jupyter and Python. You should use Python 3.8 or higher for your work. If you choose to work locally, Anaconda is the easiest way to install and manage Python. If you work locally, you may launch JupyterLab either from the Navigator application or from the command line by running jupyter-lab.

In this assignment, we will analyze the Metropolitan Museum of Art dataset, which tracks information about artworks in the museum's collection. This dataset is provided by The Metropolitan Museum of Art (aka The Met), and information about the columns is documented here. We will be working on a subset of this data, available here. You will analyze this data to answer several questions about it. I have provided code to organize this data, but feel free to improve this rudimentary organization. You may organize the cells as you wish, but you must clearly label each problem's code and its solution. Use a Markdown cell to add a header for each problem. Make sure to state your answer to each question, and make sure your code actually computes it. Because the goal of this assignment is to become acquainted with core Python, do not use any libraries other than the csv and collections modules.

You may start with the provided Jupyter Notebook, a1.ipynb. Download this notebook (right-click to save the link) and upload it to your Jupyter workspace (either locally or in a hosted environment). Make sure to execute the first cell in the notebook (Shift+Enter). This cell downloads the data and defines two variables, field_names and records. The field_names variable is a string containing the names of the data attributes, separated by commas. The records variable is a list of comma-delimited strings, one per data item, holding the values of each field. The field names are:

  1. Object Number: accession number
  2. Object ID: identifying number
  3. Department: curatorial department responsible for the artwork
  4. AccessionYear: year the artwork was acquired
  5. Object Name: describes the physical type of the object
  6. Title: title given to a work of art
  7. Culture: information about the culture, or people from which an object was created
  8. Period: time or time period when an object was created
  9. Object Date: year or a span of years describing the time when an artwork was designed or created
  10. Object Begin Date: machine readable date indicating the year the artwork was started or created
  11. Object End Date: machine readable date indicating the year the artwork was completed
  12. Artist Display Name: artist name in the correct order for display

Important: Accessing the data this way requires parsing each line according to the CSV rules for double-quoted fields (fields that may contain commas). Do not simply split the strings on commas; use the csv library to parse them correctly!
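To illustrate why the csv library matters, here is a minimal sketch that parses a header string and a list of record strings with csv.DictReader. The field_names and records values below are hypothetical stand-ins with only four columns and made-up values; the real variables created by the notebook's first cell are assumed to have the same shape (one header string plus a list of comma-delimited strings).

```python
import csv

# Hypothetical stand-ins for the notebook's field_names and records variables.
field_names = "Object Number,Object ID,Title,Artist Display Name"
records = [
    'A.1,101,"Bowl, with lid",Artist A',
    'A.2,102,Vase,"Artist B|Artist C"',
]

# DictReader accepts any iterable of strings, so the header line and the
# record lines can be fed in together. The first line supplies the keys.
rows = list(csv.DictReader([field_names] + records))

for row in rows:
    # Quoted fields that contain commas stay intact.
    print(row["Title"], "/", row["Artist Display Name"])

# A naive split breaks the quoted title into two pieces (5 fields instead of 4).
print(records[0].split(","))
```

Each element of rows is a dictionary keyed by field name, which makes later lookups such as row["Object Begin Date"] straightforward.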

Due Date

The assignment is due at 11:59pm on Monday, February 7.

Submission

You should submit the completed notebook file required for this assignment on Blackboard. The filename of the notebook should be a1.ipynb.

Details

1. Date of the Oldest Object (10 pts)

Read in the data file and store it in appropriate data structures. Then, write code that computes the date range of the oldest objects. You can use the “Object Begin Date” and “Object End Date” attributes to make the necessary comparisons, but report the more human-readable “Object Date” attribute.

Hints:
  • Consider using the csv library and its DictReader class.
  • You can compare records on one attribute, but you need to keep track of the other attributes of the record(s) for which that attribute is minimal.
  • You can convert a string to an integer by casting it. For example, int("81") returns an integer value of 81.
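One possible approach is sketched below: scan the rows once, compare on the integer begin date, and keep the human-readable "Object Date" of every row that ties for the minimum. The sample rows and the helper name oldest_object_dates are hypothetical, and the sketch assumes every "Object Begin Date" value parses as an integer; it compares on the begin date only, though the end date could also be used, for example as a tie-breaker.

```python
# Hypothetical sample rows shaped like csv.DictReader output.
rows = [
    {"Object Date": "ca. 1500 B.C.", "Object Begin Date": "-1500", "Object End Date": "-1450"},
    {"Object Date": "2000-1800 B.C.", "Object Begin Date": "-2000", "Object End Date": "-1800"},
    {"Object Date": "2000 B.C.", "Object Begin Date": "-2000", "Object End Date": "-2000"},
    {"Object Date": "1850", "Object Begin Date": "1850", "Object End Date": "1850"},
]

def oldest_object_dates(rows):
    """Return the "Object Date" of every row whose begin date is minimal."""
    min_begin = None
    oldest = []
    for row in rows:
        begin = int(row["Object Begin Date"])  # e.g. int("-2000") == -2000
        if min_begin is None or begin < min_begin:
            # New minimum: start a fresh list of candidates.
            min_begin = begin
            oldest = [row["Object Date"]]
        elif begin == min_begin:
            # Tie with the current minimum: keep this row too.
            oldest.append(row["Object Date"])
    return oldest

print(oldest_object_dates(rows))  # ['2000-1800 B.C.', '2000 B.C.']
```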

2. Number of Unique Artist Names (10 pts)

Write code that computes the number of unique artist names contained in the dataset. This is stored in the “Artist Display Name” attribute. Note that some objects have multiple artists, delimited by the | character, and you need to consider each of them as an individual name. For this part, any anonymous or unknown names should be counted.

Hints:
  • Do not consider the empty string as a name!
  • The split method for strings will be useful.
  • The strip method may be useful to trim whitespace.
  • Consider using a set to keep track of all the names
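The hints above combine naturally into a small generator that splits each "Artist Display Name" on |, trims whitespace, and skips empty strings; feeding it into a set removes duplicates. This is a sketch on hypothetical sample rows, and the helper name artist_names is illustrative only.

```python
def artist_names(rows):
    """Yield each individual, whitespace-trimmed artist name from every row."""
    for row in rows:
        for name in row["Artist Display Name"].split("|"):
            name = name.strip()
            if name:  # skip empty strings, e.g. rows with no artist
                yield name

# Hypothetical sample rows shaped like csv.DictReader output.
rows = [
    {"Artist Display Name": "Artist A|Artist B"},
    {"Artist Display Name": " Artist A "},
    {"Artist Display Name": ""},
    {"Artist Display Name": "Anonymous, German, 17th century"},
]

unique_names = set(artist_names(rows))
print(len(unique_names))  # 3
```

Because the cleaning logic lives in one function, it can be reused for the later problems instead of being copied.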

3. Number of Unique Non-Anonymous Artist Names (10 pts)

Now, write code that computes the number of unique, non-anonymous artist names. This means eliminating those names from Part 2 that start with “Anonymous” or “Unknown”. Note that there are artist names like “Anonymous, German, 17th century” that are also anonymous and should be excluded from the count.

Hints:
  • You may want to refactor your answer for Part 2 in order to reuse parts of it for this part
  • Look at the string methods to find one that will help answer queries about the start of a string
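The string method in question is str.startswith, which also accepts a tuple of prefixes. Below is a minimal sketch that filters an already-built set of unique names; the sample names and the helper is_anonymous are hypothetical. Note that the check is case-sensitive, matching the prefixes exactly as written in the problem.

```python
def is_anonymous(name):
    """True for names beginning with "Anonymous" or "Unknown"."""
    # str.startswith accepts a tuple and returns True if any prefix matches.
    return name.startswith(("Anonymous", "Unknown"))

# Hypothetical set of unique, already-cleaned artist names.
names = {"Artist A", "Anonymous, German, 17th century", "Unknown", "Artist B"}

named = {name for name in names if not is_anonymous(name)}
print(len(named))  # 2
```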

4. Most Frequent Artist Name (10 pts)

Write code that computes the most frequent artist name. Use a function so that your work from Part 2 can be reused for this part.

Hints:
  • collections.Counter() is a good structure to help with counting.
  • Clean up the strings in the same manner as in Part 2.
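A collections.Counter can consume the cleaned names directly. The sketch below reuses a name-cleaning generator (the hypothetical artist_names helper, repeated so the block runs on its own) on hypothetical sample rows. Since most_common(1) returns only one entry even when several names share the top count, the sketch also collects every name with the maximum count. It counts all cleaned names; anonymous names could be filtered out before counting if desired.

```python
from collections import Counter

def artist_names(rows):
    """Yield each individual, whitespace-trimmed artist name from every row."""
    for row in rows:
        for name in row["Artist Display Name"].split("|"):
            name = name.strip()
            if name:
                yield name

# Hypothetical sample rows shaped like csv.DictReader output.
rows = [
    {"Artist Display Name": "Artist A|Artist B"},
    {"Artist Display Name": "Artist A"},
    {"Artist Display Name": "Artist C"},
    {"Artist Display Name": "Artist B | Artist A"},
]

counts = Counter(artist_names(rows))
print(counts.most_common(1))  # [('Artist A', 3)]

# Guard against ties: list every name that reaches the top count.
top_count = max(counts.values())
most_frequent = [name for name, count in counts.items() if count == top_count]
print(most_frequent)  # ['Artist A']
```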

5. Date of the Oldest Accession (10 pts)

Write code that computes the object(s) that were acquired first, based on the “AccessionYear” attribute. Note that some accession years are not valid years (i.e., not four-digit numbers); for this assignment, ignore those entries. Print the Title attribute of the oldest object(s).

Hints:
  • You can convert a string to an integer by casting it. For example, int("81") returns an integer value of 81.
  • Try converting an invalid year string to an integer to see which exception to catch with try/except.
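Converting a non-numeric string with int() raises ValueError, which is the exception to catch. The sketch below wraps that in a hypothetical helper, accession_year, that returns None for anything that is not a four-digit year, then collects the titles of all rows sharing the earliest valid year. The sample rows are hypothetical; treating "four-digit" as the range 1000 to 9999 is an assumption of this sketch.

```python
def accession_year(value):
    """Return the year as an int, or None if it is not a four-digit year."""
    try:
        year = int(value)  # raises ValueError for "", "n.d.", "1870.0", ...
    except ValueError:
        return None
    # Keep only four-digit years; anything else is ignored.
    return year if 1000 <= year <= 9999 else None

def oldest_accessions(rows):
    """Return (earliest valid year, titles of all rows with that year)."""
    oldest_year = None
    titles = []
    for row in rows:
        year = accession_year(row["AccessionYear"])
        if year is None:
            continue  # skip invalid accession years
        if oldest_year is None or year < oldest_year:
            oldest_year, titles = year, [row["Title"]]
        elif year == oldest_year:
            titles.append(row["Title"])
    return oldest_year, titles

# Hypothetical sample rows shaped like csv.DictReader output.
rows = [
    {"AccessionYear": "1874", "Title": "Object X"},
    {"AccessionYear": "", "Title": "Object Y"},
    {"AccessionYear": "1870", "Title": "Object Z"},
    {"AccessionYear": "1870", "Title": "Object W"},
    {"AccessionYear": "n.d.", "Title": "Object V"},
]

year, titles = oldest_accessions(rows)
print(year, titles)  # 1870 ['Object Z', 'Object W']
```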