Date, Time, & Location

Wednesday, February 17, 3:30pm-4:45pm, Online (Blackboard)

Overview

Test 1 will cover all material from the beginning of the semester through data transformation (Wednesday, February 10), including the assigned readings and the topics we discussed in class.

Format

  • Multiple Choice
  • Free Response
  • CS680 Students will have additional questions

Topics

  • Python
  • numpy
  • pandas
  • Data (items, attributes, attribute types, semantics, metadata)
  • Data Wrangling
  • Data Cleaning
  • Data Transformation

Readings

Assigned Readings

Referenced Papers

Free Response Example Questions

  • Given a dataset, (a) identify an item, attribute, and cell; (b) state, for each column, whether it is categorical, ordered, or quantitative.
  • In the following Python code, identify all errors.
    // print the numbers from 1 to 100
    int counter = 1 
    while counter < 100 { 
        print counter 
        counter++
    }
  • Given the following numpy array, write two different ways to index the highlighted sub-array
  • Given the following two data frames, write a sequence of pandas operations to transform the first into the second. Exact syntax is not important; if you do not recall a particular operation or function name, explain specifically what it should do.
       name                                genres
    0  Toy Story (1995)                    animation|children’s|comedy
    1  Jumanji (1995)                      Adventure|Children’s|Fantasy
    2  Grumpier Old Men (1995)             COMEDY|ROMANCE
    3  Waiting to Exhale (1995)            Comedy|Drama
    4  Father of the Bride Part II (1995)  Comedy

       Name                         Year  Adventure  Animation  Children’s  Comedy  Drama  Fantasy  Romance
    0  Toy Story                    1995  0          1          1           1       0      0        0
    1  Jumanji                      1995  1          0          1           0       0      1        0
    2  Grumpier Old Men             1995  0          0          0           1       0      0        1
    3  Waiting to Exhale            1995  0          0          0           1       1      0        0
    4  Father of the Bride Part II  1995  0          0          0           1       0      0        0
  • State three distinct ways in which Wrangler helps users trying to wrangle raw datasets.
  • Compare Foofah’s example-based data cleaning with Wrangler’s interactive data cleaning.
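
For the data frame question above, one possible sequence of pandas operations is sketched below. It is an illustration under stated assumptions, not a model answer: the variable name movies is chosen here for convenience, every title is assumed to end in a four-digit year in parentheses, and genre names are assumed to differ only in capitalization.

```python
import pandas as pd

# First data frame, copied from the example question.
# The variable name `movies` is an assumption made for this sketch.
movies = pd.DataFrame({
    "name": [
        "Toy Story (1995)",
        "Jumanji (1995)",
        "Grumpier Old Men (1995)",
        "Waiting to Exhale (1995)",
        "Father of the Bride Part II (1995)",
    ],
    "genres": [
        "animation|children’s|comedy",
        "Adventure|Children’s|Fantasy",
        "COMEDY|ROMANCE",
        "Comedy|Drama",
        "Comedy",
    ],
})

# Step 1: split "Title (Year)" into a Name column and an integer Year column.
parts = movies["name"].str.extract(r"^(?P<Name>.*) \((?P<Year>\d{4})\)$")
parts["Year"] = parts["Year"].astype(int)

# Step 2: clean the genres by lowercasing, so that "COMEDY", "Comedy",
# and "comedy" are treated as the same value.
cleaned = movies["genres"].str.lower()

# Step 3: one-hot encode the genres, splitting on the "|" separator.
genres = cleaned.str.get_dummies(sep="|")

# Step 4: restore an initial capital on each genre column name.
genres.columns = [column.capitalize() for column in genres.columns]

# Step 5: place the name/year columns and the genre columns side by side.
result = pd.concat([parts, genres], axis=1)
print(result)
```

On the test itself, a written description of the same steps (split the year out of the name, normalize the genre capitalization, create one 0/1 column per genre) is sufficient, since exact syntax is not required.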