Final Exam

Date, Time, & Location

Wednesday, May 8, 8:00-9:50am, PM 252

The final exam is comprehensive, covering all material from the beginning of the semester through the end, with some emphasis on material covered since Test 2. It will draw on the assigned readings and the topics we discussed in class.

Format

  • Multiple Choice (20-25 questions)
  • Free Response (5-6 questions)
  • CSCI 640 students will answer additional questions

Topics

  • Python
  • numpy
  • pandas
  • Data (items, attributes, attribute types, semantics, metadata)
  • Data Wrangling
  • Data Cleaning
  • Data Transformation
  • Data Integration
  • Data Fusion
  • Scalable Databases
  • Scalable Dataframes
  • Time Series Data
  • Graph Data
  • Databases and Visualization
  • Spatial Data
  • Data Curation
  • Provenance (Computational, Database, Evolution)
  • Reproducibility
  • Databases and Machine Learning

Readings

Assigned Readings

Referenced Papers

Free Response Example Questions

  • Examples from Test 1
  • Examples from Test 2
  • For which types of queries would we expect a graph database to provide better performance than a relational database?
  • What did Sanu et al.’s survey about graph datasets find about their sizes? What are the problems related to graph databases that Boncz discusses?
  • What is unique about RDF triple stores compared to other graph databases with respect to schema and instance?
  • Why does imMens focus on being extremely efficient in calculating visualization updates?
  • How does Mosaic produce visualizations more efficiently? (Hint: think about the number of pixels.)
  • In addition to doing pre-computation, how does ForeCache reduce interaction latency?
  • What challenges does Beast tackle with respect to spatial data processing? How does its architecture relate to other approaches?
  • What is provenance, and what is required to capture, store, and use provenance?
  • What is the difference between prospective and retrospective provenance?
  • What are the trade-offs between workflow- and OS-based provenance capture?
  • What questions can database provenance answer? What are the differences between “Why”, “How”, and “Where” provenance?
  • What was evolution provenance in VisTrails used for?
  • What are concerns involved in reproducing a previous computational study?
  • What types of analyses can be done to evaluate how reproducible published work is?
  • How might machine learning impact databases?
  • Which type of engine (OLAP or OLTP) is SageDB being developed for? Which components of a database does SageDB present machine learning approaches for? How do they perform versus standard databases?