Please note that this schedule is subject to change.
Lectures
(02/01) Relational Databases
(02/13) Pandas and DuckDB
(02/22) Data Transformation
(03/20) Scalable Databases
(03/22) Scalable Databases
- Slides
- Reading:
- References:
- NewSQL, A. Pavlo, 2012.
- The Official Ten-Year Retrospective of NewSQL, A. Pavlo, 2021.
- Spanner: Google’s Globally-Distributed Database, J. C. Corbett et al., 2012.
- F1: A Distributed SQL Database That Scales, J. Shute et al., 2013.
- Spanner, TrueTime & The CAP Theorem, E. Brewer, 2017.
- A Critique of the CAP Theorem, M. Kleppmann, 2015.
- Is Scalable OLTP in the Cloud a Solved Problem?, T. Ziegler et al., 2022.
(03/27) Scalable Dataframes
(03/29) Scalable Dataframes
(04/05) Graph Data
- Slides
- Reading:
- References:
- Introduction to Neo4j and Graph Databases, M. D. Allen, 2019.
- Demystifying Graph Databases: Analysis and Taxonomy of Data Organization, System Designs, and Graph Queries (preprint), M. Besta et al., 2022.
- Survey of Graph Database Models, R. Angles and C. Gutierrez, 2008.
- An Introduction to Graph Data Management, R. Angles and C. Gutierrez, 2017.
- Graph Databases, D. Lembo and R. Rosati, 2015.
- Introduction to Graph Databases, M. De Marzi, 2012.
- The Future Is Big Graphs: A Community View on Graph Processing Systems, S. Sakr et al., 2021.
- The (sorry) State of Graph Database Systems, P. Boncz, 2022.
(04/12) Databases and Visualization
(04/17) Spatial Data
- Slides
- Reading:
- References:
- Big Spatial Data Management, A. Eldawy, 2020.
- Data Cubes, J. Han, M. Kamber, and J. Pei, 2011.
- Nanocubes for Real-Time Exploration of Spatiotemporal Datasets, L. Lins et al., 2013.
- TopKube: A Rank-Aware Data Cube for Real-Time Exploration of Spatiotemporal Datasets, F. Miranda et al., 2017.
- Dynamic Prefetching of Data Tiles for Interactive Visualization, L. Battle et al., 2016.
(04/24) Provenance
- Slides
- Reading:
- References:
- Provenance for Computational Tasks: A Survey, J. Freire et al., 2008.
- Provenance in Databases: Why, How, and Where, J. Cheney et al., 2007.
- Provenance in Databases, A. Amarilli, 2019.
- Capturing and Querying Fine-Grained Provenance of Preprocessing Pipelines in Data Science, P. Missier, 2023.
- Notebook: Download, View
(04/26) Reproducibility
- Slides
- Reading:
- References:
- Repeatability and Benefaction in Computer Systems Research, C. Collberg et al., 2015.
- Reproducible Research in Computational Science, R. D. Peng, 2011.
- Ten Simple Rules for Reproducible Computational Research, G. K. Sandve et al., 2013.
- Computational Reproducibility: State-of-the-Art, Challenges, and Database Research Opportunities, J. Freire et al., 2012.
- A Large-scale Study about Quality and Reproducibility of Jupyter Notebooks, J. F. Pimentel et al., 2019.
(05/01) Databases and Machine Learning