Assignment 5

Goals

Interaction and Linked Views

Instructions

In this assignment, we will be working with interactions and linked views. Visualizations may be created using Observable Plot or D3. You may use other libraries (e.g. lodash.js or jQuery) for non-visualization tasks, but you must credit them in the HTML file you turn in. Observable Plot has documentation and examples. For D3, there is extensive documentation available as well as examples, and Vadim Ogievetsky’s example-based introduction that we went through in class is also a useful reference. Our in-class example showing linked highlighting will also be useful.

Due Date

The assignment is due at 11:59pm on Wednesday, April 23.

Submission

You should submit any files required for this assignment on Blackboard. If you use Observable, submit the .tar.gz or .tgz file that is generated from the export menu and rename it to a5.tar.gz or a5.tgz. If you create your own files, please make sure the filename of the main HTML document is a5.html. Any other files should be linked from the main HTML document using relative paths. Blackboard may complain about individual files; if so, please zip the files and submit the zip file instead.

Details

In this assignment, we will examine data from the Citi Bike System in New York City. Each bike rental is logged with its start station, end station, and trip duration. We will use a subset of these trips to study patterns between stations across the city, aggregating the individual trips into routes. We will use both a directed node-link diagram and an adjacency matrix. Filtering and linked highlighting will help us deal with the large amount of data.

Data

The trip data contains information about each individual bike rental, including:

  • start_station_id: the start station’s identifier
  • end_station_id: the end station’s identifier
  • duration: the amount of time the trip took
  • started_at: the start timestamp
  • ended_at: the end timestamp

The stations data contains information about each location, including:

  • station_id: station identifier
  • name: station name
  • lat: station latitude
  • lon: station longitude
  • district: the identifier of the NYC community district where the station is located

The community district boundaries are stored in a GeoJSON file where each feature is a district and has the following properties:

  • district: the district identifier
  • centroid_lat: the latitude of the centroid of the district
  • centroid_lon: the longitude of the centroid of the district

0. Info

Make sure to include the following information in your notebook or main HTML file:

  • Your name
  • Your student id
  • The course title (“Data Visualization (CSCI 627/490)”), and
  • The assignment title (“Assignment 5”)

If you used any additional JavaScript libraries or code references, please add a note to this section describing how you used them (e.g. “I used the Lodash library to partition an array.”). Include links to the projects used. You do not need to adhere to any particular style for this text, but I would suggest using headings to separate the sections of the assignment.

1. Initial Visualization

I have created an Observable notebook (you need to be in the NIU Team) that you can fork to begin. (If you wish to use raw HTML/JS instead of a notebook, you can copy the Plot code from the notebook.) This notebook contains some data processing as well as two visualizations. The data processing loads the data, creates a lookup for stations, and aggregates the trips into objects that correspond to the arrows, each having start and end station ids as well as the count of the number of trips. The first visualization is a map showing the community districts with a directed node-link diagram that encodes the number of trips between any pair of stations using stroke width. The second visualization is an adjacency matrix that encodes whether there are any trips between any pair of stations.
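For readers working outside the notebook, the aggregation step described above might look roughly like the following sketch. It is not the notebook's actual code: the sample records and the helper name countTrips are made up for illustration, and only the field names start_station_id and end_station_id come from the data description.

```javascript
// Hypothetical sample records; real trips also carry duration, started_at, ended_at.
const sampleTrips = [
  {start_station_id: "A", end_station_id: "B"},
  {start_station_id: "A", end_station_id: "B"},
  {start_station_id: "B", end_station_id: "A"},
  {start_station_id: "A", end_station_id: "C"},
];

// Count trips per ordered (start, end) pair; one output object per arrow.
function countTrips(trips) {
  const counts = new Map();
  for (const t of trips) {
    // JSON.stringify of the pair gives an unambiguous key for the Map.
    const key = JSON.stringify([t.start_station_id, t.end_station_id]);
    counts.set(key, (counts.get(key) ?? 0) + 1);
  }
  return Array.from(counts, ([key, count]) => {
    const [start_station_id, end_station_id] = JSON.parse(key);
    return {start_station_id, end_station_id, count};
  });
}

const sampleCounts = countTrips(sampleTrips);
```

Note that A to B and B to A are counted separately, which is what a directed diagram needs. D3's d3.flatRollup can produce the same grouping in a single call, returning arrays of the form [start, end, count].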

a. Multiple View Visualization (5 pts)

Combine these two visualizations into a single visualization with the two views juxtaposed horizontally. Consult the in-class notebook from lecture if you are not sure how to do this.

Hints
  • In Observable, you can do this in an HTML cell (give it a name) or a JavaScript cell using html templates. Remember that this will detach the original visualizations.
  • Remember to set CSS rules to place these views next to each other horizontally

b. Encode Count using Color (5 pts)

Update the adjacency matrix to encode the count of trips between the start and end station using an appropriate color scale. Add a legend.

Hints
  • Remember to use the appropriate option for the cell mark to encode the count
  • You can change the color scheme or type to something appropriate. This notebook may be helpful.

2. Filtering (15 pts)

You may have noticed that it can be rather difficult to see the individual trips in the node-link diagram due to the number of arrows. Add a slider to set a threshold for the number of trips an arrow must have to be shown. Filter out any arrows that have fewer than that number of trips. You can use Observable’s Inputs to create a slider. If you are using Observable, you can rely on reactive execution to update the visualization (by referencing the threshold value in the visualization). Otherwise, you will need to update the display of the arrows for those marks with counts below the threshold. For Part 3, it will work better to find a way to hide the arrow rather than remove it from the visualization!

Hints
  • Use a reasonable step (must at least be integral!)
  • You cannot easily integrate the widget into the multiple view cell, but by placing this widget directly below the visualization, it can be close enough to integrate with the rest of the visualization.
  • Think about how you can update the subset of arrows so that they are not visible; are there any visual properties that we can use here?
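As one possible reading of the last hint, the sketch below maps a trip count and a threshold to an opacity, so that an arrow below the threshold stays in the SVG but becomes invisible. The helper name arrowOpacity and the sample counts are made up; other visual properties (for example visibility) would work in a similar way.

```javascript
// Hypothetical helper: opacity for an arrow given its trip count and the slider value.
function arrowOpacity(count, threshold) {
  // Arrows with fewer trips than the threshold become fully transparent
  // but remain in the document, so they can still be looked up later.
  return count >= threshold ? 1 : 0;
}

// Made-up counts, listed in the order the arrows would be drawn.
const exampleCounts = [12, 3, 7, 1];
const opacities = exampleCounts.map((c) => arrowOpacity(c, 5));
```

With a threshold of 5, the second and fourth arrows are hidden. Once data is bound to the arrow elements (see Part 3b), a function like this could be passed to D3's selection.style, e.g. as the value for "opacity".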

3. Linked Highlighting

It can be difficult to match the station names in the adjacency matrix with their locations on the map. We will use linked highlighting from the matrix to the node-link diagram. If the pointer is over a particular cell, both that cell and the corresponding stations and arrow in the node-link diagram should be highlighted.

a. Highlight Selected Cell (5 pts)

Add event handlers to the cells to highlight the currently selected cell. Think about a good way to do this that doesn’t interfere with the existing encoding (remember we added a fill color in Part 1b).

Hints
  • In a pointerover or pointerout event handler, you can get the selected node as event.currentTarget and can treat it as a D3 selection using d3.select(event.currentTarget).
  • Remember to remove highlights from cells other than the selected one

b. Highlight Start & End Stations (10 pts)

Now, use the information from the currently selected cell to highlight the start and end station for this edge in the node-link diagram. Unfortunately, Observable Plot, unlike D3, does not set the data or attributes on cells or dots. However, we have used the z attribute to order the elements according to the order in tripCounts. Thus, we can bind the data to the marks ourselves using D3. Plot has a className property that adds the specified class to the g (group) element containing the plot marks. We have specified the class for station dots as stations, the class for arrows as arrows, and the class for cells as edges. Then, we just need to know the types of elements we are going to bind for our selectAll calls. The dot mark creates circle elements, the arrow mark creates path elements, and the cell mark creates rect elements. For example, to select the cells, we have d3.select(adjMatrix).selectAll(".edges rect"). Now, we need to bind the data. Note that the stations are created from the stations data while the other two marks are created from the tripCounts. Once the data is bound to the elements, we can use D3’s methods to extract information or update attributes.

Hints
  • The D3 data binding discussed above is important here. We can filter a selection (the points), but we need the correct boolean expression.
  • Remember the cell data stores the start_station_id and end_station_id while the dot data stores just a station_id.
  • If you use a CSS rule for highlighting, look at D3’s classed function for adding and removing a class.
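The boolean expression mentioned in the first hint could take a shape like the sketch below. The helper name isEndpoint and the sample objects are made up for illustration; only the field names come from the data description.

```javascript
// True when the station is the start or the end of the hovered edge.
function isEndpoint(station, edge) {
  return (
    station.station_id === edge.start_station_id ||
    station.station_id === edge.end_station_id
  );
}

// Made-up data for illustration.
const exampleStations = [
  {station_id: "A"},
  {station_id: "B"},
  {station_id: "C"},
];
const hoveredEdge = {start_station_id: "A", end_station_id: "C", count: 4};

// Array.prototype.filter stands in here for D3's selection.filter.
const highlighted = exampleStations.filter((s) => isEndpoint(s, hoveredEdge));
```

Here highlighted holds stations A and C. With data bound to the circle elements, the same predicate could be given to selection.filter, followed by classed to add a highlight class. The strict comparison assumes the ids have the same type in both datasets; if one side is numeric and the other a string, convert before comparing.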

c. Highlight Arrow (15 pts)

Finally, we want to highlight the arrow corresponding to the selected cell. However, we also want to show an arrow if it is currently filtered out (Part 2). One option is to add a class and use CSS to change the style. However, this will still leave the arrow below others, and changing the drawing order could potentially cause issues. Another option is to add a new arrow that basically copies the (potentially hidden) arrow and restyles it. Remember that we can grab any attribute from an existing graphical element using attr, so for the arrow, we can grab its path definition (the d attribute) and append a new path to the map that highlights it. We can also remove that path when the highlighted arrow is no longer needed.
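The copy-and-restyle option might be sketched as follows using plain DOM calls. The helper names, the class name arrow-highlight, and the styling values are all made up; the doc parameter exists only so the function can be tried without a browser.

```javascript
const SVG_NS = "http://www.w3.org/2000/svg";

// Hypothetical helper: append a restyled copy of arrowEl to containerEl.
// doc defaults to the browser's document object.
function addHighlightArrow(containerEl, arrowEl, doc = document) {
  const copy = doc.createElementNS(SVG_NS, "path");
  copy.setAttribute("d", arrowEl.getAttribute("d")); // reuse the arrow's geometry
  copy.setAttribute("class", "arrow-highlight");     // made-up class name
  copy.setAttribute("fill", "none");
  copy.setAttribute("stroke", "orange");             // arbitrary highlight color
  copy.setAttribute("stroke-width", "3");
  copy.setAttribute("pointer-events", "none");       // the copy should not intercept the pointer
  containerEl.appendChild(copy);                     // appended last, so drawn on top
  return copy;
}

// When the pointer leaves the cell, the copy can be removed again.
function removeHighlightArrow(copy) {
  copy.remove();
}
```

Only the d attribute is copied, so a hidden original does not pass its hiding style on to the copy. Which container to append to is a judgment call: the arrow's own parent group keeps any inherited transform, but a later selectAll over that group would then also match the copy. The D3 equivalent would use append("path") and attr("d", ...) on a selection.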

4. Aggregation

There is still a bit too much data for the visualization. Let’s create a new visualization that aggregates the trips by district. You can start from the original visualizations, but remember to give them new identifiers. We wish to change our adjacency matrix to aggregate by district, and our linked highlighting should highlight all arrows that correspond to the specified district connections.

a. District Matrix (15 pts)

We want a new matrix view that shows the districts and sums all trips between those districts. Note that while we excluded trips that started and ended at the same station, we can now have trips that start and end in the same district (the diagonal will not be empty). Create a new array (tripDistrictCounts) that specifies the counts between start and end districts and use that to draw a new matrix. Add a color mapping as before.

Hints
  • The d3.flatRollup function will be useful here, too
  • Use the stationLookup to map from station id to district.
  • Update margins to reflect changes in labels.
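To make the intended shape of the district aggregation concrete, here is a minimal sketch in plain JavaScript. All data below is made up, the lookup is assumed to be a Map from station id to station object (the notebook's stationLookup may be structured differently), and the output field names start_district and end_district are assumptions.

```javascript
// Made-up stand-in for the station lookup: station id -> station object.
const demoLookup = new Map([
  ["A", {station_id: "A", district: 101}],
  ["B", {station_id: "B", district: 101}],
  ["C", {station_id: "C", district: 102}],
]);

// Made-up station-level counts in the shape described in Part 1.
const demoTripCounts = [
  {start_station_id: "A", end_station_id: "B", count: 5},
  {start_station_id: "B", end_station_id: "A", count: 2},
  {start_station_id: "A", end_station_id: "C", count: 3},
  {start_station_id: "C", end_station_id: "B", count: 4},
];

// Sum station-level counts into ordered (start district, end district) pairs.
function sumByDistrict(counts, lookup) {
  const totals = new Map();
  for (const {start_station_id, end_station_id, count} of counts) {
    const start = lookup.get(start_station_id).district;
    const end = lookup.get(end_station_id).district;
    const key = JSON.stringify([start, end]);
    totals.set(key, (totals.get(key) ?? 0) + count);
  }
  return Array.from(totals, ([key, count]) => {
    const [start_district, end_district] = JSON.parse(key);
    return {start_district, end_district, count};
  });
}

const demoDistrictCounts = sumByDistrict(demoTripCounts, demoLookup);
```

In this toy data, A to B and B to A both fall inside district 101, so they add up to a single diagonal entry with count 7. The counts are summed rather than the rows counted, since each input row already aggregates several trips.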

b. [627 Only] Linked Highlighting (20 pts)

Finally, we want to create a second multiple view visualization using the same map as before (with a new identifier) and the district matrix we just created. Then, add linked highlighting so that selections in the district matrix highlight all arrows in the map that correspond to paths between stations in those districts.

Hints
  • You will need to map the station ids through the lookup to determine what their districts are (or add these to the data).
  • If two stations are in the same district, both the to and from arrows will be highlighted.
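The matching rule behind these hints could be expressed as a predicate like the one sketched below. The helper name, the sample data, and the cell field names start_district and end_district are assumptions; the lookup is again assumed to be a Map from station id to station object.

```javascript
// Made-up lookup: stations P and Q share district 1, station R is in district 2.
const districtOf = new Map([
  ["P", {station_id: "P", district: 1}],
  ["Q", {station_id: "Q", district: 1}],
  ["R", {station_id: "R", district: 2}],
]);

// Made-up station-level arrows.
const demoArrows = [
  {start_station_id: "P", end_station_id: "Q"},
  {start_station_id: "Q", end_station_id: "P"},
  {start_station_id: "P", end_station_id: "R"},
  {start_station_id: "R", end_station_id: "P"},
];

// True when the arrow runs from the cell's start district to its end district.
function arrowMatchesCell(arrow, cell, lookup) {
  return (
    lookup.get(arrow.start_station_id).district === cell.start_district &&
    lookup.get(arrow.end_station_id).district === cell.end_district
  );
}

const diagonalCell = {start_district: 1, end_district: 1};
const offDiagonalCell = {start_district: 1, end_district: 2};
const diagonalMatches = demoArrows.filter((a) => arrowMatchesCell(a, diagonalCell, districtOf));
const offDiagonalMatches = demoArrows.filter((a) => arrowMatchesCell(a, offDiagonalCell, districtOf));
```

For the diagonal cell, both P to Q and Q to P match, which is the behavior the second hint describes. For the off-diagonal cell, only P to R matches because direction still matters between different districts.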

Extra Credit

  • CSCI 490 students may do Part 4b for extra credit
  • Add linked highlighting from the stations in the map to the cells in the adjacency matrix (this will be one-to-many)
  • Create a new aggregated node-link diagram that uses the centroid information from the districts to draw arrows between districts, not stations. Make sure the counts are summed properly.