# Color

* Authors: David Koop
* Last Updated: 2025-03-19

## Introduction

Color is an important ingredient in data visualization because it is often separable from the other channels (or visual variables). However, color itself can be decomposed into three different channels of its own:

* color hue: different colors (e.g. red, orange, blue)
* color saturation: the vividness of the color (e.g. a scale from a drab white or gray to a bright red)
* color lightness/luminance: the brightness of the color
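
As a rough illustration of these three channels, the sketch below splits a few RGB colors into hue, saturation, and lightness using Python's standard `colorsys` module. The HLS model it uses is only one simple way to define these channels, and the helper name `describe` and the example colors are chosen here for illustration.

```python
import colorsys


def describe(r, g, b):
    """Split an RGB color (components in 0-1) into hue, saturation, and lightness."""
    h, l, s = colorsys.rgb_to_hls(r, g, b)
    return {
        "hue_deg": round(h * 360),
        "saturation": round(s, 2),
        "lightness": round(l, 2),
    }


# Three reds: same hue, but different saturation or lightness
bright_red = describe(1.0, 0.0, 0.0)
dull_red = describe(0.6, 0.4, 0.4)
dark_red = describe(0.5, 0.0, 0.0)

for name, channels in [("bright", bright_red), ("dull", dull_red), ("dark", dark_red)]:
    print(name, channels)
```

All three colors share the same hue; the dull red differs from the bright red only in saturation, and the dark red differs only in lightness.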


## Installing the required libraries

These notebooks are written in Python, although the visualizations are generated with the help of a JavaScript library. In this courselet, we will be using [pyobsplot](https://juba.github.io/pyobsplot/), a Python interface to the Observable Plot JavaScript library, for creating our data visualizations. In addition, we will be using [polars](https://pola.rs), a library for data manipulation, to load and access the data. In order to create charts in this courselet, it is necessary to install both of the libraries. To install them, run the following command in your Python environment:
```sh
pip install pyobsplot polars
```
You may also use other tools like [uv](https://docs.astral.sh/uv/) or [conda](https://github.com/conda/conda) to install these libraries. The next line configures Jupyter to show output for assignments or expressions.

In [94]:
%config InteractiveShell.ast_node_interactivity = 'last_expr_or_assign'


## Human Color Perception

Recall, however, that the goal of visualization is to help humans understand data. Thus, we need to understand a bit about how humans see color, and about potential problems or anomalies in color perception, before we can apply color to visualizations. Physically, color can be described by a spectrum: the distribution of power across the visible-light portion of the electromagnetic spectrum. Tools like a spectrophotometer can measure the power at individual wavelengths to characterize light. In the physical world, color is usually associated with a material instead of the light itself: a brown sofa, a red chair, a yellow pencil. These colors are the result of the material *absorbing* some of the wavelengths and reflecting others, which are then captured by the eye.

The retina is the part of the human eye which senses the wavelengths of light and turns this information into impulses that the brain further interprets. The retina has two types of cells that sense light: rods and cones. The rods sense the intensity of light while the cones help us differentiate colors. Cones are more common in the central part of the retina, while rods outnumber cones along the outside. Thus, human peripheral vision does not sense color well. Humans have three types of cones: S(hort), M(edium), and L(ong); these names indicate the wavelengths they are sensitive to. The M and L cones capture much of the same part of the spectrum, with the L cones capturing more of the longer (red) wavelengths and the M cones capturing more of the shorter (green) wavelengths. Each cone produces a response matching the intensity of the wavelengths it is sensitive to. To the brain, color is thus a triple of responses, so our judgment of color is not very precise: we can perceive different spectral distributions as the same color.
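
To make the "triple of responses" idea concrete, here is a minimal sketch in which two different spectra produce identical cone responses. The sensitivity numbers and the four wavelength bins are invented toy values, not measured cone data, and the names `SENSITIVITY` and `cone_responses` are hypothetical.

```python
# Toy cone sensitivities over four wavelength bins (short -> long).
# These integers are made up for illustration; they are NOT measured cone data.
SENSITIVITY = {
    "S": [8, 2, 0, 0],
    "M": [0, 6, 6, 1],
    "L": [0, 3, 7, 5],
}


def cone_responses(spectrum):
    """Return the (S, M, L) response triple for a spectrum given as power per bin."""
    return tuple(
        sum(weight * power for weight, power in zip(SENSITIVITY[cone], spectrum))
        for cone in ("S", "M", "L")
    )


# Two clearly different spectral distributions
flat_spectrum = [200, 200, 200, 200]
spiky_spectrum = [246, 16, 416, 8]

print(cone_responses(flat_spectrum))
print(cone_responses(spiky_spectrum))
```

Both calls print the same triple, so under this toy model the two spectra would be indistinguishable even though their power distributions differ.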

### Colorblindness

When a type of cone is missing or anomalous, it becomes even more difficult to perceive different spectral distributions. The most common form of colorblindness, deuteranomaly, occurs when the M cones are not as sensitive as they are in other humans. This makes it difficult to distinguish reds from greens. As an X-linked recessive trait, it tends to affect men more often than women. When we are encoding information using color, it is important to watch out for colormaps that may be difficult for some of the population to perceive.

### Other Factors Influencing Perception

In addition to colorblindness and the limits of the human eye, there are also a number of other factors influencing color perception that occur in data visualizations ([Szafir, 2007](https://danielleszafir.com/colordiff_vis2017.pdf)). We are likely to view visualizations on different materials, on screens with different gamuts, and under different external lighting characteristics. Each of these can influence how well we perceive color. In addition, visualizations often use color on many marks of different shapes and sizes. As with Cleveland and McGill's findings about the impact of the locations of bars whose lengths are being compared, color is also influenced by other marks and their colors, including backgrounds. Simultaneous contrast, in particular, is often a problem with adjacent colors. Furthermore, larger or thicker visual elements tend to allow better color judgment than smaller or thinner marks. 

## Colormaps

Recall that our goal with color is to translate values into colors. We will use the [Travel & Tourism Development Index dataset](https://www.weforum.org/publications/travel-tourism-development-index-2024/downloads-d72ace2079/) to investigate colormaps. The next cell loads that data and wrangles it into a dataframe that we can work with for visualization.

In [97]:
# This code is included for completeness, but you can also use the CSV file below
# This code requires fastexcel to be installed

import polars as pl
from itertools import accumulate
import urllib.request
import os


url = "https://www3.weforum.org/docs/WEF_TTDI_2024_edition_data.xlsx"
local_fname = "WEF_TTDI_2024_edition_data.xlsx"
if not os.path.exists(local_fname):
    urllib.request.urlretrieve(url, local_fname)

df_header = [
    ""
    if c.startswith("__UNNAMED__")
    else c.strip().replace("\n", " ").replace("\r", "")
    for c in pl.read_excel(
        local_fname, read_options={"n_rows": 1}
    ).columns
]

df_header = list(accumulate(df_header, lambda x, y: y if y else x))

df = pl.read_excel(
    local_fname, read_options={"header_row": 1}
)

df = df.rename(
    {
        c: d + ": " + c[: idx if (idx := c.rfind("_")) > 0 else None] if d else c
        for c, d in zip(df.columns, df_header)
    }
)

ISO Code,Economy,Region,Sub Region,Income Group,Travel & Tourism Development Index: 2019 Value,Travel & Tourism Development Index: 2019 Rank,Travel & Tourism Development Index: 2021 Value,Travel & Tourism Development Index: 2021 Rank,Travel & Tourism Development Index: 2024 Value,Travel & Tourism Development Index: 2024 Rank,Travel & Tourism Development Index: 2021-2024 % Dif Score,Travel & Tourism Development Index: 2021-2024 Rank Change,Travel & Tourism Development Index: 2019-2024 % Dif Score,Travel & Tourism Development Index: 2019-2024 Rank Change,Enabling Environment dimension: 2019 Value,Enabling Environment dimension: 2019 Rank,Enabling Environment dimension: 2021 Value,Enabling Environment dimension: 2021 Rank,Enabling Environment dimension: 2024 Value,Enabling Environment dimension: 2024 Rank,Enabling Environment dimension: 2021-2024 % Dif Score,Enabling Environment dimension: 2021-2024 Rank Change,Enabling Environment dimension: 2019-2024 % Dif Score,Enabling Environment dimension: 2019-2024 Rank Change,Travel and Tourism Policy and Enabling Conditions dimension: 2019 Value,Travel and Tourism Policy and Enabling Conditions dimension: 2019 Rank,Travel and Tourism Policy and Enabling Conditions dimension: 2021 Value,Travel and Tourism Policy and Enabling Conditions dimension: 2021 Rank,Travel and Tourism Policy and Enabling Conditions dimension: 2024 Value,Travel and Tourism Policy and Enabling Conditions dimension: 2024 Rank,Travel and Tourism Policy and Enabling Conditions dimension: 2021-2024 % Dif Score,Travel and Tourism Policy and Enabling Conditions dimension: 2021-2024 Rank Change,Travel and Tourism Policy and Enabling Conditions dimension: 2019-2024 % Dif Score,Travel and Tourism Policy and Enabling Conditions dimension: 2019-2024 Rank Change,Infrastructure and Services dimension: 2019 Value,Infrastructure and Services dimension: 2019 Rank,…,Non-Leisure Resources pillar: 2021 Rank,Non-Leisure Resources pillar: 2024 Value,Non-Leisure Resources 
pillar: 2024 Rank,Non-Leisure Resources pillar: 2021-2024 % Dif Score,Non-Leisure Resources pillar: 2021-2024 Rank Change,Non-Leisure Resources pillar: 2019-2024 % Dif Score,Non-Leisure Resources pillar: 2019-2024 Rank Change,Environmental Sustainability pillar: 2019 Value,Environmental Sustainability pillar: 2019 Rank,Environmental Sustainability pillar: 2021 Value,Environmental Sustainability pillar: 2021 Rank,Environmental Sustainability pillar: 2024 Value,Environmental Sustainability pillar: 2024 Rank,Environmental Sustainability pillar: 2021-2024 % Dif Score,Environmental Sustainability pillar: 2021-2024 Rank Change,Environmental Sustainability pillar: 2019-2024 % Dif Score,Environmental Sustainability pillar: 2019-2024 Rank Change,T&T Socioeconomic Impact pillar: 2019 Value,T&T Socioeconomic Impact pillar: 2019 Rank,T&T Socioeconomic Impact pillar: 2021 Value,T&T Socioeconomic Impact pillar: 2021 Rank,T&T Socioeconomic Impact pillar: 2024 Value,T&T Socioeconomic Impact pillar: 2024 Rank,T&T Socioeconomic Impact pillar: 2021-2024 % Dif Score,T&T Socioeconomic Impact pillar: 2021-2024 Rank Change,T&T Socioeconomic Impact pillar: 2019-2024 % Dif Score,T&T Socioeconomic Impact pillar: 2019-2024 Rank Change,T&T Demand Sustainability pillar: 2019 Value,T&T Demand Sustainability pillar: 2019 Rank,T&T Demand Sustainability pillar: 2021 Value,T&T Demand Sustainability pillar: 2021 Rank,T&T Demand Sustainability pillar: 2024 Value,T&T Demand Sustainability pillar: 2024 Rank,T&T Demand Sustainability pillar: 2021-2024 % Dif Score,T&T Demand Sustainability pillar: 2021-2024 Rank Change,T&T Demand Sustainability pillar: 2019-2024 % Dif Score,T&T Demand Sustainability pillar: 2019-2024 Rank Change
str,str,str,str,str,f64,i64,f64,i64,f64,i64,f64,i64,f64,i64,f64,i64,f64,i64,f64,i64,f64,i64,f64,i64,f64,i64,f64,i64,f64,i64,f64,i64,f64,i64,f64,i64,…,i64,f64,i64,f64,i64,f64,i64,f64,i64,f64,i64,f64,i64,f64,i64,f64,i64,f64,i64,f64,i64,f64,i64,f64,i64,f64,i64,f64,i64,f64,i64,f64,i64,f64,i64,f64,i64
"""SGP""","""Singapore""","""Asia-Pacific""","""South-East Asia""","""High-Income Economies""",4.836993,11,4.859702,9,4.755449,13,-0.021452,-4,-0.016858,-2,5.929148,4,5.948873,4,5.974759,5,0.004351,-1,0.007693,-1,4.892797,19,5.217416,3,4.693707,25,-0.100377,-22,-0.04069,-6,5.839716,2,…,13,3.796555,25,-0.144806,-12,-0.226757,-14,4.210384,55,4.374555,52,4.373042,55,-0.000346,-3,0.038633,0,3.535036,95,4.068493,79,4.379867,56,0.076533,23,0.238988,39,3.754416,71,3.946615,63,4.13518,39,0.047779,24,0.101418,32
"""ISL""","""Iceland""","""Europe and Eurasia""","""Northern Europe""","""High-Income Economies""",4.475579,22,4.397822,28,4.322122,32,-0.017213,-4,-0.034288,-10,5.818496,9,5.880801,8,5.897454,8,0.002832,0,0.01357,1,4.152698,97,4.068322,100,3.929689,100,-0.034076,0,-0.053702,-3,4.92679,18,…,97,1.329387,97,0.007476,0,0.016199,1,5.438007,12,5.504294,13,5.229237,25,-0.049972,-12,-0.038391,-13,5.463744,3,4.869084,25,4.467003,51,-0.082578,-26,-0.182428,-48,3.013654,107,3.069841,110,2.57638,117,-0.160745,-7,-0.145098,-10
"""SLV""","""El Salvador""","""The Americas""","""North and Central America""","""Upper-Middle-Income Economies""",3.301555,101,3.371741,101,3.433428,97,0.018295,4,0.039943,4,3.782416,98,4.007048,92,4.124784,88,0.029382,4,0.090516,10,4.365426,76,4.221123,90,4.161186,81,-0.014199,9,-0.046786,-5,2.406444,96,…,84,1.504815,85,0.016357,-1,0.018512,1,4.035646,69,4.216208,57,4.282782,61,0.01579,-4,0.061238,8,4.304284,56,4.388411,63,4.36604,57,-0.005098,6,0.014348,-1,4.157887,41,4.569863,25,4.696573,8,0.027727,17,0.129558,33
"""LKA""","""Sri Lanka""","""Asia-Pacific""","""South Asia""","""Lower-Middle Income Economies""",3.693238,75,3.733799,74,3.693234,76,-0.010864,-2,-0.000001,-1,4.206861,74,4.235558,76,4.210504,83,-0.005915,-7,0.000866,-9,4.763821,34,4.823758,29,4.721661,22,-0.021165,7,-0.00885,12,2.98985,76,…,73,1.607666,74,-0.016352,-1,-0.048697,-1,3.654453,104,3.667008,107,3.704321,109,0.010175,-2,0.013646,-5,4.154055,66,4.941916,21,5.843734,4,0.182484,17,0.406754,62,4.715562,12,4.546933,27,3.697613,67,-0.18679,-40,-0.21587,-55
"""TZA""","""Tanzania""","""Sub-saharan Africa""","""Eastern Africa""","""Lower-Middle Income Economies""",3.493018,88,3.648607,82,3.649434,81,0.000227,1,0.04478,7,3.45201,109,3.641029,105,3.711937,105,0.019475,0,0.075297,4,4.841353,26,4.84995,28,4.772438,19,-0.015982,9,-0.014235,7,2.364619,98,…,77,1.558629,78,0.002325,-1,0.001577,2,4.068672,63,4.199981,59,4.170733,68,-0.006964,-9,0.025085,-5,4.18009,64,5.212514,10,5.395485,11,0.035102,-1,0.290758,53,4.399329,25,4.27381,43,4.01647,49,-0.060213,-6,-0.087027,-24
…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…
"""SRB""","""Serbia""","""Europe and Eurasia""","""Balkans and Eastern Europe""","""Upper-Middle-Income Economies""",3.755654,71,3.811368,69,3.855565,68,0.011596,1,0.026603,3,4.990329,41,5.137243,42,5.244002,44,0.020781,-2,0.050833,-3,4.548525,60,4.506608,56,4.390009,56,-0.025873,0,-0.03485,4,3.057354,72,…,69,1.716473,70,-0.002088,-1,0.003257,2,4.003454,75,4.031713,74,4.087813,73,0.013915,1,0.021072,2,4.124661,68,4.250109,69,4.306085,63,0.01317,6,0.043985,5,3.257676,94,3.377893,92,3.390168,80,0.003634,12,0.040671,14
"""ARE""","""United Arab Emirates""","""Middle East and North Africa""","""Middle East""","""High-Income Economies""",4.42493,25,4.511856,20,4.61942,18,0.02384,2,0.043953,7,5.50624,20,5.588746,21,5.541326,27,-0.008485,-6,0.006372,-7,4.741747,38,4.878847,24,5.11031,3,0.047442,21,0.077727,35,5.264435,10,…,31,3.838118,24,0.13784,7,0.076607,6,4.144525,58,4.114121,68,4.056958,76,-0.013894,-8,-0.021128,-18,2.230788,119,2.520497,118,2.672358,118,0.060251,0,0.197944,1,4.09886,44,4.274573,42,3.920546,55,-0.082822,-13,-0.043503,-11
"""LBN""","""Lebanon""","""Middle East and North Africa""","""Middle East""","""Upper-Middle-Income Economies""",3.635733,80,3.572115,86,3.664921,79,0.025981,7,0.008028,1,4.120879,79,4.198452,79,4.188765,85,-0.002307,-6,0.016474,-6,4.619079,48,4.071353,98,4.454433,52,0.094092,46,-0.035645,-4,3.372416,58,…,42,1.855631,67,-0.36207,-25,-0.442017,-34,3.764705,95,3.750659,98,3.723331,108,-0.007286,-10,-0.01099,-13,3.681601,84,4.570013,47,5.280241,17,0.15541,30,0.434224,67,3.723672,72,3.682823,80,3.757319,64,0.020228,16,0.009036,8
"""BHR""","""Bahrain""","""Middle East and North Africa""","""Middle East""","""High-Income Economies""",3.881952,61,3.954701,59,3.962452,58,0.00196,1,0.020737,3,4.860262,53,4.944445,52,4.963125,52,0.003778,0,0.021164,1,4.16293,96,4.479593,63,4.388536,57,-0.020327,6,0.054194,39,4.183603,35,…,40,2.423291,49,-0.180943,-9,-0.202196,-7,3.208747,118,3.271194,117,3.505167,115,0.071525,2,0.092379,3,3.615561,89,3.844494,87,3.977414,83,0.034574,4,0.100082,6,4.452388,22,4.68249,18,4.39532,24,-0.061328,-6,-0.012817,-2


In [95]:
import os
from urllib.request import urlretrieve

url = "https://gist.githubusercontent.com/dakoop/fa66c4c3e808f12af8081f0185266c9d/raw/edbb42c302f58956d8e8c81cd67c5cc283d58722/WEF_TTDI_2024_edition_data.csv"
local_fname = "WEF_TTDI_2024_edition_data.csv"
if not os.path.exists(local_fname):
    urlretrieve(url, local_fname)

In [96]:
# Import polars here as well, in case the Excel cell above was skipped
import polars as pl

df = pl.read_csv("WEF_TTDI_2024_edition_data.csv")

ISO Code,Economy,Region,Sub Region,Income Group,Travel & Tourism Development Index: 2019 Value,Travel & Tourism Development Index: 2019 Rank,Travel & Tourism Development Index: 2021 Value,Travel & Tourism Development Index: 2021 Rank,Travel & Tourism Development Index: 2024 Value,Travel & Tourism Development Index: 2024 Rank,Travel & Tourism Development Index: 2021-2024 % Dif Score,Travel & Tourism Development Index: 2021-2024 Rank Change,Travel & Tourism Development Index: 2019-2024 % Dif Score,Travel & Tourism Development Index: 2019-2024 Rank Change,Enabling Environment dimension: 2019 Value,Enabling Environment dimension: 2019 Rank,Enabling Environment dimension: 2021 Value,Enabling Environment dimension: 2021 Rank,Enabling Environment dimension: 2024 Value,Enabling Environment dimension: 2024 Rank,Enabling Environment dimension: 2021-2024 % Dif Score,Enabling Environment dimension: 2021-2024 Rank Change,Enabling Environment dimension: 2019-2024 % Dif Score,Enabling Environment dimension: 2019-2024 Rank Change,Travel and Tourism Policy and Enabling Conditions dimension: 2019 Value,Travel and Tourism Policy and Enabling Conditions dimension: 2019 Rank,Travel and Tourism Policy and Enabling Conditions dimension: 2021 Value,Travel and Tourism Policy and Enabling Conditions dimension: 2021 Rank,Travel and Tourism Policy and Enabling Conditions dimension: 2024 Value,Travel and Tourism Policy and Enabling Conditions dimension: 2024 Rank,Travel and Tourism Policy and Enabling Conditions dimension: 2021-2024 % Dif Score,Travel and Tourism Policy and Enabling Conditions dimension: 2021-2024 Rank Change,Travel and Tourism Policy and Enabling Conditions dimension: 2019-2024 % Dif Score,Travel and Tourism Policy and Enabling Conditions dimension: 2019-2024 Rank Change,Infrastructure and Services dimension: 2019 Value,Infrastructure and Services dimension: 2019 Rank,…,Non-Leisure Resources pillar: 2021 Rank,Non-Leisure Resources pillar: 2024 Value,Non-Leisure Resources 
pillar: 2024 Rank,Non-Leisure Resources pillar: 2021-2024 % Dif Score,Non-Leisure Resources pillar: 2021-2024 Rank Change,Non-Leisure Resources pillar: 2019-2024 % Dif Score,Non-Leisure Resources pillar: 2019-2024 Rank Change,Environmental Sustainability pillar: 2019 Value,Environmental Sustainability pillar: 2019 Rank,Environmental Sustainability pillar: 2021 Value,Environmental Sustainability pillar: 2021 Rank,Environmental Sustainability pillar: 2024 Value,Environmental Sustainability pillar: 2024 Rank,Environmental Sustainability pillar: 2021-2024 % Dif Score,Environmental Sustainability pillar: 2021-2024 Rank Change,Environmental Sustainability pillar: 2019-2024 % Dif Score,Environmental Sustainability pillar: 2019-2024 Rank Change,T&T Socioeconomic Impact pillar: 2019 Value,T&T Socioeconomic Impact pillar: 2019 Rank,T&T Socioeconomic Impact pillar: 2021 Value,T&T Socioeconomic Impact pillar: 2021 Rank,T&T Socioeconomic Impact pillar: 2024 Value,T&T Socioeconomic Impact pillar: 2024 Rank,T&T Socioeconomic Impact pillar: 2021-2024 % Dif Score,T&T Socioeconomic Impact pillar: 2021-2024 Rank Change,T&T Socioeconomic Impact pillar: 2019-2024 % Dif Score,T&T Socioeconomic Impact pillar: 2019-2024 Rank Change,T&T Demand Sustainability pillar: 2019 Value,T&T Demand Sustainability pillar: 2019 Rank,T&T Demand Sustainability pillar: 2021 Value,T&T Demand Sustainability pillar: 2021 Rank,T&T Demand Sustainability pillar: 2024 Value,T&T Demand Sustainability pillar: 2024 Rank,T&T Demand Sustainability pillar: 2021-2024 % Dif Score,T&T Demand Sustainability pillar: 2021-2024 Rank Change,T&T Demand Sustainability pillar: 2019-2024 % Dif Score,T&T Demand Sustainability pillar: 2019-2024 Rank Change
str,str,str,str,str,f64,i64,f64,i64,f64,i64,f64,i64,f64,i64,f64,i64,f64,i64,f64,i64,f64,i64,f64,i64,f64,i64,f64,i64,f64,i64,f64,i64,f64,i64,f64,i64,…,i64,f64,i64,f64,i64,f64,i64,f64,i64,f64,i64,f64,i64,f64,i64,f64,i64,f64,i64,f64,i64,f64,i64,f64,i64,f64,i64,f64,i64,f64,i64,f64,i64,f64,i64,f64,i64
"""SGP""","""Singapore""","""Asia-Pacific""","""South-East Asia""","""High-Income Economies""",4.836993,11,4.859702,9,4.755449,13,-0.021452,-4,-0.016858,-2,5.929148,4,5.948873,4,5.974759,5,0.004351,-1,0.007693,-1,4.892797,19,5.217416,3,4.693707,25,-0.100377,-22,-0.04069,-6,5.839716,2,…,13,3.796555,25,-0.144806,-12,-0.226757,-14,4.210384,55,4.374555,52,4.373042,55,-0.000346,-3,0.038633,0,3.535036,95,4.068493,79,4.379867,56,0.076533,23,0.238988,39,3.754416,71,3.946615,63,4.13518,39,0.047779,24,0.101418,32
"""ISL""","""Iceland""","""Europe and Eurasia""","""Northern Europe""","""High-Income Economies""",4.475579,22,4.397822,28,4.322122,32,-0.017213,-4,-0.034288,-10,5.818496,9,5.880801,8,5.897454,8,0.002832,0,0.01357,1,4.152698,97,4.068322,100,3.929689,100,-0.034076,0,-0.053702,-3,4.92679,18,…,97,1.329387,97,0.007476,0,0.016199,1,5.438007,12,5.504294,13,5.229237,25,-0.049972,-12,-0.038391,-13,5.463744,3,4.869084,25,4.467003,51,-0.082578,-26,-0.182428,-48,3.013654,107,3.069841,110,2.57638,117,-0.160745,-7,-0.145098,-10
"""SLV""","""El Salvador""","""The Americas""","""North and Central America""","""Upper-Middle-Income Economies""",3.301555,101,3.371741,101,3.433428,97,0.018295,4,0.039943,4,3.782416,98,4.007048,92,4.124784,88,0.029382,4,0.090516,10,4.365426,76,4.221123,90,4.161186,81,-0.014199,9,-0.046786,-5,2.406444,96,…,84,1.504815,85,0.016357,-1,0.018512,1,4.035646,69,4.216208,57,4.282782,61,0.01579,-4,0.061238,8,4.304284,56,4.388411,63,4.36604,57,-0.005098,6,0.014348,-1,4.157887,41,4.569863,25,4.696573,8,0.027727,17,0.129558,33
"""LKA""","""Sri Lanka""","""Asia-Pacific""","""South Asia""","""Lower-Middle Income Economies""",3.693238,75,3.733799,74,3.693234,76,-0.010864,-2,-0.000001,-1,4.206861,74,4.235558,76,4.210504,83,-0.005915,-7,0.000866,-9,4.763821,34,4.823758,29,4.721661,22,-0.021165,7,-0.00885,12,2.98985,76,…,73,1.607666,74,-0.016352,-1,-0.048697,-1,3.654453,104,3.667008,107,3.704321,109,0.010175,-2,0.013646,-5,4.154055,66,4.941916,21,5.843734,4,0.182484,17,0.406754,62,4.715562,12,4.546933,27,3.697613,67,-0.18679,-40,-0.21587,-55
"""TZA""","""Tanzania""","""Sub-saharan Africa""","""Eastern Africa""","""Lower-Middle Income Economies""",3.493018,88,3.648607,82,3.649434,81,0.000227,1,0.04478,7,3.45201,109,3.641029,105,3.711937,105,0.019475,0,0.075297,4,4.841353,26,4.84995,28,4.772438,19,-0.015982,9,-0.014235,7,2.364619,98,…,77,1.558629,78,0.002325,-1,0.001577,2,4.068672,63,4.199981,59,4.170733,68,-0.006964,-9,0.025085,-5,4.18009,64,5.212514,10,5.395485,11,0.035102,-1,0.290758,53,4.399329,25,4.27381,43,4.01647,49,-0.060213,-6,-0.087027,-24
…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…
"""SRB""","""Serbia""","""Europe and Eurasia""","""Balkans and Eastern Europe""","""Upper-Middle-Income Economies""",3.755654,71,3.811368,69,3.855565,68,0.011596,1,0.026603,3,4.990329,41,5.137243,42,5.244002,44,0.020781,-2,0.050833,-3,4.548525,60,4.506608,56,4.390009,56,-0.025873,0,-0.03485,4,3.057354,72,…,69,1.716473,70,-0.002088,-1,0.003257,2,4.003454,75,4.031713,74,4.087813,73,0.013915,1,0.021072,2,4.124661,68,4.250109,69,4.306085,63,0.01317,6,0.043985,5,3.257676,94,3.377893,92,3.390168,80,0.003634,12,0.040671,14
"""ARE""","""United Arab Emirates""","""Middle East and North Africa""","""Middle East""","""High-Income Economies""",4.42493,25,4.511856,20,4.61942,18,0.02384,2,0.043953,7,5.50624,20,5.588746,21,5.541326,27,-0.008485,-6,0.006372,-7,4.741747,38,4.878847,24,5.11031,3,0.047442,21,0.077727,35,5.264435,10,…,31,3.838118,24,0.13784,7,0.076607,6,4.144525,58,4.114121,68,4.056958,76,-0.013894,-8,-0.021128,-18,2.230788,119,2.520497,118,2.672358,118,0.060251,0,0.197944,1,4.09886,44,4.274573,42,3.920546,55,-0.082822,-13,-0.043503,-11
"""LBN""","""Lebanon""","""Middle East and North Africa""","""Middle East""","""Upper-Middle-Income Economies""",3.635733,80,3.572115,86,3.664921,79,0.025981,7,0.008028,1,4.120879,79,4.198452,79,4.188765,85,-0.002307,-6,0.016474,-6,4.619079,48,4.071353,98,4.454433,52,0.094092,46,-0.035645,-4,3.372416,58,…,42,1.855631,67,-0.36207,-25,-0.442017,-34,3.764705,95,3.750659,98,3.723331,108,-0.007286,-10,-0.01099,-13,3.681601,84,4.570013,47,5.280241,17,0.15541,30,0.434224,67,3.723672,72,3.682823,80,3.757319,64,0.020228,16,0.009036,8
"""BHR""","""Bahrain""","""Middle East and North Africa""","""Middle East""","""High-Income Economies""",3.881952,61,3.954701,59,3.962452,58,0.00196,1,0.020737,3,4.860262,53,4.944445,52,4.963125,52,0.003778,0,0.021164,1,4.16293,96,4.479593,63,4.388536,57,-0.020327,6,0.054194,39,4.183603,35,…,40,2.423291,49,-0.180943,-9,-0.202196,-7,3.208747,118,3.271194,117,3.505167,115,0.071525,2,0.092379,3,3.615561,89,3.844494,87,3.977414,83,0.034574,4,0.100082,6,4.452388,22,4.68249,18,4.39532,24,-0.061328,-6,-0.012817,-2


The following scatterplot shows various countries by their total TTDI and Sustainability values.

In [None]:
from pyobsplot import Plot

Plot.plot(
    {
        "marks": [
            Plot.dot(
                df,
                {
                    "x": "Travel & Tourism Development Index: 2024 Value",
                    "y": "Environmental Sustainability pillar: 2024 Value",
                },
            )
        ],
    }
)

While this plot shows some correlation between the sustainability value and the TTD index value, we could add further information by adding color to each point mark. To do so, we can assign an attribute to the `fill` channel, making sure to add a legend so we can interpret each color.

In [15]:
from pyobsplot import Plot

Plot.plot(
    {
        "marks": [
            Plot.dot(
                df,
                {
                    "x": "Travel & Tourism Development Index: 2024 Value",
                    "y": "Environmental Sustainability pillar: 2024 Value",
                    "fill": "Region",
                },
            )
        ],
        "color": {"legend": True},
    }
)

#### Exercise

Try using other attributes like `"Infrastructure and Services dimension: 2024 Value"` or `"Income Group"` for the fill and see how these visualizations differ.

##### Solution

In [57]:
Plot.plot(
    {
        "marks": [
            Plot.dot(
                df,
                {
                    "x": "Travel & Tourism Development Index: 2024 Value",
                    "y": "Environmental Sustainability pillar: 2024 Value",
                    "fill": "Infrastructure and Services dimension: 2024 Value",
                },
            )
        ],
        "color": {"legend": True},
    }
)

In [58]:
Plot.plot(
    {
        "marks": [
            Plot.dot(
                df,
                {
                    "x": "Travel & Tourism Development Index: 2024 Value",
                    "y": "Environmental Sustainability pillar: 2024 Value",
                    "fill": "Income Group",
                },
            )
        ],
        "color": {"legend": True},
    }
)

#### Controlling the Colormap

Hopefully, you noticed that the colormap for the infrastructure and services attribute was different from the other two. Plot looks at the **type** of data and tries to infer the correct colormap. It doesn't always get this right, however: the income groups are **ordered**, but Plot colors them as if they were unordered categories. In those cases, we can instruct Plot to treat the attribute in a particular way by choosing a color scheme (`blues`) and listing the domain in the desired order (low to high income in our case).

In [69]:
Plot.plot(
    {
        "marks": [
            Plot.dot(
                df,
                {
                    "x": "Travel & Tourism Development Index: 2024 Value",
                    "y": "Environmental Sustainability pillar: 2024 Value",
                    "fill": "Income Group",
                },
            )
        ],
        "color": {
            "scheme": "blues",
            "domain": [
                "Low-Income Economies",
                "Lower-Middle Income Economies",
                "Upper-Middle-Income Economies",
                "High-Income Economies",
            ],
            "legend": True,
        },
    }
)
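
Because the domain is typed by hand, it is easy to misspell a label; note that the dataset writes "Lower-Middle Income" with a space but "Upper-Middle-Income" with a hyphen. The following sketch compares a hand-written domain against the values that actually occur. The helper `check_domain` and the short `sample_values` list are hypothetical stand-ins for the real column.

```python
# Desired low-to-high order, copied from the "domain" used above
income_domain = [
    "Low-Income Economies",
    "Lower-Middle Income Economies",
    "Upper-Middle-Income Economies",
    "High-Income Economies",
]


def check_domain(values, domain):
    """Return (values missing from the domain, domain entries never observed)."""
    observed = set(values)
    listed = set(domain)
    return sorted(observed - listed), sorted(listed - observed)


# A few values as they appear in the sample rows shown earlier (illustrative subset)
sample_values = [
    "High-Income Economies",
    "Upper-Middle-Income Economies",
    "Lower-Middle Income Economies",
    "High-Income Economies",
]

missing, unused = check_domain(sample_values, income_domain)
print("not covered by the domain:", missing)
print("listed but not observed:", unused)

# A domain with a plausible typo (extra hyphen) no longer covers the data
typo_domain = [d.replace("Lower-Middle Income", "Lower-Middle-Income") for d in income_domain]
typo_missing, _ = check_domain(sample_values, typo_domain)
print("not covered after the typo:", typo_missing)
```

With the real dataframe, the observed values could be obtained with `df["Income Group"].unique().to_list()` and passed in place of `sample_values`.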

## Attribute Type and Color

This correspondence between the type of the attribute and the colormap is important. If the attribute is **categorical** (a grouping), we need to be able to distinguish between **different** values. If the attribute is **ordered**, we would still like to associate each value with a color, but also be able to determine the relationships between marks with different colors. For example, this mark has a color (and thus value) that is very close to this other mark. Or this mark's color is between these two other marks, meaning its value should also be between those two marks' values. When the attribute is **quantitative**, we can also discuss the relative magnitude of the differences: this mark is twice as bright as this other mark, meaning the encoded attribute values should be similarly related. Conversely, we **don't** want to allow someone to look at two marks that use color for a categorical attribute and infer these betweenness or magnitude differences.
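
One way to see the difference is to build two tiny palettes by hand: one that varies only hue (suited to categories) and one that keeps the hue fixed and varies only lightness (suited to ordered values). This is a sketch using the standard `colorsys` module; the HLS model is not perceptually uniform, so carefully designed schemes such as `blues` will behave better than this. The function names and default parameters are assumptions made for the example.

```python
import colorsys


def to_hex(r, g, b):
    """Format RGB components in 0-1 as a hex color string."""
    return "#{:02x}{:02x}{:02x}".format(*(round(255 * c) for c in (r, g, b)))


def categorical_colors(n, lightness=0.5, saturation=0.8):
    """n colors that differ only in hue, spaced evenly around the color wheel."""
    return [to_hex(*colorsys.hls_to_rgb(i / n, lightness, saturation)) for i in range(n)]


def sequential_colors(n, hue=0.6, saturation=0.8, lightest=0.9, darkest=0.3):
    """n >= 2 colors of one hue, stepping evenly from light to dark."""
    step = (lightest - darkest) / (n - 1)
    return [
        to_hex(*colorsys.hls_to_rgb(hue, lightest - i * step, saturation))
        for i in range(n)
    ]


print(categorical_colors(4))
print(sequential_colors(4))
```

In the first palette no color reads as "more" than another, while the second has an obvious light-to-dark order that a viewer can map onto low-to-high values.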

### Categorical Colormaps

A categorical (or nominal) colormap is one that allows us to differentiate groups. Its colors are usually nameably different (red, blue, brown) without splitting hairs (blue-green vs. green-blue). Using different hues is a common approach for this, but note that changing only the hue limits the range of available colors. Brown, for example, is not a pure hue (a hue at full saturation and lightness). Let's see what happens when using a categorical colormap to encode the subregion.

In [None]:
Plot.plot(
    {
        "marks": [
            Plot.dot(
                df,
                {
                    "x": "Travel & Tourism Development Index: 2024 Value",
                    "y": "Environmental Sustainability pillar: 2024 Value",
                    "fill": "Sub Region",
                },
            )
        ],
        "color": {"legend": True},
    }
)

Generally, categorical colormaps have ten to twelve colors. As you can see, there are 15 subregions, which exceeds the size of Observable's default colormap (10). This means that the same color is being used to encode different subregions, and we cannot differentiate between some pairs of subregions, including Eurasia and Western Africa. We could try the "set3" scheme, which has 12 colors, but that is still too few.
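
The arithmetic behind these repeated colors can be sketched as follows, assuming that an ordinal color scale reuses its palette cyclically once the categories outnumber the colors (my understanding of how D3-style ordinal scales behave). The subregion names are taken from the grouping cell later in this notebook, but their order here is arbitrary, so the specific colliding pairs will differ from those in the plot. The function names are hypothetical.

```python
from collections import defaultdict

subregions = [
    "Western Africa", "Southern Africa", "Eastern Africa",
    "Western Europe", "Eurasia", "Southern Europe", "Northern Europe",
    "Balkans and Eastern Europe", "Middle East", "North Africa",
    "North and Central America", "South America",
    "South Asia", "Eastern Asia-Pacific", "South-East Asia",
]


def assign_colors(categories, n_colors):
    """Map each category to a palette index, wrapping around when the palette runs out."""
    return {cat: i % n_colors for i, cat in enumerate(categories)}


def collisions(assignment):
    """Return the groups of categories that ended up sharing a palette index."""
    by_color = defaultdict(list)
    for cat, color in assignment.items():
        by_color[color].append(cat)
    return [cats for cats in by_color.values() if len(cats) > 1]


for n_colors in (10, 12, 20):
    shared = collisions(assign_colors(subregions, n_colors))
    print(f"{n_colors} colors: {len(shared)} colors are shared by two subregions")
```

Only a palette with at least as many colors as categories avoids shared colors entirely.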

In [17]:
Plot.plot(
    {
        "marks": [
            Plot.dot(
                df,
                {
                    "x": "Travel & Tourism Development Index: 2024 Value",
                    "y": "Environmental Sustainability pillar: 2024 Value",
                    "fill": "Sub Region",
                },
            )
        ],
        "color": {"scheme": "set3", "legend": True},
    }
)

Already here, you should notice that there are two yellow colors and two purple colors. While Observable does not include categorical colormaps with more than 12 colors, we can pull in another colormap (tab20) with 20 colors from matplotlib.

In [19]:
# only included for completeness
# no need to execute this cell

import matplotlib as mpl

tab20 = [
    f"rgb({', '.join(str(int(255 * c)) for c in color)})"
    for color in mpl.color_sequences["tab20"]
]

['rgb(31, 119, 180)',
 'rgb(174, 199, 232)',
 'rgb(255, 127, 14)',
 'rgb(255, 187, 120)',
 'rgb(44, 160, 44)',
 'rgb(152, 223, 138)',
 'rgb(214, 39, 40)',
 'rgb(255, 152, 150)',
 'rgb(148, 103, 189)',
 'rgb(197, 176, 213)',
 'rgb(140, 86, 75)',
 'rgb(196, 156, 148)',
 'rgb(227, 119, 194)',
 'rgb(247, 182, 210)',
 'rgb(127, 127, 127)',
 'rgb(199, 199, 199)',
 'rgb(188, 189, 34)',
 'rgb(219, 219, 141)',
 'rgb(23, 190, 207)',
 'rgb(158, 218, 229)']

In [20]:

tab20 = ['rgb(31, 119, 180)',
 'rgb(174, 199, 232)',
 'rgb(255, 127, 14)',
 'rgb(255, 187, 120)',
 'rgb(44, 160, 44)',
 'rgb(152, 223, 138)',
 'rgb(214, 39, 40)',
 'rgb(255, 152, 150)',
 'rgb(148, 103, 189)',
 'rgb(197, 176, 213)',
 'rgb(140, 86, 75)',
 'rgb(196, 156, 148)',
 'rgb(227, 119, 194)',
 'rgb(247, 182, 210)',
 'rgb(127, 127, 127)',
 'rgb(199, 199, 199)',
 'rgb(188, 189, 34)',
 'rgb(219, 219, 141)',
 'rgb(23, 190, 207)',
 'rgb(158, 218, 229)']

Plot.plot(
    {
        "marks": [
            Plot.dot(
                df,
                {
                    "x": "Travel & Tourism Development Index: 2024 Value",
                    "y": "Environmental Sustainability pillar: 2024 Value",
                    "fill": "Sub Region",
                },
            )
        ],
        "color": {"range": tab20, "legend": True},
    }
)

Looking at the legend, you can see that this colormap provides a different color for each subregion, but we are still left with pairs of similar colors (now conveniently placed next to each other). If you ignore the legend for a minute, try to find two points in different quadrants of the plot that have the same color. You will likely find that it is pretty difficult to tell whether two points are the same color or different colors. This problem of identifying different colors has led to the guideline that visualizations should generally not exceed 10-12 different categorical colors.
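
The similarity within each tab20 pair can be checked numerically. The sketch below converts three of the dark/light pairs from the list above to hue and lightness with the standard `colorsys` module. HLS is only a rough proxy for perceived difference, and the helper names are chosen for this example.

```python
import colorsys

# Three dark/light pairs copied from the tab20 list above
pairs = [
    ((31, 119, 180), (174, 199, 232)),   # blues
    ((255, 127, 14), (255, 187, 120)),   # oranges
    ((227, 119, 194), (247, 182, 210)),  # pinks
]


def hue_and_lightness(rgb):
    """Convert a 0-255 RGB triple to (hue in degrees, lightness in 0-1)."""
    h, l, _s = colorsys.rgb_to_hls(*(c / 255 for c in rgb))
    return h * 360, l


def hue_gap(a, b):
    """Smallest angle between two hues on the 360-degree color wheel."""
    diff = abs(a - b) % 360
    return min(diff, 360 - diff)


for dark, light in pairs:
    (h1, l1), (h2, l2) = hue_and_lightness(dark), hue_and_lightness(light)
    print(f"hue gap {hue_gap(h1, h2):5.1f} deg, lightness {l1:.2f} -> {l2:.2f}")
```

Within each pair the hues are nearly the same and the colors differ mainly in lightness, which is why they are easy to confuse when the marks are far apart.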

### Too Many Categories

Of course, the problem is that we cannot control the domain of attribute values. Some attributes may have twenty or thirty different values. So what should we do? There are two general methods: 

1. Group categories together, and/or
2. Create an "Other" category.

If we forget about the fact that we already have a Region attribute in our dataset, we could look at the subregions and decide which subregions should be grouped. For example, we can group the Americas together, but we might also choose to group North Africa with the Middle East. Then, we will colormap the new group attribute.

In [49]:
# Each inner list is one group of subregions
groups = [
    ["Western Africa", "Southern Africa", "Eastern Africa"],
    ["Western Europe", "Eurasia", "Southern Europe", "Northern Europe", "Balkans and Eastern Europe"],
    ["Middle East", "North Africa"],
    ["North and Central America", "South America"],
    ["South Asia", "Eastern Asia-Pacific", "South-East Asia"],
]
# Label the groups A, B, C, ... and map every subregion to its group's label
groupedSubRegions = {d: chr(65 + i) for i, group in enumerate(groups) for d in group}
# replace_strict raises an error if any subregion is missing from the mapping
gdf = df.with_columns(
    df["Sub Region"].replace_strict(groupedSubRegions).alias("Grouped Sub Region")
)

Plot.plot(
    {
        "marks": [
            Plot.dot(
                gdf,
                {
                    "x": "Travel & Tourism Development Index: 2024 Value",
                    "y": "Environmental Sustainability pillar: 2024 Value",
                    "fill": "Grouped Sub Region",
                },
            )
        ],
        "color": {"legend": True},
    }
)

We could also choose to group subregions in an "Other" category. Instead of creating many groups, we assign a number of the values to one new group. Usually, we do this with the values that are rarer (those at the end of the following table).

In [50]:
with pl.Config(tbl_rows=15):
    display(df["Sub Region"].value_counts(sort=True))

Sub Region,count
str,u32
"""North and Central America""",13
"""Balkans and Eastern Europe""",12
"""Middle East""",10
"""South America""",10
"""Western Europe""",10
"""South-East Asia""",8
"""Western Africa""",8
"""Southern Europe""",8
"""Northern Europe""",7
"""Eurasia""",7


In [67]:
other = {d: "Other" for d in [
    "Eurasia",
    "Southern Africa",
    "Eastern Asia-Pacific",
    "South Asia",
    "Eastern Africa",
    "North Africa",
]}

odf = df.with_columns(df["Sub Region"].replace(other))

# move Other to the end
domain = odf['Sub Region'].unique().to_list()
domain.sort()
domain.remove('Other')
domain += ["Other"]

Plot.plot(
    {
        "marks": [
            Plot.dot(
                odf,
                {
                    "x": "Travel & Tourism Development Index: 2024 Value",
                    "y": "Environmental Sustainability pillar: 2024 Value",
                    "fill": "Sub Region",
                },
            )
        ],
        "color": {"domain": domain, "legend": True},
    }
)

Note that we chose to move the Other category to the end of our domain so that it is assigned the gray color. For our data, this approach doesn't work so well because to get down to ten categories, we need to create an Other category that has 33 countries! In other situations, Other may be a good idea, but for subregions, choosing groups makes the most sense.
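
One way to decide which values go into Other is to keep the most frequent categories and lump the rest together. The sketch below uses plain Python and made-up values rather than the courselet's dataset; `lump_rare` and its parameters are hypothetical names, not part of polars or Plot.

```python
from collections import Counter


def lump_rare(values, keep, other_label="Other"):
    """Keep the `keep` most frequent categories; relabel the rest as `other_label`."""
    counts = Counter(values)
    # most_common sorts by descending count (ties keep first-seen order)
    top = {category for category, _ in counts.most_common(keep)}
    return [v if v in top else other_label for v in values]


# Illustrative data only, not the courselet's dataset
sample = ["A"] * 5 + ["B"] * 4 + ["C"] * 2 + ["D"] + ["E"]
lumped = lump_rare(sample, keep=2)
lumped_counts = Counter(lumped)
```

Checking the size of the resulting Other group (here `lumped_counts["Other"]`) is a quick way to see whether it has grown too large to be informative.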

### Ordered Colormaps

Recall that an ordered attribute has different values that can be ordered but may not be quantitative. For example, T-shirts may come in S, M, L, and XL, and while there is an order to those sizes, the names themselves do not tell us how much bigger one size is than the other. In the example above, the income group of each country in the dataset is an ordered attribute. To encode this via color, we need both the order of the values and a colormap that helps convey this order.

In [65]:
Plot.plot(
    {
        "marks": [
            Plot.dot(
                df,
                {
                    "x": "Travel & Tourism Development Index: 2024 Value",
                    "y": "Environmental Sustainability pillar: 2024 Value",
                    "fill": "Income Group",
                },
            )
        ],
        "color": {"scheme": "blues", "legend": True},
    }
)

You may notice that the above visualization encodes these income groups **alphabetically** not in their semantic order. We need to set the **domain** of the colormap in order to get the correct ordering.
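
To see why the default order is unhelpful, we can sort the group names ourselves. This small sketch assumes that the default ordering behaves like Python's `sorted` on strings; the list name `income_groups` is only for illustration.

```python
# The income groups in their semantic order, from lowest to highest
income_groups = [
    "Low-Income Economies",
    "Lower-Middle-Income Economies",
    "Upper-Middle-Income Economies",
    "High-Income Economies",
]

# Alphabetical order puts the highest income group first
alphabetical = sorted(income_groups)
```

Because the alphabetical order starts with "High-Income Economies", the lightest color would go to the highest income group, which breaks the low-to-high reading of the colormap.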

In [71]:
income_group_domain = [
    "Low-Income Economies",
    "Lower-Middle-Income Economies",
    "Upper-Middle-Income Economies",
    "High-Income Economies",
]

Plot.plot(
    {
        "marks": [
            Plot.dot(
                df,
                {
                    "x": "Travel & Tourism Development Index: 2024 Value",
                    "y": "Environmental Sustainability pillar: 2024 Value",
                    "fill": "Income Group",
                },
            )
        ],
        "color": {"scheme": "blues", "domain": income_group_domain, "legend": True},
    }
)

That legend looks better but uses the swatches like the categorical colormap. While this allows a more readable layout, we can use a ramp legend instead to provide a clearer comparison of the different values.

##### Exercise

Update the plot to show a ramp legend. See the [legend documentation](https://observablehq.com/plot/features/legends) for information about how to do this. Due to the length of the category names, we will need to either increase the width of the legend or change the labels shown in the legend.

##### Solution

In [73]:
Plot.plot(
    {
        "marks": [
            Plot.dot(
                df,
                {
                    "x": "Travel & Tourism Development Index: 2024 Value",
                    "y": "Environmental Sustainability pillar: 2024 Value",
                    "fill": "Income Group",
                },
            )
        ],
        "color": {
            "scheme": "blues",
            "domain": income_group_domain,
            "legend": "ramp",
            "width": 800,
        },
    }
)

Alternatively, we could edit the labels on the legend to fit better. In this case, rather than restructure the entire Income Group column, we can set the `tickFormat` to truncate the "-Income Economies" part of the values.

*Note: When using Plot in JavaScript, this would be a normal function, but calling it from Python requires us to use the `js` method so that this code is executed when creating the visualization.*

In [25]:
from pyobsplot import js

Plot.plot(
    {
        "marks": [
            Plot.dot(
                df,
                {
                    "x": "Travel & Tourism Development Index: 2024 Value",
                    "y": "Environmental Sustainability pillar: 2024 Value",
                    "fill": "Income Group",
                },
            )
        ],
        "color": {
            "scheme": "blues",
            "domain": income_group_domain,
            "legend": "ramp",
            "width": 400,
            "tickFormat": js('d => d.slice(0, -"-Income Economies".length)'),
        },
    }
)
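
If the JavaScript slice is unfamiliar, the same truncation can be written in Python to preview the legend labels. This is only a sketch for checking the labels; `short_label` is a hypothetical helper and is not passed to Plot.

```python
SUFFIX = "-Income Economies"


def short_label(name):
    """Drop the last len(SUFFIX) characters, like `d.slice(0, -SUFFIX.length)` in JavaScript."""
    return name[: -len(SUFFIX)]


labels = [
    short_label(name)
    for name in [
        "Low-Income Economies",
        "Lower-Middle-Income Economies",
        "Upper-Middle-Income Economies",
        "High-Income Economies",
    ]
]
```

Note that slicing removes a fixed number of characters whether or not the value actually ends with the suffix, so it only works when every value shares that ending.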

### Quantitative Colormaps

With a quantitative value like "Socioeconomic Impact", we often expect that the domain of possible values is **continuous**: any value in an interval is possible (4.9, 4.91, and 5.0, for example). In such cases, we might expect 4.91 to have a slightly different color than 4.9. In contrast to the ordered case, where the attribute belonged to a discrete domain, here we can have many more possible values, and our colormap defines a **function** from the domain to the color range instead of a value-by-value mapping.
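
The difference between a lookup table and a function can be sketched in a few lines of Python. The colors and the names `ordinal_map` and `sequential_map` are placeholders chosen for this illustration, and the linear interpolation is a simplification rather than Plot's actual implementation.

```python
# Ordinal: a finite lookup table with one entry per value
ordinal_map = {"S": "#deebf7", "M": "#9ecae1", "L": "#3182bd"}


def sequential_map(value, lo, hi, start=(247, 252, 245), end=(0, 68, 27)):
    """Linearly interpolate an RGB color for any value in [lo, hi]."""
    t = (value - lo) / (hi - lo)
    t = min(1.0, max(0.0, t))  # clamp out-of-domain values (a choice made for this sketch)
    return tuple(round(s + (e - s) * t) for s, e in zip(start, end))


color_490 = sequential_map(4.9, 2, 6)
color_491 = sequential_map(4.91, 2, 6)
```

The lookup table can only answer for the values it lists, while the function returns a color for every value in the interval, including slightly different colors for 4.9 and 4.91.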

While Plot uses `turbo` as its default quantitative colormap, we will use a single-hue colormap (`greens`) instead. Although it is not strictly necessary because the scheme implies it, we will also specify that this is a **sequential** colormap.

In [None]:
from pyobsplot import js

Plot.plot(
    {
        "marks": [
            Plot.dot(
                df,
                {
                    "x": "Travel & Tourism Development Index: 2024 Value",
                    "y": "Environmental Sustainability pillar: 2024 Value",
                    "fill": "T&T Socioeconomic Impact pillar: 2024 Value",
                },
            )
        ],
        "color": {"scheme": "greens", "type": "sequential", "legend": True},
    }
)

We might wish to compare this with the 2019 values.

In [None]:
from pyobsplot import js

Plot.plot(
    {
        "marks": [
            Plot.dot(
                df,
                {
                    "x": "Travel & Tourism Development Index: 2024 Value",
                    "y": "Environmental Sustainability pillar: 2024 Value",
                    "fill": "T&T Socioeconomic Impact pillar: 2019 Value",
                },
            )
        ],
        "color": {"scheme": "greens", "type": "sequential", "legend": True},
    }
)


Note that Plot automatically restricts the domain to the interval of values in the dataset, and this is **different** for 2019 than it was for 2024. This means that the same color in the first and second plots indicates a (slightly) different value. We can fix the domain to make the two plots comparable.
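
A small sketch shows why the domain matters. A continuous colormap first normalizes each value to a position between 0 and 1, and that position depends on the domain. The extents below are made up for illustration and are not the actual 2019 and 2024 extents.

```python
def normalize(value, domain):
    """Position of value within the domain, from 0 (low end) to 1 (high end)."""
    lo, hi = domain
    return (value - lo) / (hi - lo)


# Hypothetical data extents for two different years
domain_a = (2.0, 6.0)
domain_b = (3.0, 6.0)

value = 4.0
t_a = normalize(value, domain_a)  # halfway along the colormap
t_b = normalize(value, domain_b)  # only a third of the way along
```

The same value lands at different positions, and therefore gets different colors, unless both plots share one domain.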

In [None]:
from pyobsplot import js

Plot.plot(
    {
        "marks": [
            Plot.dot(
                df,
                {
                    "x": "Travel & Tourism Development Index: 2024 Value",
                    "y": "Environmental Sustainability pillar: 2024 Value",
                    "fill": "T&T Socioeconomic Impact pillar: 2024 Value",
                },
            )
        ],
        "color": {
            "scheme": "greens",
            "type": "sequential", 
            "domain": [2, 6],
            "legend": True,
        },
    }
)


In [None]:
from pyobsplot import js

Plot.plot(
    {
        "marks": [
            Plot.dot(
                df,
                {
                    "x": "Travel & Tourism Development Index: 2024 Value",
                    "y": "Environmental Sustainability pillar: 2024 Value",
                    "fill": "T&T Socioeconomic Impact pillar: 2019 Value",
                },
            )
        ],
        "color": {
            "scheme": "greens",
            "type": "sequential", 
            "domain": [2, 6],
            "legend": True,
        },
    }
)


While having a side-by-side comparison in order to notice differences may permit some conclusions, this is generally tedious and error-prone. How do I know that I am matching the points correctly? Instead, we can compute the difference in scores to show countries where this value has increased or decreased.

In [None]:
df_diff = df.with_columns(
    (
        pl.col("T&T Socioeconomic Impact pillar: 2024 Value")
        - pl.col("T&T Socioeconomic Impact pillar: 2019 Value")
    ).alias("T&T Socioeconomic Impact pillar: 2019-2024 Diff")
)
Plot.plot(
    {
        "marks": [
            Plot.dot(
                df_diff,
                {
                    "x": "Travel & Tourism Development Index: 2024 Value",
                    "y": "Environmental Sustainability pillar: 2024 Value",
                    "fill": "T&T Socioeconomic Impact pillar: 2019-2024 Diff",
                },
            )
        ],
        "color": {"scheme": "greens", "type": "sequential", "legend": True},
    }
)


That helps, and points out countries where socioeconomic impact has improved, but note that it is not easy to quickly differentiate between countries that have increased or declined. Here, a **diverging** colormap would be useful as it specifies different colors for each side of the colormap. A diverging colormap has a meaningful midpoint where it is useful to be able to determine those values above and below the midpoint.

In [91]:
df_diff = df.with_columns(
    (
        pl.col("T&T Socioeconomic Impact pillar: 2024 Value")
        - pl.col("T&T Socioeconomic Impact pillar: 2019 Value")
    ).alias("T&T Socioeconomic Impact pillar: 2019-2024 Diff")
)

Plot.plot(
    {
        "marks": [
            Plot.dot(
                df_diff,
                {
                    "x": "Travel & Tourism Development Index: 2024 Value",
                    "y": "Environmental Sustainability pillar: 2024 Value",
                    "fill": "T&T Socioeconomic Impact pillar: 2019-2024 Diff",
                },
            )
        ],
        "color": {"scheme": "puor", "type": "sequential", "legend": True},
    }
)


That's not quite correct, though, because our range of possible values is not balanced. Zero is not the middle value! There are two ways to fix this:

1. Correct the domain so that the interval below zero is the same size as the interval above zero, or
2. Inform Plot that this is a diverging attribute.

In fact, when we use a diverging scheme, Plot automatically sets the type to diverging (and similarly for sequential schemes; we specified the type explicitly earlier to showcase this difference).

In [92]:
df_diff = df.with_columns(
    (
        pl.col("T&T Socioeconomic Impact pillar: 2024 Value")
        - pl.col("T&T Socioeconomic Impact pillar: 2019 Value")
    ).alias("T&T Socioeconomic Impact pillar: 2019-2024 Diff")
)

Plot.plot(
    {
        "marks": [
            Plot.dot(
                df_diff,
                {
                    "x": "Travel & Tourism Development Index: 2024 Value",
                    "y": "Environmental Sustainability pillar: 2024 Value",
                    "fill": "T&T Socioeconomic Impact pillar: 2019-2024 Diff",
                },
            )
        ],
        "color": {"scheme": "puor", "type": "diverging", "legend": True},
    }
)


Note that if we want to maximize the range of colors on each side, we can set the `symmetric` property to `False`, which scales the values on each side of the midpoint **differently**. This can be confusing, however, because the amount of change in the color between two points is different depending on which side of the midpoint you are on.
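
The effect of `symmetric` can be sketched as a function that places a value along the colormap, with 0.5 marking the midpoint. This is an illustration of the idea under simple assumptions (a domain that straddles the pivot), not Plot's implementation; `diverging_position` is a hypothetical helper.

```python
def diverging_position(value, lo, hi, pivot=0.0, symmetric=True):
    """Position in [0, 1] along a diverging colormap; 0.5 is the pivot.

    Assumes lo < pivot < hi so that neither side has zero width.
    """
    below, above = pivot - lo, hi - pivot
    if symmetric:
        # Use the same scaling on both sides, sized to the larger side
        below = above = max(below, above)
    if value < pivot:
        return 0.5 - 0.5 * (pivot - value) / below
    return 0.5 + 0.5 * (value - pivot) / above


# Hypothetical differences ranging from -0.5 to 2.0
low_symmetric = diverging_position(-0.5, -0.5, 2.0)
low_asymmetric = diverging_position(-0.5, -0.5, 2.0, symmetric=False)
```

With symmetric scaling, the most negative value only reaches 0.375, so the darkest colors on that side go unused. Without it, the value reaches 0.0, but a step of 0.1 below the midpoint now changes the color four times as much as the same step above it.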

In [93]:
df_diff = df.with_columns(
    (
        pl.col("T&T Socioeconomic Impact pillar: 2024 Value")
        - pl.col("T&T Socioeconomic Impact pillar: 2019 Value")
    ).alias("T&T Socioeconomic Impact pillar: 2019-2024 Diff")
)

Plot.plot(
    {
        "marks": [
            Plot.dot(
                df_diff,
                {
                    "x": "Travel & Tourism Development Index: 2024 Value",
                    "y": "Environmental Sustainability pillar: 2024 Value",
                    "fill": "T&T Socioeconomic Impact pillar: 2019-2024 Diff",
                },
            )
        ],
        "color": {
            "scheme": "puor",
            "type": "diverging",
            "symmetric": False,
            "legend": True,
        },
    }
)

For some measures, the midpoint is not actually zero. The dataset also includes rankings for the countries in each pillar. Suppose we wish to know which countries are above the middle rank and which are below.

In [94]:
Plot.plot(
    {
        "marks": [
            Plot.dot(
                df,
                {
                    "x": "Travel & Tourism Development Index: 2024 Value",
                    "y": "Environmental Sustainability pillar: 2024 Value",
                    "fill": "T&T Socioeconomic Impact pillar: 2024 Rank",
                },
            )
        ],
        "color": {"scheme": "puor", "type": "diverging", "legend": True},
    }
)


The standard scheme does not work because the midpoint (or what Plot calls the `pivot`) defaults to zero. If we want it to be the middle rank (in this case, 60), we need to set the pivot explicitly. Another example of a non-zero pivot is temperature specified in Fahrenheit, where 32 (the freezing point of water) is often used as a pivot.
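
For a complete ranking of n items without ties, the middle rank is (n + 1) / 2. The quick check below assumes 119 ranked countries with ranks running from 1 to 119.

```python
from statistics import median

n_countries = 119  # assumed number of ranked countries
middle_rank = median(range(1, n_countries + 1))
```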

In [95]:
# find the middle rank
df["T&T Socioeconomic Impact pillar: 2024 Rank"].describe()

statistic,value
str,f64
"""count""",119.0
"""null_count""",0.0
"""mean""",60.0
"""std""",34.496377
"""min""",1.0
"""25%""",31.0
"""50%""",60.0
"""75%""",90.0
"""max""",119.0


In [36]:
Plot.plot(
    {
        "marks": [
            Plot.dot(
                df,
                {
                    "x": "Travel & Tourism Development Index: 2024 Value",
                    "y": "Environmental Sustainability pillar: 2024 Value",
                    "fill": "T&T Socioeconomic Impact pillar: 2024 Rank",
                },
            )
        ],
        "color": {"scheme": "puor", "type": "diverging", "pivot": 60, "legend": True},
    }
)


### Segmented Colormaps

Until now, we have seen that categorical and ordered colormaps need discrete scales given their finite number of values, while quantitative colormaps can use continuous colormaps to map each value to a slightly different color. However, we could choose to bin the quantitative values into ranges and use a **segmented** colormap instead. This approach is often seen in weather maps; the temperature is in the 60s or in the 70s, and there is a dividing line between the two on the map, even though near that line we expect temperatures to be very close to each other (69.5 vs. 70.5, for example). We can modify our earlier quantitative colormap to be segmented using the `quantize` type of scale.

In [96]:
from pyobsplot import js

Plot.plot(
    {
        "marks": [
            Plot.dot(
                df,
                {
                    "x": "Travel & Tourism Development Index: 2024 Value",
                    "y": "Environmental Sustainability pillar: 2024 Value",
                    "fill": "T&T Socioeconomic Impact pillar: 2024 Value",
                },
            )
        ],
        "color": {
            "scheme": "greens",
            "type": "quantize",
            "legend": True,
        },
    }
)

We can change the number of segments by setting `n`, but Plot only guarantees approximately that number of segments because it tries to place the boundaries at meaningful values. With the default `n`, the plot above split the domain at whole-number boundaries. Setting `n` to 7 creates intervals of length 0.5.
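
For comparison, plain equal-width binning without any adjustment of the boundaries can be sketched as follows. This is a simplification and not what Plot does; `quantize_index` is a hypothetical helper.

```python
import math


def quantize_index(value, lo, hi, n):
    """Index (0 to n - 1) of the equal-width bin of [lo, hi] that contains value."""
    width = (hi - lo) / n
    index = math.floor((value - lo) / width)
    # Keep the index in range so that value == hi falls in the last bin
    return min(n - 1, max(0, index))


# Four bins of width 1 over the domain [2, 6]
indices = [quantize_index(v, 2, 6, 4) for v in (2.0, 3.5, 4.0, 6.0)]
```

With a domain such as [2.3, 5.9], the same approach would produce boundaries like 3.2 and 4.1, which is why a scale that rounds to meaningful boundaries may return a slightly different number of segments.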

In [97]:
from pyobsplot import js

Plot.plot(
    {
        "marks": [
            Plot.dot(
                df,
                {
                    "x": "Travel & Tourism Development Index: 2024 Value",
                    "y": "Environmental Sustainability pillar: 2024 Value",
                    "fill": "T&T Socioeconomic Impact pillar: 2024 Value",
                },
            )
        ],
        "color": {
            "scheme": "greens",
            "type": "quantize",
            "n": 7,
            "legend": True,
        },
    }
)

We can get exactly the number of bins we want by computing the bin boundaries ourselves and passing them as the domain of a `threshold` scale.
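
Conceptually, a threshold scale with k boundaries needs k + 1 colors, and each value picks its color by counting the boundaries at or below it. The sketch below uses placeholder color names and a hypothetical `threshold_color` helper; sending a value equal to a boundary to the upper bin is a choice made for this illustration.

```python
from bisect import bisect_right


def threshold_color(value, thresholds, colors):
    """Pick colors[i], where i is the number of thresholds that are <= value."""
    if len(colors) != len(thresholds) + 1:
        raise ValueError("need exactly one more color than thresholds")
    return colors[bisect_right(thresholds, value)]


thresholds = [3.0, 4.0, 5.0]  # must be sorted in ascending order
colors = ["light", "medium", "dark", "darkest"]  # placeholder names

picked = [threshold_color(v, thresholds, colors) for v in (2.5, 3.0, 5.7)]
```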

In [98]:
c = pl.col("T&T Socioeconomic Impact pillar: 2024 Value")
[minval, maxval] = df.select(c.min()).item(), df.select(c.max()).item()
nbins = 7
bins = [minval + i * (maxval - minval) / nbins for i in range(1, nbins)]

Plot.plot(
    {
        "marks": [
            Plot.dot(
                df,
                {
                    "x": "Travel & Tourism Development Index: 2024 Value",
                    "y": "Environmental Sustainability pillar: 2024 Value",
                    "fill": "T&T Socioeconomic Impact pillar: 2024 Value",
                },
            )
        ],
        "color": {
            "scheme": "greens",
            "type": "threshold",
            "domain": bins,
            "legend": True,
        },
    }
)

This isn't great because the thresholds are floating-point numbers with many digits, which makes the legend difficult to read. We can clean this up by setting the `tickFormat` to round them to two decimal places.
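
A format specifier such as `".2f"` follows the same mini-language as Python's own `format`, so we can preview the labels in Python. The extent of 2 to 6 below is made up for illustration.

```python
# Six evenly spaced boundaries for seven bins over a hypothetical extent of [2, 6]
bins = [2.0 + i * (6.0 - 2.0) / 7 for i in range(1, 7)]

# Fixed-point notation with two digits after the decimal point
labels = [format(b, ".2f") for b in bins]
```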

In [99]:
Plot.plot(
    {
        "marks": [
            Plot.dot(
                df,
                {
                    "x": "Travel & Tourism Development Index: 2024 Value",
                    "y": "Environmental Sustainability pillar: 2024 Value",
                    "fill": "T&T Socioeconomic Impact pillar: 2024 Value",
                },
            )
        ],
        "color": {
            "scheme": "greens",
            "type": "threshold",
            "domain": bins,
            "legend": True,
            "tickFormat": ".2f",
        },
    }
)

If our goal is to distribute the values across colors so that the same number of points are assigned to each color, we can use a quantile scale instead.

In [100]:
Plot.plot(
    {
        "marks": [
            Plot.dot(
                df,
                {
                    "x": "Travel & Tourism Development Index: 2024 Value",
                    "y": "Environmental Sustainability pillar: 2024 Value",
                    "fill": "T&T Socioeconomic Impact pillar: 2024 Value",
                },
            )
        ],
        "color": {
            "scheme": "greens",
            "type": "quantile",
            "n": 7,
            "legend": True,
            "tickFormat": ".2f",
        },
    }
)

Here, we can still see ordering between the items, but we cannot make good inferences about the magnitude of the changes between them.
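
The trade-off can be checked with the standard library. The sketch below uses made-up, skewed data (squares of 1 to 70) and `statistics.quantiles`, whose interpolation method may differ from the one Plot uses.

```python
from bisect import bisect_right
from collections import Counter
from statistics import quantiles

# Skewed, made-up data: many small values and a few large ones
data = [x**2 for x in range(1, 71)]

cuts = quantiles(data, n=7)  # 6 cut points that define 7 bins
bin_counts = Counter(bisect_right(cuts, v) for v in data)

# Width of each interior bin, in data units
widths = [hi - lo for lo, hi in zip(cuts, cuts[1:])]
```

Every bin receives the same number of points, but the bins get wider toward the high end, so equal steps in color no longer correspond to equal steps in value.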

##### Exercise

Create a segmented version of the diverging colormap of rank where each segment includes 10 ranks.

##### Solution

In [None]:
Plot.plot(
    {
        "marks": [
            Plot.dot(
                df.with_columns(pl.col("T&T Socioeconomic Impact pillar: 2024 Rank").cast(pl.Int32)),
                {
                    "x": "Travel & Tourism Development Index: 2024 Value",
                    "y": "Environmental Sustainability pillar: 2024 Value",
                    "fill": "T&T Socioeconomic Impact pillar: 2024 Rank",
                },
            )
        ],
        "color": {"scheme": "puor", "type": "quantize", "n": 12, "width": 300, "legend": True},
    }
)

### Continuous vs. Segmented Colormaps

You might wonder why we might choose to use segmented colormaps over continuous colormaps when the value is quantitative and continuous. Intuitively, we should expect that a viewer can make more precise judgments of values using the continuous colormap. However, you may also recall that with categorical colormaps, the number of distinct colors we can accurately resolve is quite limited. One benefit with a segmented colormap is the distinct boundary. For elevation maps, this boundary can be useful in clearly defining regions of similar elevation rather than leaving it to viewers to interpret gradients correctly. One result showed that a segmented colormap with few bins led to decreased errors in analysis tasks on an elevation map ([Padilla et al., 2017](http://space.ucmerced.edu/Downloads/publications/Padilla2016InfoVis.pdf)).


## Colormap Definition

For those used to creating their own styles, it is tempting to define colormaps as well. Remember, however, that human color vision is not as straightforward as we might like. In addition, there are many [aesthetic factors](https://www.datawrapper.de/blog/colors?utm_source=dataquest&utm_medium=crosspost) to take into consideration. Plot will dutifully interpolate between any colors you give it as the range, but this can lead to poor results.

In [101]:
from pyobsplot import js

Plot.plot(
    {
        "marks": [
            Plot.dot(
                df,
                {
                    "x": "Travel & Tourism Development Index: 2024 Value",
                    "y": "Environmental Sustainability pillar: 2024 Value",
                    "fill": "T&T Socioeconomic Impact pillar: 2024 Value",
                },
            )
        ],
        "color": {
            "range": ["yellow", "purple", "orange"],
            "type": "linear",
            "legend": True,
        },
    }
)

Here, we have oranges showing up on both sides of the colormap. Even when we pick colors that show more differentiation, the interpolation between them leads to poor results.

In [102]:
from pyobsplot import js

Plot.plot(
    {
        "marks": [
            Plot.dot(
                df,
                {
                    "x": "Travel & Tourism Development Index: 2024 Value",
                    "y": "Environmental Sustainability pillar: 2024 Value",
                    "fill": "T&T Socioeconomic Impact pillar: 2024 Value",
                },
            )
        ],
        "color": {
            "range": ["red", "green", "blue"],
            "type": "linear",
            "legend": True,
        },
    }
)
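
To see where the in-between colors come from, we can interpolate by hand. This sketch assumes straight-line interpolation of the red, green, and blue channels using the CSS values of the named colors; the interpolation Plot applies may differ in detail.

```python
def lerp_rgb(a, b, t):
    """Interpolate each RGB channel linearly between colors a and b."""
    return tuple(x + (y - x) * t for x, y in zip(a, b))


# CSS named colors as 0-255 RGB triples
YELLOW, PURPLE = (255, 255, 0), (128, 0, 128)
RED, GREEN = (255, 0, 0), (0, 128, 0)

mid_yellow_purple = lerp_rgb(YELLOW, PURPLE, 0.5)
mid_red_green = lerp_rgb(RED, GREEN, 0.5)
```

Halfway between yellow and purple is roughly `rgb(192, 128, 64)`, a muted orange-brown, and halfway between red and green is a dark brown near `rgb(128, 64, 0)`. Neither reads as a natural step between its endpoints.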

If you must develop a distinct colormap (e.g. for branding purposes for a media organization), consider using [tools](https://sciviscolor.org/tools/) that help with this process. However, in general, it is best to use the [already defined colormaps](https://observablehq.com/@observablehq/plot-cheatsheets-colors) available in most visualization tools.

## Rainbow Colormaps

Finally, we would be remiss to not discuss a topic that has generated considerable debate over the years: the rainbow colormap. For a quantitative value, a single-hue colormap like the green one we began with allows us to use hue in the visualization for other attributes.

In [122]:
from pyobsplot import js

Plot.plot(
    {
        "marks": [
            Plot.dot(
                df,
                {
                    "x": "Travel & Tourism Development Index: 2024 Value",
                    "y": "Environmental Sustainability pillar: 2024 Value",
                    "fill": "T&T Socioeconomic Impact pillar: 2024 Value",
                },
            )
        ],
        "color": {"scheme": "greens", "type": "sequential", "legend": True},
    }
)

We might be able to include a categorical attribute using hue, for example, using blue, green, and orange, and then encode the quantitative attribute by varying their lightness. However, if we are not using hue for another attribute, we could also employ a **multi-hue** colormap that communicates the different values.

In [121]:
Plot.plot(
    {
        "marks": [
            Plot.dot(
                df,
                {
                    "x": "Travel & Tourism Development Index: 2024 Value",
                    "y": "Environmental Sustainability pillar: 2024 Value",
                    "fill": "T&T Socioeconomic Impact pillar: 2024 Value",
                },
            )
        ],
        "color": {"scheme": "magma", "type": "sequential", "legend": True},
    }
)

This lets us better distinguish particular ranges and even discuss them since there are distinct regions of colors. Given those benefits, we would expect a rainbow colormap to maximize the different ranges of color and allow greater differentiation between values. For many years, the default colormap in Matlab was the jet colormap which exemplified the rainbow colormap.

In [123]:
jet_scale = [
    [0, 0, 0.5],
    [0, 0, 1],
    [0, 0.5, 1],
    [0, 1, 1],
    [0.5, 1, 0.5],
    [1, 1, 0],
    [1, 0.5, 0],
    [1, 0, 0],
    [0.5, 0, 0],
]
jet_rgb = [f"rgb({', '.join(str(int(v * 255)) for v in c)})" for c in jet_scale]

Plot.plot(
    {
        "marks": [
            Plot.dot(
                df,
                {
                    "x": "Travel & Tourism Development Index: 2024 Value",
                    "y": "Environmental Sustainability pillar: 2024 Value",
                    "fill": "T&T Socioeconomic Impact pillar: 2024 Value",
                },
            )
        ],
        "color": {
            "domain": [2, 6],
            "range": jet_rgb,
            "interpolate": "rgb",
            "type": "sequential",
            "legend": True,
        },
    }
)

You might notice that this does add a lot of distinct color to the visualization, but there are also differences in which colors jump off the screen. First, recall that viewers who are red-green colorblind will not be able to see the differences between the red and green points. Second, the ordering of these colors is not necessarily intuitive. If you ignore the legend and just look at the points of the scatterplot, can you put the values in order? Third, even for those without colorblindness, the scale seems to change more in the yellow range than in the green, red, or blue ranges. Finally, the yellow and cyan colors have higher luminance than the rest of the colors, which can lead to the impression of banding when those bands do not actually exist. These deficiencies have led visualization researchers to urge the public to stay away from rainbow colormaps ([Moreland, 2016](https://www.kennethmoreland.com/color-advice/BadColorMaps.pdf), [Crameri et al., 2020](https://www.nature.com/articles/s41467-020-19160-7)), and have led to changes in the default colormaps in popular software packages like [Matlab](https://blogs.mathworks.com/steve/2014/10/13/a-new-colormap-for-matlab-part-1-introduction/) and [Matplotlib](https://bids.github.io/colormap/).
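
The luminance differences can be estimated numerically. The sketch below computes the standard relative luminance of an sRGB color for four of the fully saturated stops in the jet colormap; `relative_luminance` is a helper written for this illustration.

```python
def relative_luminance(rgb):
    """Relative luminance of an sRGB color with channels in [0, 1]."""

    def linearize(c):
        # Undo the sRGB gamma encoding
        return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

    r, g, b = (linearize(c) for c in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


jet_stops = {
    "blue": (0, 0, 1),
    "cyan": (0, 1, 1),
    "yellow": (1, 1, 0),
    "red": (1, 0, 0),
}
luminance = {name: relative_luminance(c) for name, c in jet_stops.items()}
```

Yellow and cyan come out far brighter than red and blue, so luminance rises and falls along the colormap rather than changing steadily.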

However, some communities, including meteorology, continue to use rainbow-like colormaps due to historical precedent, and some theorize that familiarity and nameable colors may explain this ([Ware et al., 2023](https://www.computer.org/csdl/magazine/cg/2023/03/10128890/1NdJMHqISnS)). Others have investigated whether the rainbow colormap can be improved instead of discarded. The isoluminant rainbow colormap ([Kindlmann et al., 2002](https://people.cs.uchicago.edu/~glk/pubs/pdf/Kindlmann-FaceBasedLuminanceMatching-VIS-2002.pdf)) sought to remove the luminance changes in the rainbow, while the turbo colormap ([Mikhailov, 2019](https://research.google/blog/turbo-an-improved-rainbow-colormap-for-visualization/)) sought to smooth out the luminance artifacts. Plot includes `turbo` as a colormap option.

In [124]:
Plot.plot(
    {
        "marks": [
            Plot.dot(
                df,
                {
                    "x": "Travel & Tourism Development Index: 2024 Value",
                    "y": "Environmental Sustainability pillar: 2024 Value",
                    "fill": "T&T Socioeconomic Impact pillar: 2024 Value",
                },
            )
        ],
        "color": {
            "domain": [2, 6],
            "scheme": "turbo",
            "type": "sequential",
            "legend": True,
        },
    }
)

While I'm not sure that I would recommend the turbo colormap for this plot, it does improve on some of the problems identified.

## Exercise

Create visualizations of the [Seattle weather dataset](https://github.com/vega/vega/blob/main/docs/data/seattle-weather.csv) that utilize colormaps.

1. Create a plot where each day is colored based on the temperature (`temp`). Given the starting plot below, set the colormap to match the [Climate Stripes](https://www.reading.ac.uk/planet/climate-resources/climate-stripes) visualization.
2. Create a new version of this visualization to instead show the `weather` type for each day using a different colormap. Think about colors that might best reflect this weather type like [this example](https://altair-viz.github.io/gallery/seattle_weather_interactive.html).
3. Create a third version with the amount of precipitation. Consider the type of scale that might work best here to better highlight days with some precipitation.
4. Try updating the first visualization so that the temperature is dual encoded by both height and color.

In [85]:
import polars as pl

wdf = (
    pl.read_csv(
        'https://raw.githubusercontent.com/vega/vega-datasets/refs/heads/main/data/seattle-weather.csv',
        try_parse_dates=True,
    )
    .with_columns(((pl.col('temp_max') + pl.col('temp_min')) / 2).alias('temp'))
    .filter(pl.col("date").dt.year() == 2012)
)

date,precipitation,temp_max,temp_min,wind,weather,temp
date,f64,f64,f64,f64,str,f64
2012-01-01,0.0,12.8,5.0,4.7,"""drizzle""",8.9
2012-01-02,10.9,10.6,2.8,4.5,"""rain""",6.7
2012-01-03,0.8,11.7,7.2,2.3,"""rain""",9.45
2012-01-04,20.3,12.2,5.6,4.7,"""rain""",8.9
2012-01-05,1.3,8.9,2.8,6.1,"""rain""",5.85
…,…,…,…,…,…,…
2012-12-27,4.1,7.8,3.3,3.2,"""rain""",5.55
2012-12-28,0.0,8.3,3.9,1.7,"""rain""",6.1
2012-12-29,1.5,5.0,3.3,1.7,"""rain""",4.15
2012-12-30,0.0,4.4,0.0,1.8,"""drizzle""",2.2


In [92]:
from pyobsplot import Plot, js

Plot.plot({
    "marks": [
        Plot.rectY(wdf, Plot.binX({ "fill": "mean" }, { "x": "date", "fill": "temp", "thresholds": js("d3.utcDay"), "inset": 0}))
    ],
    "width": 800,
    "height": 60,
    "x": {"scale": "band", "ticks": "month"},
    "y": {"axis": None},
    "color": {"range": ["black","black"]}
})


## Possible Solution

### Part 1

In [87]:
from pyobsplot import Plot, js

Plot.plot({
    "marks": [
        Plot.rectY(wdf, Plot.binX({ "fill": "mean" }, { "x": "date", "fill": "temp", "thresholds": js("d3.utcDay"), "inset": 0}))
    ],
    "width": 800,
    "height": 60,
    "x": {"scale": "band", "ticks": "month"},
    "y": {"axis": None},
    "color": {"scheme": "burd", "legend": True, "symmetric": False},
})

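To see what `"symmetric": False` changes, here is a minimal sketch of how a diverging scale can place a value relative to a pivot. It is a simplified model written for illustration, not Observable Plot's implementation. The function name `diverging_position` and the temperature bounds are assumptions; the pivot default of 0 follows Plot's documented default.

```python
def diverging_position(value, lo, hi, pivot=0.0, symmetric=True):
    """Map value to [0, 1], where 0.5 corresponds to the pivot.

    Simplified model of a diverging color scale (illustrative only).
    """
    if symmetric:
        # Extend the domain so the pivot sits exactly in the middle.
        half = max(pivot - lo, hi - pivot)
        lo, hi = pivot - half, pivot + half
    if value == pivot:
        return 0.5
    if value < pivot:
        span = pivot - lo
        t = 0.5 - 0.5 * (pivot - value) / span if span > 0 else 0.0
    else:
        span = hi - pivot
        t = 0.5 + 0.5 * (value - pivot) / span if span > 0 else 1.0
    return min(1.0, max(0.0, t))  # clamp values outside the domain


# Hypothetical bounds for daily mean temperatures (not taken from the data).
LO, HI = -2.0, 26.0

for temp in (LO, 5.0, 12.0, HI):
    sym = diverging_position(temp, LO, HI, symmetric=True)
    asym = diverging_position(temp, LO, HI, symmetric=False)
    print(f"{temp:5.1f}  symmetric={sym:.3f}  asymmetric={asym:.3f}")
```

With these made-up bounds, the symmetric version reserves half of the color ramp for temperatures that never occur, while the asymmetric version uses the full ramp. Choosing a pivot near a typical temperature (for example, through the scale's `pivot` option) may bring the result closer to Climate Stripes, which color each value relative to an average.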

### Part 2

In [None]:
from pyobsplot import Plot, js

Plot.plot({
    "marks": [
        Plot.rectY(wdf, Plot.binX({ "fill": "mode" }, { "x": "date", "fill": "weather", "thresholds": js("d3.utcDay"), "inset": 0}))
    ],
    "width": 800,
    "height": 60,
    "x": {"scale": "band", "ticks": "month"},
    "y": {"axis": None},
    "color": {"legend": True},
})


In [89]:
from pyobsplot import Plot, js

Plot.plot({
    "marks": [
        Plot.rectY(wdf, Plot.binX({ "fill": "mode" }, { "x": "date", "fill": "weather", "thresholds": js("d3.utcDay"), "inset": 0}))
    ],
    "width": 800,
    "height": 60,
    "x": {"scale": "band", "ticks": "month"},
    "y": {"axis": None},
    "color": {
        "domain": ['sun', 'fog', 'drizzle', 'rain', 'snow'],
        "range": ['#e7ba52', '#a7a7a7', '#aec7e8', '#1f77b4', '#9467bd'],
        "legend": True
    },
})

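An ordinal color scale pairs the entries of `domain` and `range` by position, so the order of the two lists matters. The sketch below spells out that pairing in plain Python and checks for categories that have no assigned color. The `observed` list is a hypothetical sample, not the actual column contents.

```python
domain = ["sun", "fog", "drizzle", "rain", "snow"]
colors = ["#e7ba52", "#a7a7a7", "#aec7e8", "#1f77b4", "#9467bd"]

# Each category gets the color at the same index, so the lists must align.
assert len(domain) == len(colors)
weather_colors = dict(zip(domain, colors))

# Hypothetical sample of weather values; "hail" is deliberately not in the domain.
observed = ["rain", "sun", "rain", "hail"]
unmapped = sorted(set(observed) - set(domain))

print(weather_colors["rain"])
print(unmapped)
```

Running a check like this against the real `weather` column before plotting is one way to catch a category that would otherwise be left without a deliberate color.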

### Part 3

In [91]:
from pyobsplot import Plot, js

Plot.plot({
    "marks": [
        Plot.rectY(wdf, Plot.binX({ "fill": "mean" }, { "x": "date", "fill": "precipitation", "thresholds": js("d3.utcDay"), "inset": 0}))
    ],
    "width": 800,
    "height": 60,
    "x": {"scale": "band", "ticks": "month"},
    "y": {"axis": None},
    "color": {"scheme": "blues", "legend": True, "type": "log"},
})

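To illustrate why the scale type matters for a skewed variable like precipitation, the following sketch compares where a value lands on a linear, square-root, and logarithmic scale, each normalized to [0, 1]. The helper names and the bounds (0.5 and 50) are assumptions chosen for round numbers, not values taken from the dataset.

```python
import math


def linear_t(v, lo, hi):
    """Position of v in [lo, hi] on a linear scale."""
    return (v - lo) / (hi - lo)


def sqrt_t(v, lo, hi):
    """Position of v on a square-root scale (requires lo >= 0)."""
    return (math.sqrt(v) - math.sqrt(lo)) / (math.sqrt(hi) - math.sqrt(lo))


def log_t(v, lo, hi):
    """Position of v on a log scale (requires v, lo, hi > 0)."""
    return (math.log(v) - math.log(lo)) / (math.log(hi) - math.log(lo))


# Hypothetical precipitation amounts spanning two orders of magnitude.
for amount in (0.5, 5.0, 50.0):
    print(
        f"{amount:5.1f}"
        f"  linear={linear_t(amount, 0.0, 50.0):.3f}"
        f"  sqrt={sqrt_t(amount, 0.0, 50.0):.3f}"
        f"  log={log_t(amount, 0.5, 50.0):.3f}"
    )
```

On the linear scale, small amounts are crowded near the bottom of the color ramp, whereas the square-root and log scales spread them out. Note that the logarithm of zero is undefined (`log_t(0.0, ...)` raises a `ValueError`), so dry days need separate treatment on a log scale. How a plotting library handles them is library-specific; `sqrt` or `symlog` scale types are alternatives that accept zero.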

### Part 4

In [93]:
from pyobsplot import Plot, js

Plot.plot({
    "marks": [
        Plot.rectY(wdf, Plot.binX({ "fill": "mean", "y": "mean" }, { "x": "date", "y": "temp", "fill": "temp", "thresholds": js("d3.utcDay"), "inset": 0}))
    ],
    "width": 800,
    "height": 300,
    "x": {"scale": "band", "ticks": "month"},
    "y": {"axis": None},
    "color": {"scheme": "burd", "legend": True, "symmetric": False},
})
