Coffee and High Cardinality

statistics
r
dataviz
Author

Mark Jurries II

Published

November 24, 2025

My phone thought I’d be interested in this article on coffee consumption by country. It wound up being a good call, largely because I do my part to bring global coffee consumption numbers upward but also because the first chart caught my eye:

Now, to be clear, this isn’t terrible. You can see where it’s the most expensive, and it’s not particularly hard to find a country. That doesn’t mean there aren’t critiques:

*I want to be careful here - some data viz practitioners can come off as rather anti-fun. I’m not against flair in a chart per se, and perhaps I’m just more used to building charts in a business context. Still, let’s make it easy to focus on the data, eh?

The article itself does a good job going through the data*, and there’s a table that’s a lot more useful than the chart. It’s paginated, so it’s not overwhelming to look at. I want to stress that it’s really hard to display data when your dimensions have so many members - anything over 10 and the brain begins to break.

*They got their data from this article, which uses a technique similar to mine to show the data, something I didn’t see until I’d made my chart. Always nice to get confirmation that the approach is valid, though.

What we can do is panel the data, that is, show ~20 per column and move to the next. This makes it hard to compare individual countries, at least if they’re in separate panels, but we get to keep things in one view. Thomas Sowell’s quip that “There are no solutions, there are only trade-offs; and you try to get the best trade-off you can get, that’s all you can hope for” applies to data viz as much as it does to economics.

library(hrbrthemes)
library(janitor)
library(gt)
library(gtExtras)
library(tidyverse)

coffee <- readxl::read_xlsx('coffee_consumption.xlsx') %>%
  clean_names()

# Rank countries by daily cups per capita and build "rank. COUNTRY" labels
coffee %>%
  mutate(cup_rank = rank(desc(daily_coffee_consumption_per_capita_cups), ties.method = "first")) %>%
  mutate(country_label = paste0(round(cup_rank, 0), ". ", toupper(country))) %>%
  # Split the ranking into three side-by-side panels of up to 22 countries each
  mutate(rank_facet = case_when(cup_rank <= 22 ~ '1',
                                cup_rank <= 44 ~ '2',
                                TRUE ~ '3')) %>%
  # Reorder on -cup_rank so rank 1 sits at the top of each panel
  ggplot(aes(x = daily_coffee_consumption_per_capita_cups, y = reorder(country_label, -cup_rank)))+
  geom_col(fill = '#967259')+
  # hjust is a fixed setting rather than a data mapping, so it goes outside aes()
  geom_text(aes(label = daily_coffee_consumption_per_capita_cups), hjust = -0.2)+
  theme_ipsum()+
  # Strip non-data ink; the Oswald font needs to be installed for family = "Oswald" to work
  theme(text=element_text(size = 16,  family="Oswald"),
        panel.grid.major = element_blank(),
        panel.grid.minor = element_blank(),
        axis.text.x = element_blank(),
        axis.ticks.x = element_blank(),
        strip.text.x = element_blank(),
        plot.title.position = "plot")+
  # Leave room to the right of the bars for the value labels
  xlim(0, 6.5)+
  xlab('')+
  ylab('')+
  ggtitle(label = 'AVERAGE CUPS OF COFFEE BY COUNTRY')+
  labs(caption = 'Data via https://www.visualcapitalist.com/ranked-which-country-consumes-the-most-coffee/')+
  # One row of panels, each with its own y axis so only its own countries appear
  facet_wrap(rank_facet ~ ., nrow = 1, scales = 'free_y')

This is cleaner: we’ve removed a lot of non-data ink (backgrounds, pictures, gridlines, etc.) so the focus is on the data. Again, nothing against an infographic approach, but in most contexts you want something simple. Especially in a business context, you don’t want to have to spend time explaining things that aren’t related to the data story you’re trying to tell.
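One small aside on the chart code above: the panel cutoffs (22 and 44) are typed in by hand, which works for this dataset but would need editing if the number of countries changed. A minimal sketch of computing the panel from the rank instead is below. The `toy` data frame, its `cups` column, and the `n_panels` / `per_panel` names are all made up for illustration and are not from the original data.

```r
library(dplyr)

# Hypothetical toy data: 50 made-up "countries" with strictly decreasing values
toy <- data.frame(
  country = paste("Country", 1:50),
  cups = seq(5, 0.1, length.out = 50)
)

n_panels <- 3
# Rows per panel, rounded up so every row lands in one of the n_panels panels
per_panel <- ceiling(nrow(toy) / n_panels)

paneled <- toy %>%
  mutate(cup_rank = rank(desc(cups), ties.method = "first"),
         # Ranks 1..per_panel go to panel 1, the next per_panel to panel 2, and so on
         rank_facet = ceiling(cup_rank / per_panel))

# How many rows ended up in each panel
count(paneled, rank_facet)
```

The resulting `rank_facet` column could be passed to `facet_wrap()` the same way as the hand-built one; the trade-off is that the last panel may come out a little shorter than the others.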

You shouldn’t use this paneled layout for more than one metric. If you need multiple metrics, you’re better off either putting all the dimensions in the same column and letting the user scroll or creating multiple charts. Another option is to show the top 10 (or 15, or 20, or whatever) members of your dimension and group the rest into an “other” bucket. You’ll just need to be clear on what defines top n, especially if you do this in an interactive format like Tableau and you’re letting users decide what metric is used for this.
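As a rough sketch of the top-n-plus-“other” idea, the snippet below keeps the top three members by one explicitly named metric and collapses the rest. The data, the column names, and `top_n_count` are hypothetical. It also assumes a plain unweighted mean is an acceptable way to summarize the “Other” bucket; for a per-capita figure a population-weighted mean would be more faithful if population were available, and for an additive metric you’d sum instead.

```r
library(dplyr)

# Hypothetical toy data, not the article's figures
toy <- data.frame(
  country = c("A", "B", "C", "D", "E", "F"),
  cups = c(5, 4, 3, 2, 1, 1)
)

top_n_count <- 3  # "top n" is defined here by rank on cups, and nothing else

lumped <- toy %>%
  mutate(cup_rank = rank(desc(cups), ties.method = "first"),
         # Keep the name for the top n, relabel everything else as "Other"
         country_group = if_else(cup_rank <= top_n_count, country, "Other")) %>%
  group_by(country_group) %>%
  # Unweighted mean (an assumption); top-n groups have one row so they are unchanged
  summarise(cups = mean(cups), n_countries = n(), .groups = "drop") %>%
  # Sort by value, but always push "Other" to the end
  arrange(country_group == "Other", desc(cups))

lumped
```

Keeping `n_countries` around makes it easy to label the bucket honestly, e.g. as “Other (3 countries)”, so readers know what the bar represents.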

That leads into my final point - your chart choice will depend on how it’s being presented. Interactive provides a lot more options, since it’s generally fine for your users to scroll. If it’s a static chart going into a deck somewhere, you’ll need to be creative. Knowing your options and watching what others do is your best bet: the more tools you have on your belt, the more likely you are to have something appropriate to your data and audience.