The small things

How a tiny soloution for me helped the larger team.

By Millie O. Symns in R

February 21, 2022

Whenever I learn something new in R, I feel like a magician, especially visualizations.

Since I usually work on cross-functional teams, I am often learning about business problems and playing my part to help solve the issue with data. Someone recently asked me:

What project have you worked on that created a lasting impact?

I have written papers and completed research projects for program directors and leaders to make decisions (and I am proud of those). I think the story folks want to hear, but it doesn’t always have to be the big thing that makes all the difference.

Background

Working on a global team with schools in various countries meant dealing with varying school calendars. After a couple of months in, I was trying to wrap my head around academic years.

I wondered if there was looking for information on these school timelines. I asked around if there was any documentation or visuals about the school schedules, but no luck. Then I found the right colleague who kindly pointed to a mini-database I hadn’t seen that had many columns of information about the schools, and there it was - information on the school calendars for all schools across the globe! 🥳

I extracted the CSV and happily opened up RStudio to get in there.

NOTE: I created a sample dataset to show my process.

Process

At first, I thought I would create a clean table for reference, but knowing myself and dealing with a concept of time 🙃 it felt best to create a visual.

I did a quick search on google to find what techniques people have used to create Gantt charts and came across this page: https://cran.r-project.org/web/packages/vistime/vignettes/vistime-vignette.html#ex.-3-gantt-charts

I knew I wanted something quick and function, so this seemed to fit!

Based on this document, I figured out the steps I needed to take:

  1. Create a table of the calendar months to see frequencies and assign colors
  2. Create start and end dates from the months
  3. Format a data frame for the final visual

At first I started with a file that looked something like this:

id geo country school_calendar_months school_year_start_month school_year_end_month
1 APAC Australia February - November February November
3 APAC New Zealand February - December February December
6 EMEA England September - July September July
16 North America United States September - June September June

My sample data is about 30 rows with 4 geos/regions and 15 countries with various start and end months

Then took a look at different academic years and assigned a color to each. I decided to input a gradient of green for this one so that the end result shows a darker color (meaning more schools in this country with that particular academic year schedule) to a lighter color.


school_calendar <- raw_school_info %>% 
  # Getting a count of the calendar month combos
  group_by(school_calendar_months) %>% 
  summarise(sum = n()) %>% 
  arrange(desc(sum)) %>% 
  # hard coding exact colors I would like in the visual based on the sum of calendar month combos
  mutate(color = case_when(sum == 9 ~ "#145a32", 
                           sum == 5 ~ "#1e8449", 
                           sum == 4 ~ "#27ae60",
                           sum == 2 ~ "#52be80", 
                           sum == 1 ~ "#7dcea0", 
                           TRUE ~ ""), 
         fontcolor = "white")

I needed a date column, so I just made a lookup table to join in later to save my brain. It felt the easiest at the time.


l_months <-tribble(
  ~month, ~date, 
  "January", "20220101", 
  "February", "20220201", 
  "March", "20220301", 
  "April", "20220401", 
  "May", "20220501", 
  "June", "20220601", 
  "July", "20220701", 
  "August", "20220801", 
  "September", "20220901", 
  "October", "20221001",
  "November", "20221101",
  "December", "20221201"
) %>% 
  mutate(date = as_date(date))

Then I started doing a series of left join to get the date columns where I needed them to be


#Add date lookup for the start date
school_dates <- left_join(raw_school_info, l_months, by = c("school_year_start_month" = "month")) %>%
  rename(start_date = date)

#Add date lookup for the end date
school_dates <- left_join(school_dates, l_months, by = c("school_year_end_month" = "month")) %>% 
  rename(t_end_date = date)

#Add colors for calendar months
school_dates <- left_join(school_dates, school_calendar, by = "school_calendar_months")

Then for my last bit of formatting, I needed to correct the end dates so the Gantt chart would display correctly.


#Corrected end dates for academic years
school_dates %<>%
        # Check to see which set of months are negative (i.e., start date is after the end date)
  mutate(t_months = interval(start_date , t_end_date) %/% months(1), 
        # Correct the end date where needed
         end_date = as_date(ifelse(t_months < 0, ymd(t_end_date) + years(1), ymd(t_end_date))),
         school_months = interval(start_date , end_date) %/% months(1)) # may need to add +1 since end dates are the 1st of each month

I then made my final data frame by selecting the columns I needed and removing the duplicate rows, since I was still working on the school level. And finally, plot my Gantt chart.

#Removed duplicates to have unique file by geo and country 
school_cal_timeline <- school_dates %>% 
  select(geo, country, start = start_date, end = end_date, school_months, color, fontcolor) %>% 
  distinct(geo, country, start, end, .keep_all = T)

#Plot school calendar
vistime(school_cal_timeline, col.group = "geo", col.event = "country", title = "Global School Calendars", optimize_y = F, linewidth = 20)

Final Visual

You can take a picture of the plot and save it as a png. This is about what it looked like:

Takeaways

Having this visual was super helpful at the time. It helped me see a concentration of schools by country with each particular school calendar. It was also great at reminding me what regions and countries would make sense to group for outreach if I wanted to send out pre and post-surveys.

I created this for myself but then thought this might be helpful for another colleague and shared it with them. Word got about, and other folks started asking for the chart I made from other divisions that also needed to deal with global school calendar schedules.

I didn’t go into creating this chart with that intention, but I am glad it had the impact that it did and helped make work a little easier for other folks. 🤗

Posted on:
February 21, 2022
Length:
5 minute read, 994 words
Categories:
R
See Also: