Tidy Tuesday - African American History

Jun 26, 2020

If you’re not familiar with Tidy Tuesday, it is a weekly project hosted online by the R for Data Science community. Every Tuesday a new dataset is released, and people are encouraged to explore, analyse, and visualize it in interesting ways. This is my first week exploring Tidy Tuesday data. Information about the project and its datasets is available in the tidytuesday GitHub repository. Before working with this data I watched Julia Silge’s excellent screencast and picked up some great ways to find missing values and recode data.

A Little History

I learned a lot just by looking at the data provided. I was not previously aware of the history captured in the african_names dataset, which lists the names of enslaved people who were freed while being illegally smuggled to the Americas. Most of the names were recorded at the port of Freetown in Sierra Leone, before the ships had made the trans-Atlantic journey. Here’s the description of the dataset, excerpted from the tidytuesday GitHub page:

During the last 60 years of the trans-Atlantic slave trade, courts around the Atlantic basins condemned over two thousand vessels for engaging in the traffic and recorded the details of captives found on board including their African names. The African Names Database was created from these records, now located in the Registers of Liberated Africans at the Sierra Leone National Archives, Freetown, as well as Series FO84, FO313, CO247 and CO267 held at the British National Archives in London. Links are provided to the ships in the Voyages Database from which the liberated Africans were rescued, as well as to the African Origins site where users can hear the names pronounced and help us identify the languages in which they think the names are used.

Table 1: Data summary
Name	african_names
Number of rows	91490
Number of columns	11
_______________________
Column type frequency:
character	6
numeric	5
________________________
Group variables	None

Variable type: character

skim_variable	n_missing	complete_rate	min	max	n_unique
name	0	1.00	2	24	62330
gender	12878	0.86	3	5	4
ship_name	1	1.00	2	59	443
port_disembark	0	1.00	6	19	5
port_embark	1126	0.99	4	31	59
country_origin	79404	0.13	3	31	563

Variable type: numeric

skim_variable	n_missing	complete_rate	mean	sd	p0	p25	p50	p75	p100	hist
id	0	1.00	62122.02	51305.07	1.0	22935.25	45822.5	101263.8	199932	▇▆▃▁▂
voyage_id	0	1.00	17698.25	82016.88	557.0	2443.00	2871.0	3601.0	500082	▇▁▁▁▁
age	1126	0.99	18.89	8.60	0.5	11.00	20.0	26.0	77	▆▇▁▁▁
height	4820	0.95	58.61	6.84	0.0	54.00	60.0	64.0	85	▁▁▂▇▁
year_arrival	0	1.00	1831.40	9.52	1808.0	1826.00	1832.0	1837.0	1862	▂▆▇▃▁

## # A tibble: 5 x 2
## # Groups:   port_disembark [5]
##   port_disembark          n
##   <chr>               <int>
## 1 Freetown            81009
## 2 Havana              10058
## 3 Bahamas unspecified   183
## 4 Kingston, Jamaica     144
## 5 St. Helena             96

From this output we can see that country_origin has the most missing data by far: 79,404 of the 91,490 rows are missing, a complete rate of only 0.13. There’s a clue about this in the description of the data above, which mentions the African Origins site, where users can hear the names pronounced and help identify the languages in which they think the names are used.

So it seems the people who were liberated came from such different cultures that the original documentarians often did not share a language with them and could not determine where they originally came from. We can see that about 81,000 people were freed in Freetown (this now makes sense; again, I’m learning lots here). About 10,000 people were freed in Havana, Cuba, and far fewer at the other three ports (the Bahamas, Kingston, and St. Helena).
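The missing-value and per-port counts discussed above can be reproduced with a couple of short calls. The sketch below is only an illustration: `toy_names` is a made-up stand-in that borrows a few column names from african_names, and its values are invented, not taken from the real data. With the real dataset loaded, the same two calls would be applied to african_names instead.

```r
library(dplyr)

# Toy stand-in for african_names; the values here are invented for illustration
toy_names <- tibble(
  name           = c("A", "B", "C", "D", "E"),
  gender         = c("Man", NA, "Woman", "Boy", NA),
  port_disembark = c("Freetown", "Freetown", "Havana", "Freetown", "Havana"),
  country_origin = c(NA, NA, NA, "X", NA)
)

# Number of missing values in each column (a named numeric vector)
missing_counts <- colSums(is.na(toy_names))

# Number of rows per port of disembarkation, largest first
port_counts <- toy_names %>%
  count(port_disembark, sort = TRUE)

print(missing_counts)
print(port_counts)
```

Using `count()` directly, rather than `group_by()` followed by `count()`, returns an ungrouped tibble, which is usually easier to pass on to later steps.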

The age variable includes entries ranging from a 6-month-old child to a 77-year-old person. The gender variable has 12,878 missing values and 4 distinct values. I’ll use some of the same techniques as Julia Silge to clean up this data.
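As a rough sketch of what recoding a column like gender could look like, the example below collapses four categories into two and gives missing entries an explicit label. The four category names ("Man", "Woman", "Boy", "Girl"), the `gender_group` column, and the "Unknown" label are all assumptions made for illustration; the post does not list the actual values, and this is not necessarily the approach used in the screencast.

```r
library(dplyr)

# Toy data; the category names are assumed, not taken from the post
toy_gender <- tibble(gender = c("Man", "Boy", "Woman", "Girl", NA))

recoded <- toy_gender %>%
  mutate(
    # NA %in% c(...) is FALSE, so missing values fall through to the last branch
    gender_group = case_when(
      gender %in% c("Man", "Boy")    ~ "Male",
      gender %in% c("Woman", "Girl") ~ "Female",
      TRUE                           ~ "Unknown"
    )
  )

print(recoded)
```

Keeping the original `gender` column alongside the recoded one makes it easy to check the mapping before dropping anything.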

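library(tidyverse)

# Count liberated people per port and arrival year, then plot the yearly totals
african_names %>%
  count(port_disembark, year_arrival) %>%
  ggplot(aes(x = year_arrival, y = n, color = port_disembark)) +
  # ggplot2 >= 3.4.0 prefers linewidth over size for line thickness
  geom_line(alpha = 0.6, size = 2) +
  geom_point(alpha = 0.6)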

We can see that the majority of liberations occurred in Freetown. I wonder if the ships had stopped going to Freetown by 1849, or if there was less enforcement, or if they stopped being recorded. Similarly, I wonder what the policies were in each of the other ports that led them to free enslaved Africans during the time periods reflected in this data.
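One way to start looking into these questions is to check the first and last recorded arrival year at each port. The sketch below uses a small invented table, `toy_arrivals`, with the same two column names as the real data; the years in it are placeholders and say nothing about the actual records. The summary column names (`first_year`, `last_year`, `n_people`) are also just illustrative.

```r
library(dplyr)

# Toy data with invented years, standing in for african_names
toy_arrivals <- tibble(
  port_disembark = c("Freetown", "Freetown", "Freetown", "Havana", "Havana"),
  year_arrival   = c(1808, 1832, 1848, 1824, 1841)
)

# First and last recorded arrival year, and number of records, for each port
port_ranges <- toy_arrivals %>%
  group_by(port_disembark) %>%
  summarise(
    first_year = min(year_arrival),
    last_year  = max(year_arrival),
    n_people   = n()
  ) %>%
  arrange(port_disembark)

print(port_ranges)
```

A table like this only shows when records stop, not why, so it would be a starting point for reading about enforcement and record keeping at each port.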


Liz McConnell

Graduate Student, CSU Center for Contaminant Hydrology

My research interests include contaminant fate and transport, data analysis using statistics and machine learning, R programming, and geospatial analysis.
