
Examining Slavic Language Speaker Statistics in the U.S. with R

Hey everyone!

For the Data Science unit in my Programming Language and Design class at school, I used R (a statistical programming language) to visualize data on people in the U.S. who speak a Slavic language at home, along with their levels of English proficiency.

I love data science and creating cool graphs and visualizations through programming. My research (more on that coming soon!) is done almost entirely in R, so I’m very familiar with the language and the various editors and IDEs that people use with it (RStudio, Vim, Emacs + ESS, etc.). I’m also an avid reader of the daily email newsletter from R-bloggers, a site that offers great R tutorials and discussion as well as a strong community of R users.

Of course, I can’t bear to leave out Twitter, which has introduced me to many awesome women in data science. Rachael Tatman, a Data Scientist at Kaggle with a PhD in linguistics from the University of Washington, is one of my inspirations. She works mainly with R and Python (also widely used for data science, with an especially large ecosystem of machine learning libraries) and does really cool research in computational sociolinguistics, specifically looking at emoji and at how computational systems process different dialects.

For this project, I used R Markdown to create a report with descriptions and analysis alongside the graphs. The report is attached below. My favorite graphs are the map and the pie charts on the second page. R has some really nice color palettes (check out RColorBrewer!) that make graphs look amazing, and those were pretty cool to play around with. Overall, it was an immensely fun project.

Hope you enjoy reading!

Examining Slavic Language Speaker Statistics in the United States