Chapter 15: Visualization Display - Statistics for LIS with Open Source R

One of the most appealing things about R is its ability to create data visualizations with just a couple of lines of code. This chapter focuses on simple visualization using R.

What Is Visualization?

Visualization is the study of how data can be represented visually. According to Segal and Heer (2013), the primary goal of visualization is to communicate the results of statistical analysis clearly and efficiently. According to Wang et al. (2008), professional designers and artists are well aware of the rules that guide the design of effective color palettes, from both an aesthetic and an attention-guiding point of view. In the field of visualization, however, systematic rules covering these aspects have received less attention. Edward Tufte expresses the problem of choosing colors for data and information visualization well: “avoiding catastrophe becomes the first principle in bringing color to information: Above all, do no harm” (Tufte, 1997).

Colors in R

Used well, colors enhance and clarify any type of presentation; used poorly, they obscure, muddle, and confuse. While there is a strong aesthetic component to color, using color well in information and data display is essentially about function: what information are you trying to convey, and how (or whether) can color enhance it?

In R, a color can be specified in three ways:

1. Hexadecimal colors ("#rrggbb"). A hexadecimal color string encodes the red, green, and blue intensities of a color as three two-digit base-16 numbers, each ranging from 00 (0) to FF (255). For example, col="#4682B4" specifies red = 46 (70), green = 82 (130), and blue = B4 (180), a medium blue that R also knows by the name "steelblue".

2. Named colors. R has 657 built-in color names, such as "plum" and "seagreen", and translates each one into its hexadecimal value. The colors() function lists them all.

The code in R:
> colors()
  [1] "white"                "aliceblue"            "antiquewhite"
  [4] "antiquewhite1"        "antiquewhite2"        "antiquewhite3"
  [7] "antiquewhite4"        "aquamarine"           "aquamarine1"
...
[655] "yellow3"              "yellow4"              "yellowgreen"

3. Integers. An integer refers to a position in the current color palette, so col=2 means the second color in the palette. In visualization, a palette is the set of colors used to fill a graph; many applications, including Microsoft's, offer built-in palettes. The benefit of using a palette is that you do not need to choose each color in the visualization yourself.
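The three kinds of color specification listed above can be compared directly in the console. The sketch below uses the base R grDevices functions col2rgb(), which converts any color specification into its red, green, and blue components, rgb(), which goes the other way, and palette(), which reads or sets the current palette. The exact colors in the default palette differ between R versions (R 4.0.0 introduced a new default), so the sketch relies only on the rule that an integer i means palette()[i].

```r
# 1. Hexadecimal: "#4682B4" encodes red = 0x46, green = 0x82, blue = 0xB4
col2rgb("#4682B4")                        # red 70, green 130, blue 180
rgb(70, 130, 180, maxColorValue = 255)    # back to "#4682B4"

# 2. Named colors: colors() lists every built-in name
length(colors())                          # 657
"seagreen" %in% colors()                  # TRUE
col2rgb("steelblue")                      # same values as "#4682B4"

# 3. Integers: col = i means palette()[i]
palette("default")                        # make sure the default palette is active
palette()                                 # the current palette (8 colors)
all(col2rgb(2) == col2rgb(palette()[2]))  # TRUE
```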

How should you apply colors in R?

A pie chart (also called a circle graph) summarizes a set of categorical data or displays the different values of a given variable (e.g., a percentage distribution). The circle is divided into segments, one for each category, and the area of each segment is the same proportion of the circle as the category is of the total data set. A pie chart is meant to show how the component parts make up a whole.
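The proportional rule can be made concrete with a little arithmetic. Each slice's share of the circle is its value divided by the total, so its angle is 360 degrees times that share. The short sketch below applies this to the four values used in the example that follows (40, 30, 20, and 10); the variable names counts, share, and angle are illustrative only.

```r
counts <- c(40, 30, 20, 10)     # category values from the example
share  <- counts / sum(counts)  # fraction of the whole: 0.4 0.3 0.2 0.1
angle  <- 360 * share           # slice angle in degrees: 144 108 72 36
share
angle
sum(angle)                      # the slices always add up to 360 degrees
```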

The code in R:
> mypie <- c(40, 30, 20, 10)
> mypie
[1] 40 30 20 10
> pie(mypie)
> # R draws a pie chart with four slices, labeled 1 to 4 by default.
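Two details of pie() are worth knowing. First, pie() divides each value by the total itself, so the input does not have to add up to 100: c(4, 3, 2, 1) produces exactly the same slices as c(40, 30, 20, 10). Second, by default the slices are drawn counterclockwise starting from the 3 o'clock position; setting the clockwise argument to TRUE starts at 12 o'clock and goes clockwise, which many readers find more natural. A minimal sketch:

```r
mypie <- c(40, 30, 20, 10)

# Same proportions, so the same chart: pie() rescales by the total
all.equal(mypie / sum(mypie), c(4, 3, 2, 1) / sum(c(4, 3, 2, 1)))  # TRUE

# Default: counterclockwise from 3 o'clock
pie(mypie)

# Clockwise from 12 o'clock
pie(mypie, clockwise = TRUE)
```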

Adding labels to the pie

The code in R:
> names(mypie) <- c("Red", "Blue", "Green", "Brown")
The names() function attaches a name to each element of the vector. To see the result, type the variable name (mypie) and press Enter:
> mypie
  Red  Blue Green Brown
   40    30    20    10
The pie() function automatically uses these names as slice labels:
> pie(mypie)
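Readers often find it easier to compare slices when each label also shows its percentage of the total. The sketch below builds such labels with the base functions round() and paste0() and passes them to the labels argument of pie(); the names pct and mylabels are hypothetical.

```r
mypie <- c(Red = 40, Blue = 30, Green = 20, Brown = 10)

# Percentage of the total for each slice, rounded to whole numbers
pct <- round(100 * mypie / sum(mypie))

# Combine each name with its percentage, e.g. "Red (40%)"
mylabels <- paste0(names(mypie), " (", pct, "%)")
mylabels

pie(mypie, labels = mylabels)
```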

Customizing the colors that fill the pie
We choose the fill colors, "red", "blue", "green", and "brown", and store them in a vector called mycolors.
The code in R:
> mycolors <- c("red", "blue", "green", "brown")
The final command passes both the data and the colors to pie(). The col argument sets the fill color of each slice, in order, and the main argument adds the title "My first pie":
> pie(mypie, main = "My first pie", col = mycolors)
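Typing color names by hand works well for a few slices; for more categories, R can generate a coordinated set of colors. The sketch below assumes R 3.6.0 or later, where hcl.colors() is available in grDevices, and asks for one color per slice from the qualitative "Set 2" palette (hcl.pals() lists the other palette names). The library-collection categories and their values are hypothetical, and "Set 2" is only one possible choice.

```r
# Hypothetical library collection categories
mypie <- c(Books = 40, Journals = 30, Media = 20, Other = 10)

# One color per slice from a built-in qualitative HCL palette (R >= 3.6.0)
mycolors <- hcl.colors(length(mypie), palette = "Set 2")
mycolors   # hexadecimal color strings

pie(mypie, main = "My first pie", col = mycolors)
```

Because the colors come from one palette, they are designed to work together, and changing the number of categories only requires changing the first argument to hcl.colors().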


A Primer for Using Open Source R Software for Accessibility and Visualization