Data Visualisation in R

Vinu CT

21-03-2013

Agenda

    R Graphics: Why R?

    Traditional Graphics

    ggplot2

    Devices in R

Who am I?

  • Research Student at IIM Bangalore

  • Consultant, Genpact Bangalore, Dec 2006 – May 2010
  • ASE, TCS Chennai, Aug 2005 – Nov 2006

  • M.Tech QROR, ISI Kolkata, 2003
  • M.Sc. Statistics, B.Sc. Mathematics, MGU Kottayam

Interests

  • Time Series, Optimisation, and Statistical modelling
  • Data Graphics, Programming, and Philosophy of Science

Prerequisites

  1. Experience with R
  2. Familiarity with different graphical representations
  3. Basic knowledge of statistics


(Installation: R and ggplot2 package)

Why R?

    Many statistical software packages offer plotting options

    Why R?

    Customisable, Extensible, Reproducible, and Programmable

A Few Examples

Example 1 : googleVis

WorldBank data

library(googleVis)
demo(WorldBank)

Example 2: Interactive plots

Courtesy: Paul Murrell (grid graphics)

Example 3: Plot using map

From my previous talk: R Graphics with ggplot2

Example 4: Multiple plots

R example: scatter plot

Example 5: Wordcloud

Source: Wordcloud:XKCD comics

Example 6: Animation

Source: Student’s T distribution: df and P[T>2]

Example 7: Adding other images

Courtesy: Marco D Visser


R Graphics packages

  • Base Graphics
  • Grid Graphics
  • Lattice Graphics
  • ggplot2
  • Other packages: plotrix, rgl, igraph, vcd, …

Base Graphics

Base Graphics Example 1

# Scatterplot
x <- c(0.5, 2, 4, 8, 12, 16)
y1 <- c(1, 1.3, 1.9, 3.4, 3.9, 4.8)
y2 <- c(4, 0.8, 0.5, 0.45, 0.4, 0.3)
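
The slide lists only the data. A minimal sketch of how the two series might be drawn with base graphics follows; the axis labels, colours, plotting symbols, and the helper name y_range are assumptions, not taken from the original slide.

```r
# Data from the slide
x  <- c(0.5, 2, 4, 8, 12, 16)
y1 <- c(1, 1.3, 1.9, 3.4, 3.9, 4.8)
y2 <- c(4, 0.8, 0.5, 0.45, 0.4, 0.3)

# Size the y-axis so that both series fit on the same plot
y_range <- range(y1, y2)

# Draw the first series as connected points
plot(x, y1, type = "b", pch = 16, col = "blue",
     ylim = y_range, xlab = "x", ylab = "y",
     main = "Two series on one scatterplot")

# Add the second series to the existing plot
points(x, y2, type = "b", pch = 17, col = "red")

# Label the two series
legend("top", legend = c("y1", "y2"),
       pch = c(16, 17), col = c("blue", "red"))
```

plot() starts a new figure, while points() and legend() only add to the figure that is already open. This layering is typical of base graphics.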

Example 2 Histogram

Y <- rnorm(50)
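
A possible way to turn the sample above into a histogram is sketched below. The seed, the number of breaks, and the colour are assumptions added for illustration.

```r
set.seed(1)       # assumption: fixed seed so the figure is reproducible
Y <- rnorm(50)    # 50 draws from a standard normal, as on the slide

# hist() draws the plot and also returns the bins and their counts
h <- hist(Y, breaks = 10, col = "grey80",
          main = "Histogram of Y", xlab = "Y")

# Mark the individual observations along the x-axis
rug(Y)
```

The returned object h can be inspected (for example h$breaks and h$counts) when the binning itself needs checking.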

Example 3 Pie chart

What is wrong with pie charts?
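
One common answer, also given in the help page for pie(), is that the eye judges lengths more reliably than angles or areas. The sketch below draws the same values both ways; the shares vector is hypothetical data invented purely for illustration.

```r
# Hypothetical shares, deliberately close to one another
shares <- c(A = 23, B = 21, C = 20, D = 19, E = 17)

op <- par(mfrow = c(1, 2))        # two panels side by side
pie(shares, main = "Pie chart")   # slices are hard to rank by eye
barplot(shares, main = "Bar chart", ylab = "Share (%)")
par(op)                           # restore the previous settings
```

In the bar chart the ordering of the five categories is visible at a glance, which is usually not the case for the pie.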

Base Graphics options

demo(graphics)
?par
names(par())
palette()
colours()
gray(0:8 / 8)  # gray() needs a vector of grey levels between 0 and 1
example(points)

ggplot2

ggplot2 compared with other graphics packages:

  • Produces very elegant graphics
  • Very flexible; plots are specified at a high level of abstraction
  • Very friendly mailing list (Google group).
  • Based on the grammar of graphics. Worth investing time in.
  • Syntax takes a while to get used to.
  • Relatively slow.

ggplot2 Components

  • Data
  • Aesthetic Mappings
  • Geoms, stats, scales, coordinate systems, and position adjustments
  • Faceting
  • Themes
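
To make the components above concrete, here is a minimal sketch that uses one of each in a single plot. It relies on the built-in mtcars dataset rather than the cricket data used later, and the particular choices (linear smoother, axis limits, legend title) are assumptions made only for illustration.

```r
library(ggplot2)

p <- ggplot(data = mtcars,                                # data
            aes(x = wt, y = mpg, colour = factor(cyl))) + # aesthetic mappings
  geom_point() +                                          # geom
  stat_smooth(method = "lm", se = FALSE) +                # stat
  scale_colour_discrete(name = "Cylinders") +             # scale
  coord_cartesian(xlim = c(1, 6)) +                       # coordinate system
  facet_wrap(~ am) +                                      # faceting
  theme_bw()                                              # theme

print(p)
```

Each component is added with +, so any one of them can be swapped out without touching the rest of the specification.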

Download the datafile

library(ggplot2)
s10 <- read.csv(file = "http://www.iimb.ernet.in/~vinuct10/ggplot/sachin.csv")
tail(s10, 3)
##       X  StartDate Runs  BF     SR Pos Dismissal Inns   Opposition Ground
## 450 450 2012-03-13    6  19  31.57   2    caught    1  v Sri Lanka  Dhaka
## 451 451 2012-03-16  114 147  77.55   2    caught    1 v Bangladesh  Dhaka
## 452 452 2012-03-18   52  48 108.33   2    caught    2   v Pakistan  Dhaka
##     result Score
## 450    won 304/3
## 451   lost 289/5
## 452    won 330/4

Barplot

## Subset the dataset to a few dismissal types
s10a <- subset(s10, Dismissal %in% c("bowled", "caught", "lbw", "not out"))
s10a$Inns <- factor(s10a$Inns)
qplot(data = s10a, Dismissal, geom = "bar")

qplot(data = s10a, Dismissal, geom = "bar", fill = Inns)

qplot(data = s10a, Dismissal, geom = "bar", fill = Inns, position = "dodge")

qplot(data = s10a, Dismissal, geom = "bar", fill = Inns, position = "fill")

Aesthetics (colour)

s10c <- subset(s10, Opposition %in% c("v Australia", "v South Africa", "v New Zealand"))
ggplot(data = s10c) + geom_point(aes(x = BF, y = SR, colour = Opposition)) +
    xlab("Balls Faced") + ylab("Strike Rate") + labs(title = "Balls Faced vs Strike Rate")

Aesthetics (shape)

ggplot(data = s10c) + geom_point(aes(x = BF, y = SR, shape = Opposition))

Aesthetics (Size)

ggplot(data = s10c) + geom_point(aes(x = BF, y = SR, size = Opposition))

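Be careful when selecting aesthetics: size implies an ordering, so mapping it to an unordered variable such as Opposition is misleading.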

Aesthetics (colour and shape)

ggplot(data = s10c) + geom_point(aes(x = BF, y = SR, colour = Opposition, shape = result))

Geoms and Stats

## Examples of geoms: line,point,box,bar,...
s10b <- subset(s10, result %in% c("won", "lost") & Opposition %in% c("v Australia",
"v South Africa", "v New Zealand"))

s10b$result <- factor(s10b$result)
s10b$opp <- factor(s10b$Opposition)

ggplot(s10b, aes(opp, Runs)) + geom_violin(aes(fill = opp))

require(chron)
qplot(x = factor(years(StartDate)), y = Runs, geom = "boxplot", fill = I("blue"),
data = s10b) + xlab("Year")

Violin plot

Boxplot

Stats and geoms

## Examples of stats: smooth, boxplot,...
qplot(Runs, data = s10b, fill = result, geom = c("histogram"))
qplot(Runs, data = s10b, colour = result, geom = c("density"))
ggplot(data = s10b) + geom_point(aes(x = BF, y = Runs)) + geom_line(data = s10,
    aes(BF, BF)) + geom_smooth(aes(x = BF, y = Runs)) + xlab("Balls Faced") +
    ylab("Runs Scored")


What would the output be?

Scales

## Scales

ggplot(data = s10b) + geom_point(aes(x = BF, y = Runs, colour = result)) + scale_colour_manual(values = c(lost = "red",
won = "green"))

ggplot(data = s10b) + geom_point(aes(x = BF, y = Runs, size = result, shape = result)) +
scale_size_manual(values = c(lost = 5, won = 3)) + scale_shape_manual(values = c(lost = 15,
won = 25))

## Custom legend title and labels in Hindi: "Parinaam" = result, "HAAR" = defeat, "JEET" = victory
ggplot(data = s10b) + geom_point(aes(x = BF, y = Runs, colour = result)) + scale_colour_discrete(name = "Parinaam",
    labels = c(lost = "HAAR", won = "JEET"))

Scales

Be careful selecting the aesthetics.

ggplot2 - Facets

## Facets

s10c <- subset(s10, (Opposition %in% c("v Australia", "v England", "v South Africa",
    "v Sri Lanka")) & (result %in% c("lost", "won")))

s10c$yr <- cut(as.numeric(format(as.Date(s10c$StartDate), format = "%Y")), 4)
qplot(data = s10c, factor(Inns), fill = result, geom = "bar", position = "fill",
    facets = Opposition ~ yr)
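
The facets = rows ~ columns formula in qplot() corresponds to facet_grid() in the ggplot() interface. The sketch below shows that equivalent form on the built-in mtcars dataset, which stands in for the cricket data here; the choice of variables is an assumption made only so the example runs without the CSV file.

```r
library(ggplot2)

# mtcars stands in for the cricket data: gear and am play the roles
# of the row and column faceting variables
p <- ggplot(mtcars, aes(x = factor(cyl), fill = factor(vs))) +
  geom_bar(position = "fill") +   # stacked bars rescaled to proportions
  facet_grid(gear ~ am) +         # rows ~ columns
  labs(x = "Cylinders", y = "Proportion", fill = "vs")

print(p)
```

Every panel shares the same axes, which makes comparisons across rows and columns straightforward.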

Facets

ggplot2 - Themes

## Themes

s10a <- subset(s10, Dismissal %in% c("bowled", "caught", "lbw", "not out"))
s10a$Dismissal <- factor(s10a$Dismissal)
s10a$Inns <- factor(s10a$Inns)
qplot(data = s10a, Dismissal, geom = "bar", fill = Inns)
qplot(data = s10a, Dismissal, geom = "bar", fill = Inns) + theme_bw()
qplot(data = s10a, Dismissal, geom = "bar", fill = Inns) + theme(panel.background = element_rect(fill = "lightblue"))
qplot(data = s10a, Dismissal, geom = "bar", fill = Inns) + theme(plot.background = element_rect(fill = "yellow"),
panel.background = element_rect(fill = "purple"))

Default theme

Black and white theme

Theme

Fancy theme

Devices in R

  • Raster: png(), jpeg(), bitmap(), tiff()
  • Vector: pdf(), postscript(), svg(), cairo_ps()
  • On-screen devices: X11, X11cairo, windows, JavaGD, quartz (OS X)
  • Packages: rgl (OpenGL), GDD (bitmap formats), RSvgDevice, canvas (HTML canvas), RSVGTipsDevice, tikzDevice (LaTeX).

    Important factors when preparing publication-quality graphics:
Width, height, font, and resolution
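
A minimal sketch of how width, height, font size, and resolution might be set when writing the same plot to a raster device and a vector device. The file names are temporary paths, the helper draw() is hypothetical, and the sizes are illustrative assumptions rather than recommendations.

```r
# Temporary output files (assumption: any writable path would do)
png_file <- tempfile(fileext = ".png")
pdf_file <- tempfile(fileext = ".pdf")

# Hypothetical helper so both devices receive the same plot
draw <- function() plot(1:10, (1:10)^2, main = "Device demo")

# Raster: a size in inches also needs a resolution (dots per inch)
png(png_file, width = 6, height = 4, units = "in", res = 300, pointsize = 10)
draw()
dev.off()   # close the device so the file is written

# Vector: size is given in inches; no resolution is involved
pdf(pdf_file, width = 6, height = 4, family = "Helvetica", pointsize = 10)
draw()
dev.off()
```

The plot is only written to disk once dev.off() is called, so forgetting it is a frequent cause of empty or missing files.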

Vector Graphics vs Raster Graphics

Source: Wikipedia


Useful Links

Slide Template

I have used a new package, reports, by Tyler Rinker to create this template. Check this link for more information

Tex-R IIMB

LaTeX and R (TexR IIMB): an IIMB students' user group for discussing any of the following.

  • LaTeX: a document preparation system
  • R
  • Open source applications.
  • Best practice sharing (Academic research)

Thank You

Contact details: vinu.ct@hotmail.com / @vinuct