Overview

A quick note on R packages

How do I install or update packages?

# You can find and install packages within R
install.packages("sos") # Name must be in quotes
install.packages(c("sos", "dplyr", "ggplot2")) # several packages at once
# Packages get updated FREQUENTLY
update.packages() # updates all outdated packages, asking about each one
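Reinstalling every package each time is slow. The following is a minimal sketch, not a standard function: the helper names `missing_packages()` and `install_if_missing()` are hypothetical, and the sketch assumes that a package `requireNamespace()` cannot load is one that still needs installing.

```r
# Hypothetical helper: return the subset of `pkgs` that cannot be loaded
missing_packages <- function(pkgs) {
  ok <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!ok]
}

# Hypothetical helper: install only what is missing
install_if_missing <- function(pkgs) {
  todo <- missing_packages(pkgs)
  if (length(todo) > 0) install.packages(todo)
  invisible(todo)
}

# Usage (needs internet access):
# install_if_missing(c("sos", "dplyr", "ggplot2"))
```

Relatedly, `update.packages(ask = FALSE)` updates everything without prompting for each package.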

Finding Packages

Some Must Have Packages

plyr, ggplot2, lme4, sp, knitr, sos, forecast, quantmod, stringr, XML, and many more, depending on the task(s)!

Data Management

The Working Directory

Preliminaries

Manipulating Project/File Paths

When to use a full path

Ground Rules

Missing Data Symbols

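a <- c(1, 2, 3)  # a is a vector with three elements
# Ask R for element 4, which does not exist
print(a[4]) # or simply a[4]
## [1] NA
a <- c(a, NULL) # Append NULL onto a
print(a)
## [1] 1 2 3
length(a)
## [1] 3
# Notice no change: NULL is an empty object, so nothing is appended
a <- c(a, NA) # NA is a placeholder for a missing value, so it is appended
print(a)
## [1]  1  2  3 NA
length(a)
## [1] 4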
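Because `NA` marks a missing value, most calculations involving it also return `NA`. A short sketch of the base R tools for detecting and skipping missing values, using the same kind of vector as in the example above:

```r
a <- c(1, 2, 3, NA)

NA == NA                 # NA: a comparison with NA is unknown, so use is.na()
is.na(a)                 # FALSE FALSE FALSE TRUE
sum(is.na(a))            # 1 missing value
mean(a)                  # NA: missing values propagate
mean(a, na.rm = TRUE)    # 2: NA is dropped before averaging
a[!is.na(a)]             # 1 2 3: keep only the observed values
```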

What the heck is NaN (Not a Number)?

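b <- 1
b <- sqrt(-b) # the square root of a negative number is not a real number
## Warning in sqrt(-b): NaNs produced
print(b)
## [1] NaN
pi/0 # division by zero gives infinity, not NaN
## [1] Inf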
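`NA`, `NaN` and `Inf` are different things, and base R has a test for each. A small sketch; the vector `x` is made up for illustration:

```r
x <- c(1, NA, 0/0, 1/0, -1/0)   # 1, NA, NaN, Inf, -Inf

is.na(x)        # FALSE TRUE TRUE FALSE FALSE (NaN also counts as missing)
is.nan(x)       # FALSE FALSE TRUE FALSE FALSE (only NaN)
is.finite(x)    # TRUE FALSE FALSE FALSE FALSE
is.infinite(x)  # FALSE FALSE FALSE TRUE TRUE (Inf and -Inf)
```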

Organization of an analysis/project is key

Read in Data (aka import data)

CSV is Our Friend

# Load some data (the file is semicolon-separated)
oil <- read.csv("http://yunus.hacettepe.edu.tr/~iozkan/data/oilcsv.csv", header = TRUE, sep = ";")
# Note: if we don't assign the data to 'oil' with the `<-` operator,
# R just prints the contents of the table

Let’s Check What We Got

  str(oil)
## 'data.frame':    7142 obs. of  4 variables:
##  $ Date : Factor w/ 7142 levels "01.02.1988","01.02.1989",..: 216 449 1150 1386 1625 1865 2103 2818 3055 3293 ...
##  $ Crude: num  25.6 26 26.5 25.9 25.9 ...
##  $ Brent: num  NA NA NA NA NA NA NA NA NA NA ...
##  $ X    : logi  NA NA NA NA NA NA ...
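The fourth column `X` is entirely `NA`. One likely cause (an assumption, since the raw file is not shown here) is a trailing `;` at the end of every line, which `read.csv()` reads as an extra empty column and names `X`. The sketch below reproduces this offline with a small hypothetical data string passed through the `text` argument, then drops `X` and converts `Date` to R's `Date` class, assuming the dates are written day.month.year:

```r
# Hypothetical data: note the ";" at the end of every line
csv_text <- "Date;Crude;Brent;
02.01.1990;20.10;NA;
03.01.1990;20.50;19.80;"

demo <- read.csv(text = csv_text, header = TRUE, sep = ";")
names(demo)   # "Date" "Crude" "Brent" "X": the empty last field becomes X

demo$X <- NULL   # drop the empty column
# Assumption: dates are written day.month.year
demo$Date <- as.Date(demo$Date, format = "%d.%m.%Y")
str(demo)
```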

Always Check Your Data

  dim(oil)
## [1] 7142    4
  summary(oil[,1:3])
##          Date          Crude            Brent       
##  01.02.1988:   1   Min.   : 10.25   Min.   :  9.10  
##  01.02.1989:   1   1st Qu.: 18.92   1st Qu.: 18.08  
##  01.02.1990:   1   Median : 25.32   Median : 25.23  
##  01.02.1991:   1   Mean   : 40.69   Mean   : 42.28  
##  01.02.1993:   1   3rd Qu.: 62.05   3rd Qu.: 63.12  
##  01.02.1994:   1   Max.   :145.31   Max.   :143.95  
##  (Other)   :7136   NA's   :98       NA's   :410
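`summary()` reports the `NA's` in each column. A minimal sketch of counting missing values directly and keeping only complete rows; the small data frame `df` is hypothetical:

```r
df <- data.frame(Crude = c(20.1, NA, 20.5),
                 Brent = c(NA, NA, 19.8))   # hypothetical data

colSums(is.na(df))         # NAs per column: Crude 1, Brent 2
complete.cases(df)         # rows with no NA: FALSE FALSE TRUE
df[complete.cases(df), ]   # keep only the complete rows
```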

Checking your data II

  names(oil)
## [1] "Date"  "Crude" "Brent" "X"
  names(attributes(oil))
## [1] "names"     "class"     "row.names"
  class(oil)
## [1] "data.frame"
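`class(oil)` describes the whole object; to see the type of each column, apply `class()` column by column. A sketch with a hypothetical two-column data frame (`stringsAsFactors = FALSE` is set so the result is the same in older and newer R versions):

```r
df <- data.frame(Date  = c("02.01.1990", "03.01.1990"),
                 Crude = c(20.1, 20.5),
                 stringsAsFactors = FALSE)   # hypothetical data

sapply(df, class)   # Date: "character", Crude: "numeric"
head(df, n = 1)     # first row only
```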

Another Example: Getting Data from the Internet

require(foreign)
## Loading required package: foreign
# SPSS files
dat.spss <- read.spss("http://www.ats.ucla.edu/stat/data/hsb2.sav", to.data.frame=TRUE)
# Stata files
dat.dta <- read.dta("http://www.ats.ucla.edu/stat/data/hsb2.dta")
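Remote URLs can move or disappear, so it can help to try `foreign` on a local file first. A sketch of a round trip through a temporary Stata file using `write.dta()` and `read.dta()` from `foreign`; the data frame is made up:

```r
library(foreign)

df <- data.frame(id = 1:3, score = c(52, 59, 33))  # hypothetical data
f <- tempfile(fileext = ".dta")

write.dta(df, f)      # write a Stata file
back <- read.dta(f)   # read it back as a data frame
str(back)
```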

require(readxl)
## Loading required package: readxl
## Warning: package 'readxl' was built under R version 3.1.3
# These two steps are only needed for Excel files on the internet;
# a local file can be passed to read_excel() directly
f <- tempfile("hsb2", fileext = ".xls")

# Step 1: download the file to the temporary path f
# (mode = "wb" writes in binary mode, which .xls files need on Windows)
download.file("http://www.ats.ucla.edu/stat/data/hsb2.xls", f, mode = "wb")

# Step 2: read the .xls file from that temporary path
dat.xls <- read_excel(f, sheet = 1)
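The download-then-read steps can be wrapped in a small function. `read_remote_excel()` is a hypothetical helper, not part of `readxl`; it assumes the URL ends in the file's extension (such as `.xls` or `.xlsx`) and deletes the temporary file once the sheet has been read:

```r
# Hypothetical helper: download an Excel file to a temporary path, then read it
read_remote_excel <- function(url, sheet = 1) {
  # keep the extension so read_excel() can tell .xls from .xlsx
  f <- tempfile(fileext = paste0(".", tools::file_ext(url)))
  on.exit(unlink(f))                   # remove the temporary file when done
  download.file(url, f, mode = "wb")   # binary mode matters on Windows
  readxl::read_excel(f, sheet = sheet)
}

# Usage (needs internet access and the readxl package):
# dat.xls <- read_remote_excel("http://www.ats.ucla.edu/stat/data/hsb2.xls")
```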

Other References

Session Info

It is good practice to include your session info so that others can reproduce your results; for example, this document was produced with knitr version 1.8.2. Here is my session info:

print(sessionInfo(), locale=FALSE)
## R version 3.1.1 (2014-07-10)
## Platform: i386-w64-mingw32/i386 (32-bit)
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] readxl_0.1.0   foreign_0.8-61
## 
## loaded via a namespace (and not attached):
##  [1] digest_0.6.4    evaluate_0.5.5  formatR_1.0     htmltools_0.2.6
##  [5] knitr_1.8.2     Rcpp_0.11.5     rmarkdown_0.6.1 stringr_0.6.2  
##  [9] tools_3.1.1     yaml_2.1.13
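Session info printed to the console is easy to lose; one option is to save it next to your results. A minimal sketch; the output file is a temporary path chosen for illustration:

```r
info <- sessionInfo()
info$R.version$version.string   # e.g. "R version 3.1.1 (2014-07-10)"

# Save the same text that print() shows, without locale details
out <- tempfile("session_info", fileext = ".txt")
writeLines(capture.output(print(info, locale = FALSE)), out)
```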

Introduction to R

See A Brief Introduction to R.

Attribution and License

Public Domain Mark
This work (R Tutorial for Education, by Jared E. Knowles), in service of the Wisconsin Department of Public Instruction, is free of known copyright restrictions. Some pages were deleted or inserted by I. Ozkan to make it more suitable for economics students.