Exploring the choppy water of coding: Language: R (Glossary, packages and basic code snippets)

R (created by Ross Ihaka and Robert Gentleman) is a great language for statistical analysis and graphical display. It is inspired by the S language.
It is ideal for interacting with, interpreting, and analysing data (e.g. ANOVA, regression), as well as for data manipulation, modelling, and chart making.
Its strength is the huge library of packages developed by users.
The Bioconductor project has generated many packages for R.
Related statistical tools include SPSS, Stata, and SAS.
Environments: RStudio (Windows, macOS, Linux), RGui (Windows)
Using RGui for Windows

Create the script with an .R or .r extension (e.g. script.R, script.r)
Load the script. When it opens in the editor, select the content, right-click and choose 'Run line or selection'. Results appear in the console, and graphs open in a graphics window.

Save the graph
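Saving a graph can also be scripted instead of done through the menu. The lines below are a minimal sketch, assuming a PDF file is wanted; the file name my_plot.pdf and the plotted numbers are made up for illustration. The png() device can be used the same way where it is available.

```r
# Open a PDF graphics device; everything plotted next goes into this file
out_file <- file.path(tempdir(), "my_plot.pdf")  # hypothetical file name
pdf(out_file, width = 6, height = 4)

# Draw any plot
plot(1:10, (1:10)^2, type = "b", xlab = "x", ylab = "x squared")

# Close the device so the file is written to disk
dev.off()

# Confirm that the file now exists
file.exists(out_file)
```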
-----------------------------------------------------------
R is a vector-based language.
Vectors store measurements and can be numeric, character, or logical.
-----------------------------------------------------------
#R is rich in packages.

R packages

base, boot, class, cluster, codetools, compiler, datasets, foreign, graphics, grDevices, grid, KernSmooth, lattice, MASS, Matrix, methods, mgcv, nlme, nnet, parallel, rpart, spatial, splines, stats, stats4, survival, tcltk, tools
If a package is not part of the default R installation, it needs to be installed from a CRAN mirror.
e.g. To install and load the package ggplot2, use the following lines. If no download mirror has been set, one has to be chosen.

install.packages("ggplot2")

library("ggplot2")

#Other important packages are data.table and plyr (for splitting and processing big data sets)

CRAN (Comprehensive R Archive Network) is the R package repository. It hosts more than 7,000 packages.

Many packages are already installed with RGui and only need to be loaded with library().
Others can be installed directly from a mirror with install.packages(), without visiting the CRAN website.

Alternatively, a package can be downloaded from the CRAN website and installed by hand.

Suppose the stringr package is to be installed:

Download its zip file from CRAN, then install it from the local zip file using the Packages menu in RGui.
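The install-then-load routine can be wrapped so that a package is only downloaded when it is missing. This is a rough sketch, not an official recipe: install_if_missing is a hypothetical helper name, the mirror URL is an assumed default, and the zip path in the last comment is a placeholder.

```r
# Hypothetical helper: install a package from a CRAN mirror only if it is missing
install_if_missing <- function(pkg, repos = "https://cloud.r-project.org") {
  if (requireNamespace(pkg, quietly = TRUE)) {
    return(invisible(TRUE))   # already installed, nothing to do
  }
  install.packages(pkg, repos = repos)
  invisible(requireNamespace(pkg, quietly = TRUE))
}

install_if_missing("stats")      # ships with R, so no download happens
# install_if_missing("stringr")  # would download from the mirror if absent

# Installing from a downloaded file instead (the path is a placeholder):
# install.packages("C:/path/to/stringr.zip", repos = NULL)
```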
#Important R functions
table(), head(), rownames(), colnames(), nrow(), ncol(), by(), with()
rowSums(), rowMeans(), summary()
#Use of table()
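As an illustration of table(), here is a small sketch that counts categories in the built-in mtcars data set. The variable names cyl_counts and cyl_by_gear are chosen only for this example.

```r
# Count how many cars in mtcars have 4, 6 or 8 cylinders
cyl_counts <- table(mtcars$cyl)
cyl_counts

# Cross-tabulate two columns: cylinders (rows) against gears (columns)
cyl_by_gear <- table(mtcars$cyl, mtcars$gear)
cyl_by_gear
```

With one argument table() returns counts per distinct value, and with two arguments it returns a contingency table.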
-----------------------------------------------------------
Useful links
http://rseek.org/
http://www.r-tutor.com/
-----------------------------------------------------------
#Calculations
1 + 2

#Assigning value 1 to x
x = 1
x
class(x)

x = 10.5
x
class(x)

#The function c() combines 3 numbers into a vector
c(1, 2, 3)

#Numeric, integer, complex, and logical types
x = 5
x
class(x)
is.integer(x)

y = as.integer(5)
y
class(y)
is.integer(y)

as.integer(5.7)
as.integer("6.8")
as.integer("Autumn")
as.integer(TRUE)
as.integer(FALSE)
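What the as.integer() calls above return may not be obvious at first glance, so here is a short annotated sketch. The variable name season is made up, and suppressWarnings() is used only to keep the output quiet.

```r
# Decimals are truncated toward zero, not rounded
as.integer(5.7)     # 5
as.integer(-5.7)    # -5

# Numeric-looking strings are parsed first, then truncated
as.integer("6.8")   # 6

# Text that is not a number becomes NA (R normally prints a warning too)
season <- suppressWarnings(as.integer("Autumn"))
is.na(season)       # TRUE

# Logical values map to 1 and 0
as.integer(c(TRUE, FALSE))   # 1 0

# round() is the function to use when rounding is wanted
round(5.7)          # 6
```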

z = 5 + 3i
z
class(z)

sqrt(-3)   #NaN with a warning, because -3 is treated as a real number
sqrt(-4+0i)
sqrt(as.complex(-7))

x = 8; y = 5
z = x > y
z
class(z)

#Logical operations are "&" (and), "|" (or), and "!" (negation)

a = TRUE; b = FALSE

a & b

a | b

#Numeric to string conversion

x = as.character(4.7)

class(x)

#Concatenation

fname = "Carl"; lname ="Jones"

paste(fname, lname)

#String manipulation (printing, sub-string extraction, substitution)

sprintf("%s is %d years old", "Mila", 15)

substr("Autumn is my favorite season", start=1, stop=6)

sub("Banana", "Apple", "Banana is my favorite fruit")
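One detail worth knowing about substitution: sub() changes only the first match, while gsub() changes every match. A tiny sketch with a made-up word:

```r
word <- "banana"

# sub() replaces the first "a" only
first_only <- sub("a", "A", word)
first_only    # "bAnana"

# gsub() replaces every "a"
every_match <- gsub("a", "A", word)
every_match   # "bAnAnA"
```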

-----------------------------------------------------------

Vector (1D)

c(3,7,8)

c(TRUE, TRUE, FALSE)

c("rt", "vg", "em")

length(c(3,7,8))

a = c(1,8,5)

b = c("spring", "autumn", "winter")

c(a,b)

#Vector arithmetic

a = c(2,4)

b = c(3,7)

3 * a

a + b

a - b

a * b

a / b

#Recycling of the shorter vector (R also gives a warning here, because length 4 is not a multiple of length 3)

a = c(2,3,8)

b = c(4,5,6,9)

a + b
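Recycling is easier to follow when the longer length is an exact multiple of the shorter one, in which case R gives no warning. A small sketch with made-up vectors named short and long:

```r
short <- c(1, 2)
long  <- c(10, 20, 30, 40)

# short is reused as 1, 2, 1, 2 to match the length of long
short + long    # 11 22 31 42

# A single number is a length-one vector, so it is recycled too
long * 2        # 20 40 60 80
```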

#Vector index

a = c("grapes", "apple", "cherry")

a[2]

a[c(1, 2)]

a[c(1, 1)]

a[c(3,1,2)]

a[-2]

a[5]   #index beyond the length of the vector, returns NA

a[1:3]

a[1:2]

L = c(FALSE, TRUE, FALSE)

a[L]

L = c(TRUE, TRUE, FALSE)

a[L]

a = c("Alice", "Mulan")

names(a) = c("First", "Last")

a["First"]

a[c("Last", "First")]

-----------------------------------------------------------
Matrix (2D)

a = matrix(

c(6,8,5,4),

nrow=2,

ncol=2,

byrow = TRUE)

#a = matrix(c(6,8,5,4), nrow=2, ncol=2, byrow = TRUE)

#prints element at row 2 and column 1

a[2, 1]

#prints row 2

a[2, ]

#prints column 2

a[ ,2]

#prints column 1 and 2

a[ ,c(1,2)]

a = matrix(

c(6,8,5,4),

nrow=2,

ncol=2,

byrow = TRUE)

dimnames(a) = list(

c("row1", "row2"),

c("col1", "col2"))
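Once a matrix has row and column names, cells can be looked up by name as well as by position. A minimal sketch, using a separate matrix called m (a name chosen here) with the same values as above:

```r
m <- matrix(c(6, 8, 5, 4), nrow = 2, ncol = 2, byrow = TRUE)
dimnames(m) <- list(c("row1", "row2"), c("col1", "col2"))

m["row2", "col1"]   # 5, the same cell as m[2, 1]
m["row1", ]         # the whole first row, as a named vector
rownames(m)
colnames(m)
```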

#matrix construction

a = matrix(

c(3,7,5,8,9,2),

nrow=3,

ncol=2)

#Transposition turns rows into columns and vice versa

t(a)

#Another matrix

b = matrix(

c(2,7,3,9,5,8),

nrow=3,

ncol=2)

#Another matrix

d = matrix(

c(8,9,7),

nrow=3,

ncol=1)

#Combining matrices a and b by columns. For cbind, both matrices must have the same number of rows.

cbind(a,b)

#Combining matrices a and d by columns. Both matrices must have the same number of rows.

cbind(a,d)

#Flattening matrix a back into a vector (column by column)

c(a)

-----------------------------------------------------------
List (slicing, member reference)

#A list is a generic vector whose components can be of different types and lengths

a = c(4,7)

b = c("sweet", "sour", "bitter")

d = c(TRUE, FALSE,FALSE)

x = list(a,b,d)

z = list(a,b,d,9)

x[1]

x[2]

x[3]

x[c(2, 3)]

z[c(1, 4)]

x[[1]]

x[[1]][1] = 6

x[[1]]

a = list(spring=c("bird", "flower", "breeze"), autumn=c("fruits", "foliage"))

a["spring"]

a[c("spring", "autumn")]

a[["spring"]]

a$spring

attach(a)

spring

detach(a)

-----------------------------------------------------------
Data frames

#A data frame is used to store data tables.

#It is a list of vectors of equal length.

#There are many built-in data frames in R.
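The "list of equal-length vectors" description can be checked directly. A minimal sketch with a made-up data frame called scores:

```r
scores <- data.frame(id    = c(1, 2, 3),
                     taste = c("sweet", "sour", "bitter"),
                     liked = c(TRUE, FALSE, FALSE))

is.list(scores)          # TRUE: a data frame is a list underneath
sapply(scores, class)    # one class per column
sapply(scores, length)   # every column has the same length, 3
str(scores)              # compact summary of the structure

# The built-in data sets can be listed with data()
```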

a = c(1, 3, 5)
b = c("ef", "gb", "hj")
d = c("ed", "rt", "yu")   #strings must be quoted; named d so the function c() is not masked
df = data.frame(a, b, d)

a = c(4,7,9)

b = c("sweet", "sour", "bitter")

d = c(TRUE, FALSE,FALSE)

df = data.frame(a,b,d)

#Element at row1 and column2

df[1, 2]

nrow(df)

ncol(df)

head(df)

#column slicing

df[[2]]

df[["b"]]

df$b

df[,"b"]

df[1]

df["a"]

df[c("a", "b")]

#row slicing

#First row

df[1,]

#First and second row

df[c(1, 2),]

#Example of an in-built data frame

#The header row contains the column names. Each following row is one data record, labelled by its row name.
mtcars
#Find the value at the first row, second column
mtcars[1, 2]
#The same cell, referenced by row name and column name
mtcars["Mazda RX4", "cyl"]
head(mtcars)
nrow(mtcars)
ncol(mtcars)
help(mtcars)

#Importing a data frame

getwd()

setwd( "C:/Users/Seema/Desktop")

library(XLConnect)

wk = loadWorkbook("translation.xls")

df = readWorksheet(wk, sheet="Sheet1")

-----------------------------------------------------------
R can be used for NGS, microarray, and ChIP-seq data analysis
-----------------------------------------------------------
#To clear the console, press Ctrl+L
# print the current working directory
getwd()
# list files and folders in the current directory
dir()
# list the objects in the current workspace
ls()
# change to specified directory, suppose to Desktop
setwd("C:/Users/Seema/Desktop")
# list all R libraries
library()
#To find help page for a function
?data.frame
#To search the help pages for a particular word
??list
#To quit console
q()
----------------------------------------------------------------------------
#Assignment with the symbol <-
x <- 5
y <- 2
#Remainder of x divided by y
x %% y
#Test for equality
x == 4

#Operators
x<-c(1:5)
x
x>4
logical1<- x>4
logical2<- x<3
logical1
logical2
logical1 | logical2
logical1 & logical2
#&& works on single values only (R 4.3.0 and later give an error for longer vectors)
logical1[5] && logical2[1]
x[logical1]
x[logical2]
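The difference between the element-wise operators and the single-value operators is a common stumbling block, so here is a short sketch. The names v, big and small are chosen only for this example.

```r
v <- 1:5
big   <- v > 4    # FALSE FALSE FALSE FALSE  TRUE
small <- v < 3    #  TRUE  TRUE FALSE FALSE FALSE

# Element-wise operators return one result per element
big | small       # TRUE TRUE FALSE FALSE TRUE
big & small       # all FALSE

# any() and all() collapse a logical vector to one value,
# which is the form that if() and && expect
any(big) && any(small)   # TRUE
all(big | small)         # FALSE

# which() gives the positions of the TRUE elements
which(big | small)       # 1 2 5
```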
----------------------------------------------------------------------------
The link below is a great online tool for R practice
http://www.tutorialspoint.com/r_terminal_online.php
Execute the script with the command
source('script.R')

----------------------------------------------------------------------------

Being familiar with the naming conventions makes it easier to become well-versed in the language and to customize output, so here is some common R jargon.

---------------------------------------------------------------------

help (Distributions), help (Normal), help (TDist), help(Chisquare), help(Binomial)

---------------------------------------------------------------------

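seq(from, to, by=): Generates a regular sequence of numbers

NA: Not available (a missing value)

NaN: Not a number

dt: Density function of the Student t distribution

pt: Cumulative distribution function of the t distribution

qt: Quantile function (inverse cumulative distribution function) of the t distribution

rt: Random number generation from the t distribution

dnorm: Gives the density (height of the probability curve) of the normal distribution

pnorm: Gives the cumulative distribution function of the normal distribution

qnorm: Gives the quantile function of the normal distribution

rnorm: Generates random deviates from the normal distribution

col: plotting color (for axis, labels, titles, subtitles, foreground, background)

par: queries or sets graphical parameters

mar: margin sizes (set through par)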
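The d, p, q and r prefixes follow the same pattern for every distribution. The sketch below walks through them for the normal distribution; the seed value and the variable names p and draws are arbitrary choices for illustration.

```r
# d: density (height of the curve) at a point
dnorm(0)                 # 1 / sqrt(2 * pi), about 0.399

# p: cumulative probability P(X <= q)
p <- pnorm(1.96)         # about 0.975

# q: quantile, the inverse of p
qnorm(p)                 # back to 1.96

# r: random draws; set.seed() makes them reproducible
set.seed(1)
draws <- rnorm(5)
draws

# seq(from, to, by) builds a regular sequence
seq(0, 1, by = 0.25)     # 0.00 0.25 0.50 0.75 1.00
```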

----------------------------------------------------------------------------
Some examples that return an infinite value or NaN
0 / 0
1/0
sin(Inf)
cos(Inf)
tan(Inf)

> 0/0
[1] NaN
> 1/0
[1] Inf
> sin(Inf)
[1] NaN
> cos(Inf)
[1] NaN
> tan(Inf)
[1] NaN
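NA, NaN and Inf can be told apart with the is.na(), is.nan() and is.finite() tests. A small sketch with a made-up vector called vals:

```r
vals <- c(1, NA, 0 / 0, 1 / 0)

is.na(vals)       # FALSE  TRUE  TRUE FALSE  (NaN also counts as missing)
is.nan(vals)      # FALSE FALSE  TRUE FALSE
is.finite(vals)   #  TRUE FALSE FALSE FALSE

# Summaries return NA when a value is missing, unless NAs are dropped
mean(c(1, NA, 3))                 # NA
mean(c(1, NA, 3), na.rm = TRUE)   # 2
```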

##########################################

To plot the t distribution density (using dt)

x <- seq(-10,20,by=.5)

y <- dt(x,df=5)

plot(x,y)

y <- dt(x,df=20)

plot(x,y)

To compute cumulative probabilities of the t distribution (using pt)

pt(3,df=10)

1-pt(3,df=5)

x = c(-3,-6,-2,-1)

pt((mean(x)-2)/sd(x),df=10)
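The 1 - pt(...) form above gives the upper tail of the distribution. The sketch below shows an equivalent spelling with the lower.tail argument, plus a two-sided tail probability; t_value and dfree are names chosen for this example.

```r
t_value <- 3
dfree   <- 5     # degrees of freedom (named dfree to avoid masking the function df)

# Upper tail, written two equivalent ways
upper1 <- 1 - pt(t_value, df = dfree)
upper2 <- pt(t_value, df = dfree, lower.tail = FALSE)
all.equal(upper1, upper2)   # TRUE

# Two-sided tail probability, using the symmetry of the t distribution
two_sided <- 2 * pt(-abs(t_value), df = dfree)
two_sided
```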

To compute quantiles of the t distribution (using qt)

qt(0.05,df=10)

v <- c(0.005,.025,.05, 0.5)

qt(v,df=27)

To generate random values from the t distribution (using rt)

rt(3,df=5)

##########################################

To plot the normal density (using dnorm)

dnorm(0)

dnorm(0)*sqrt(2*pi)

x <- seq(-10,10,by=0.5)

y <- dnorm(x)

plot(x,y)

y <- dnorm(x,mean=1,sd=0.5)

plot(x,y)

##########################################

To plot the normal cumulative distribution (using pnorm)

x <- seq(-5, 2, 1)
y1 <- pnorm(x)
y2 <- pnorm(x,1,4)
plot(x,y1,type="l",col="green")
plot(x,y2,type="l",col="blue")

##########################################

To plot both y1 and y2 in one graph

x <- seq(-1, 1, 0.5)

y1 <- pnorm(x)

y2 <- pnorm(x, 1, 2)

matplot(x, cbind(y1,y2),type="l",col=c("blue","red"),lty=c(1,1))
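An alternative to matplot is to draw the first curve with plot() and add the second with lines(). This is only a sketch: the finer x grid, the axis label and the legend text are choices made here, not part of the example above.

```r
x  <- seq(-5, 5, by = 0.1)
y1 <- pnorm(x)                       # standard normal
y2 <- pnorm(x, mean = 1, sd = 2)     # shifted and wider

# First curve sets up the axes, second curve is drawn on top
plot(x, y1, type = "l", col = "blue", ylab = "cumulative probability")
lines(x, y2, col = "red")
legend("topleft", legend = c("mean 0, sd 1", "mean 1, sd 2"),
       col = c("blue", "red"), lty = 1)
```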

##########################################
Chi-square calculation (dchisq, pchisq, qchisq, rchisq)

x <- seq(-10,20,by=.5)
y <- dchisq(x,df=5)
plot(x,y)
y <- dchisq(x,df=10)
plot(x,y)
----------------------------
pchisq(2,df=10)

x = c(2,4,5,6)
pchisq(x,df=20)
----------------------------
qchisq(0.05,df=5)

y <- c(0.005,.025,.05)
qchisq(y,df=20)
----------------------------
rchisq(3,df=10)
##########################################
Binomial calculation (dbinom, pbinom, qbinom, rbinom)
x <- seq(0,20,by=1)
y <- dbinom(x,20,0.2)
plot(x,y)

pbinom(10,20,0.5)

qbinom(0.5,25,0.5)

rbinom(5,20,0.5)
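How dbinom and pbinom relate can be verified numerically: the cumulative probability is the sum of the individual probabilities. A brief sketch using the same 20 trials and success probability 0.5 as above (the names n, p, cum and summed are chosen here):

```r
n <- 20
p <- 0.5

# P(X <= 10) from the cumulative function
cum <- pbinom(10, n, p)

# The same value as the sum of P(X = 0), ..., P(X = 10)
summed <- sum(dbinom(0:10, n, p))
all.equal(cum, summed)     # TRUE

# All probabilities over 0..n add up to 1
sum(dbinom(0:n, n, p))
```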
##########################################
#Use of ggplot2 (a graphics package built on top of R's grid graphics system)
library(ggplot2)

ggplot(ToothGrowth, aes(x=as.factor(dose), y=len, color=supp)) +
geom_boxplot(position=position_dodge(0.5))+
geom_jitter(position=position_dodge(0.4)) +
xlab("dose")
-------------------------------------------------------------------------------
#Loading Excel file data into a data frame
#Option 1: the gdata package (Perl must be installed for read.xls to work)

install.packages("gdata")

library(gdata)
help(read.xls)
data = read.xls("data.xls")
#Option 2: the XLConnect package (Java must be installed for it to work)

install.packages("XLConnect")

library(XLConnect)
wk = loadWorkbook("data.xls")

df = readWorksheet(wk, sheet="Sheet1")

#Loading text file data
mydata = read.table("mydata.txt")
mydata
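When the text file has a header line and tab-separated columns, read.table needs to be told so. The round trip below is a self-contained sketch that writes a made-up table called fruit to a temporary file and reads it back.

```r
# Build a small table and write it to a temporary tab-separated file
tmp <- tempfile(fileext = ".txt")
fruit <- data.frame(name = c("apple", "cherry"), count = c(3, 12))
write.table(fruit, tmp, sep = "\t", row.names = FALSE, quote = FALSE)

# header = TRUE takes the column names from the first line of the file
loaded <- read.table(tmp, header = TRUE, sep = "\t")
loaded

# read.csv() is the same reader with defaults set for comma-separated files
```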
**For the loading code above to work, the data file must be in the current working directory (or be given with its full path). The commands below check and change the working directory.
getwd()
setwd( "C:/Users/Seema/Desktop")
-------------------------------------------------------------------------------
Limma: Linear Models for Microarray Data


Sunday, December 20, 2015
