Sunday, December 20, 2015

Language: R (Glossary, packages and basic code snippets)

R (created by Ross Ihaka and Robert Gentleman) is a great language for statistical analysis and graphical display. It is inspired by the S language.
It is ideal for interacting with, interpreting and analysing data (e.g. ANOVA, regression), data manipulation, modelling and chart making.
Its strength is the huge library of packages developed by users.
The Bioconductor project has generated many packages for R.
Other related statistical software packages are SPSS, Stata and SAS.
Environments: RStudio (Windows, macOS, Linux), RGui (Windows)
Using RGui for Windows
Save the script with an .R or .r extension (e.g. script.R, script.r)
Load the script: when it opens in the editor, select the content, right-click and choose 'Run line or selection'. Text output appears in the console and graphs open in a separate graphics window.
Save the graph
-----------------------------------------------------------
R is a vector-based language.
Vectors are used to store measurements, which can be numeric, character or logical.
-----------------------------------------------------------
#R is rich in packages.
R packages
base, boot, class, cluster, codetools, compiler, datasets, foreign, graphics, grDevices, grid, KernSmooth, lattice, MASS, Matrix, methods, mgcv, nlme, nnet, parallel, rpart, spatial, splines, stats, stats4, survival, tcltk, tools
The packages above ship with a standard R installation. Any other package must be installed first, for example from a CRAN mirror.
e.g. To install and load the package ggplot2, use the following lines. If no download mirror has been set, R asks you to choose one.
install.packages("ggplot2")
library("ggplot2")
#Other important packages are data.table and plyr (to split big data)
CRAN (Comprehensive R Archive Network) is the R package repository. It has more than 7,000 packages.
Many packages are already installed with RGui and only need to be loaded with library().
Others can be installed straight from a mirror with install.packages(), without visiting the CRAN website.
Alternatively, a package can be downloaded from the CRAN website and installed by hand.
Suppose the stringr package is to be installed:
Download its zip file, then install it from the local zip file using the Packages menu.
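A minimal sketch of checking whether a package is already installed before installing it. The helper name is_installed is made up for illustration, and the zip file path in the last comment is a placeholder, not a real location. The install lines are commented out so that nothing is downloaded when the snippet is run.

```r
# Hypothetical helper: TRUE if the package is found in an installed library
is_installed <- function(pkg) {
  requireNamespace(pkg, quietly = TRUE)
}

is_installed("stats")     # stats ships with R, so this should be TRUE

# Install from a CRAN mirror only when the package is missing (not run here)
# if (!is_installed("stringr")) install.packages("stringr")

# Install from a zip file downloaded by hand (placeholder path, not run here)
# install.packages("C:/path/to/stringr.zip", repos = NULL)
```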
#Important R functions 
table(), head(), rownames(), colnames(), nrow(), ncol(), by(), with()
rowSums(), rowMeans(), summary()
#Use of table ()
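As a rough illustration of table() and a few of the other functions listed above, here is a short sketch. The fruit vector and the matrix m are made-up data; mtcars is a data frame that ships with R.

```r
# Small made-up vector of categories
fruit <- c("apple", "cherry", "apple", "grapes", "apple", "cherry")

# table() counts how often each distinct value occurs
counts <- table(fruit)
counts

# The counts can be indexed by name
counts["apple"]

# Two-way table from two columns of the built-in mtcars data frame
table(mtcars$cyl, mtcars$gear)

# with() evaluates an expression inside a data frame
with(mtcars, mean(mpg))

# rowSums() and rowMeans() work row by row on a numeric matrix
m <- matrix(1:6, nrow = 2)
rowSums(m)     # 9 12
rowMeans(m)    # 3 4
```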
-----------------------------------------------------------
Useful links
http://rseek.org/
http://www.r-tutor.com/
-----------------------------------------------------------
#Calculations
1 + 2

#Assigning value 1 to x
x = 1
x
class(x)

x = 10.5    
x            
class(x)    

#Function c() combining 3 numbers into a vector
c(1, 2, 3)

#Numeric, integer, complex and logical types
x = 5
x          
class(x)
is.integer(x)

y = as.integer(5)
y          
class(y)    
is.integer(y)

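#Conversion to integer drops the decimal part
as.integer(5.7)
as.integer("6.8")
#A string that is not a number gives NA with a warning
as.integer("Autumn")
#TRUE becomes 1 and FALSE becomes 0
as.integer(TRUE)
as.integer(FALSE)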

z = 5 + 3i
z
class(z)

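#Square root of a negative real number gives NaN with a warning
sqrt(-3)
#Square root of a negative complex number works
sqrt(-4+0i)
sqrt(as.complex(-7))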

x = 8; y = 5
z = x > y    
z            
class(z)


#Logical operations are "&" (and), "|" (or), and "!" (negation)

a = TRUE; b = FALSE 

a & b          
a | b          
!a 
!b  

#Numeric to string conversion
x = as.character(4.7) 
x              
class(x) 

#Concatenation
fname = "Carl"; lname ="Jones" 
paste(fname, lname) 

#String manipulation (printing, sub-string extraction, substitution)
sprintf("%s is %d years old", "Mila", 15) 
substr("Autumn is my favorite season", start=1, stop=6) 
sub("Banana", "Apple", "Banana is my favorite fruit") 
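A few more string helpers from base R, as a small sketch alongside the snippets above. The example strings are arbitrary.

```r
fname <- "Carl"; lname <- "Jones"

paste(fname, lname)              # "Carl Jones" (space by default)
paste(fname, lname, sep = "_")   # "Carl_Jones"
paste0(fname, lname)             # "CarlJones" (no separator)

nchar("Autumn")                  # 6
toupper("autumn")                # "AUTUMN"

# sub() replaces the first match only, gsub() replaces every match
sub("a", "A", "banana")          # "bAnana"
gsub("a", "A", "banana")         # "bAnAnA"
```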
-----------------------------------------------------------
Vector (1D)
c(3,7,8) 
c(TRUE, TRUE, FALSE) 
c("rt", "vg", "em") 
length(c(3,7,8))
a = c(1,8,5) 
b = c("spring", "autumn", "winter") 
c(a,b)
#Vector arithmetic
a = c(2,4) 
b = c(3,7)
3 * a 
a + b 
a - b 
a * b 
a / b 
#Recycling of the shorter vector (lengths 3 and 4 do not divide evenly, so R also gives a warning)
a = c(2,3,8) 
b = c(4,5,6,9) 
a + b
#Vector index
a = c("grapes", "apple", "cherry") 
a[2] 
a[c(1, 2)] 
a[c(1, 1)]
a[c(3,1,2)]
a[-2] 
a[5]  
a[1:3]
a[1:2]
L = c(FALSE, TRUE, FALSE) 
a[L] 
L = c(TRUE, TRUE, FALSE) 
a[L] 
a = c("Alice", "Mulan") 
names(a) = c("First", "Last") 
a
a["First"] 
a[c("Last", "First")] 
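The vector snippets above touch on three behaviours that are easy to miss: type coercion when mixing values, recycling of the shorter vector, and NA for an out-of-range index. A minimal sketch with made-up values (the variable names are only for illustration):

```r
# Mixing numbers and strings: everything becomes character
mixed <- c(1, 8, 5, "spring")
class(mixed)          # "character"

# Recycling: lengths 2 and 4, so the shorter vector is reused exactly twice
short <- c(10, 20)
long  <- c(1, 2, 3, 4)
short + long          # 11 22 13 24

# Lengths 3 and 4 do not divide evenly: R still recycles but warns
res <- suppressWarnings(c(2, 3, 8) + c(4, 5, 6, 9))
res                   # 6 8 14 11

# An out-of-range index gives NA rather than an error
fruits <- c("grapes", "apple", "cherry")
fruits[5]
```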
-----------------------------------------------------------
Matrix (2D)
a = matrix( 
c(6,8,5,4), 
nrow=2,               
ncol=2,              
byrow = TRUE) 

#a = matrix(c(6,8,5,4), nrow=2, ncol=2, byrow = TRUE)  
#prints element at row 2 and column 1          
a[2, 1]
#prints row 2
a[2, ]
#prints column 2
a[ ,2]
#prints column 1 and 2
a[ ,c(1,2)]  

a = matrix( 
c(6,8,5,4), 
nrow=2,               
ncol=2,              
byrow = TRUE) 
#Naming the rows and columns
dimnames(a) = list(
c("row1", "row2"),
c("col1", "col2")) 
a

#matrix construction
a = matrix( 
c(3,7,5,8,9,2), 
nrow=3, 
ncol=2) 
a

#Transposition turns rows into columns and vice versa
t(a) 

#Another matrix
b = matrix( 
c(2,7,3,9,5,8), 
nrow=3, 
ncol=2) 
b

#Another matrix
d = matrix( 
c(8,9,7), 
nrow=3, 
ncol=1) 
d

#Combining matrix a and b by columns. For this, both matrices must have the same number of rows.
cbind(a,b) 

#Combining matrix a and d. For it, row number of both matrix must be same.
cbind(a,d)
#Deconstruction of matrix a
c(a) 
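A short sketch of the dimension rules for combining matrices. cbind() appears above; rbind() is its row-wise counterpart in base R. The matrix e is made up for this example.

```r
a <- matrix(c(3, 7, 5, 8, 9, 2), nrow = 3, ncol = 2)   # 3 x 2
d <- matrix(c(8, 9, 7), nrow = 3, ncol = 1)            # 3 x 1

# cbind() places matrices side by side, so the row counts must match
ad <- cbind(a, d)
dim(ad)               # 3 3

# rbind() stacks matrices, so the column counts must match
e <- matrix(c(1, 2), nrow = 1, ncol = 2)               # 1 x 2
ae <- rbind(a, e)
dim(ae)               # 4 2

# c() flattens a matrix column by column
c(a)                  # 3 7 5 8 9 2
```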
-----------------------------------------------------------
List (slicing, member reference)
#A list is a generic vector whose components can have different types and lengths
a = c(4,7) 
b = c("sweet", "sour", "bitter") 
d = c(TRUE, FALSE,FALSE) 
x = list(a,b,d)
z = list(a,b,d,9)   
x[1]
x[2] 
x[3]
x[c(2, 3)] 
z[c(1, 4)]
x[[1]]  
x[[1]][1] = 6 
x[[1]]
a

a = list(spring=c("bird", "flower", "breeze"), autumn=c("fruits", "foliage")) 
a
a["spring"] 
a[c("spring", "autumn")] 
a[["spring"]] 
a$spring 

attach(a) 
spring
detach(a)
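The difference between single and double brackets in the list snippets above can be seen by checking the class of each result. A small sketch (the names slice and member are only for illustration):

```r
x <- list(c(4, 7), c("sweet", "sour", "bitter"))

slice  <- x[1]      # still a list, of length 1
member <- x[[1]]    # the numeric vector itself

class(slice)        # "list"
class(member)       # "numeric"

# Arithmetic works on the member, not on the slice
member * 2          # 8 14
```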
-----------------------------------------------------------
Data frames
#Data frame is used to store data tables.
#It is a list of vectors of equal length
#There are many in-built data frames in R.
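a = c(1, 3, 5)
b = c("ef", "gb", "hj")
#Strings must be quoted; the third vector is called d so that it does not hide the function c()
d = c("ed", "rt", "yu")
df = data.frame(a, b, d)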

a = c(4,7,9) 
b = c("sweet", "sour", "bitter") 
d = c(TRUE, FALSE,FALSE) 
df = data.frame(a,b,d)
df
#Element at row1 and column2
df[1, 2] 
nrow(df)    
ncol(df)    
head(df)

#column slicing
df[[2]]
df[["b"]] 
df$b 
df[,"b"] 
df[1]
df["a"] 
df[c("a", "b")]

#row slicing
#First row
df[1,]  

#First and second row        
df[c(1, 2),]
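Rows can also be selected by a condition instead of by position. A minimal sketch using the same kind of data frame as above; subset(), str() and summary() are base R functions, and the variable name big is made up.

```r
df <- data.frame(
  a = c(4, 7, 9),
  b = c("sweet", "sour", "bitter"),
  d = c(TRUE, FALSE, FALSE),
  stringsAsFactors = FALSE
)

# Rows where column a is greater than 5
big <- df[df$a > 5, ]
big

# Same idea with subset(), keeping only two columns
subset(df, a > 5, select = c(a, b))

# str() and summary() give a quick overview of the columns
str(df)
summary(df$a)
```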

#Example of an in-built data frame
#The top line is the header with the column names. Each following line is a data row that starts with the row name.
mtcars
#Find the value at the first row, second column
mtcars[1, 2]
head(mtcars)
nrow(mtcars)
ncol(mtcars)
help(mtcars)
#Look up a value by row name and column name
mtcars["Mazda RX4", "cyl"] 

#Importing a data frame
getwd() 
setwd( "C:/Users/Seema/Desktop")     
library(XLConnect)               
wk = loadWorkbook("translation.xls") 
df = readWorksheet(wk, sheet="Sheet1")
df
-----------------------------------------------------------
R can be used for NGS, microarray and ChIP-seq data analysis
-----------------------------------------------------------
#To clear the console, press Ctrl+L
# print the current working directory 
getwd() 
# list files and folders in the current directory
dir()
# list the objects in the current workspace
ls()  
# change to specified directory, suppose to Desktop
setwd("C:/Users/Seema/Desktop")
# list all R libraries
library()  
#To find help page for a function
?data.frame
#To search the help pages for a particular word
??list
#To quit console
q()
----------------------------------------------------------------------------
#Assignment with symbol <-
x <- 5
y <- 2
#Remainder of x divided by y
x %% y
#Equality test
x == 4


#Operators
x<-c(1:5)
x
x>4
logical1<- x>4
logical2<- x<3
logical1
logical2
logical1 | logical2
logical1 & logical2
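#&& works on single values only (R 4.3 and later give an error for longer vectors), so pick one element from each
logical1[1] && logical2[1]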
x[logical1]
x[logical2]
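To make the distinction between the element-wise operators and the single-value operators concrete, here is a small sketch. The names low and high are made up; any(), all() and which() are base R functions.

```r
x <- 1:5
low  <- x < 3
high <- x > 4

low | high            # element-wise: TRUE TRUE FALSE FALSE TRUE
low & high            # element-wise: all FALSE

any(low)              # TRUE, at least one element is TRUE
all(low)              # FALSE, not every element is TRUE

# && and || expect a single TRUE or FALSE on each side
(length(x) == 5) && all(x > 0)     # TRUE

# which() returns the positions of the TRUE values
which(low | high)     # 1 2 5
```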
----------------------------------------------------------------------------
The link below is a great online tool for R practice
http://www.tutorialspoint.com/r_terminal_online.php
Execute the script with the command
source("script.R")
----------------------------------------------------------------------------
Being familiar with the naming conventions makes it easier to become well-versed in the language and to customize output, so here is some common R jargon.
---------------------------------------------------------------------
help (Distributions), help (Normal), help (TDist), help(Chisquare), help(Binomial)
---------------------------------------------------------------------
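seq(from, to, by=): generates a regular sequence
NA: Not available (missing value)
NaN: Not a number
dt: Density function of the Student t distribution
pt: Cumulative distribution function of the t distribution
qt: Quantile function (inverse cumulative distribution function) of the t distribution
rt: Random number generation from the t distribution
dnorm: Gives the density (height of the probability curve) of the normal distribution
pnorm: Gives the cumulative distribution function of the normal distribution
qnorm: Gives the quantile function of the normal distribution
rnorm: Generates random deviates from the normal distribution
col: plotting color (related settings col.axis, col.lab, col.main, col.sub, fg and bg cover axis, labels, titles, subtitles, foreground and background)
par: function to query or set the current graphical parameters
mar: margin sizes (a par setting)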
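The d/p/q/r prefixes fit together in a regular way, which the following sketch tries to show for the normal distribution. The seed value and the variable names are arbitrary choices for this example.

```r
# q* undoes p*: the quantile of a cumulative probability is the original value
p <- pnorm(1.5)
qnorm(p)                       # 1.5, up to floating-point rounding

# The standard normal is symmetric, so half of the probability lies below 0
pnorm(0)                       # 0.5

# The density at 0 is 1 / sqrt(2 * pi)
all.equal(dnorm(0), 1 / sqrt(2 * pi))

# r* draws random values; set.seed() makes the draw repeatable
set.seed(1)
first <- rnorm(3)
set.seed(1)
second <- rnorm(3)
identical(first, second)       # TRUE

# seq(from, to, by) builds a regular sequence
seq(0, 1, by = 0.25)           # 0.00 0.25 0.50 0.75 1.00
```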
----------------------------------------------------------------------------
Some examples that return an infinite value or NaN (sin, cos and tan of Inf also give a warning)
> 0/0
[1] NaN

> 1/0
[1] Inf

> sin(Inf)
[1] NaN

> cos(Inf)
[1] NaN

> tan(Inf)
[1] NaN
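Base R has test functions for these special values. A minimal sketch with a made-up vector v that mixes an ordinary number, NA, NaN and both infinities:

```r
v <- c(1, NA, 0 / 0, 1 / 0, -1 / 0)

is.na(v)        # FALSE TRUE TRUE FALSE FALSE  (NaN also counts as NA)
is.nan(v)       # FALSE FALSE TRUE FALSE FALSE
is.infinite(v)  # FALSE FALSE FALSE TRUE TRUE
is.finite(v)    # TRUE FALSE FALSE FALSE FALSE

# Missing values propagate unless they are removed explicitly
mean(c(2, 4, NA))                # NA
mean(c(2, 4, NA), na.rm = TRUE)  # 3
```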

##########################################
To plot distribution graph (using dt)
x <- seq(-10,20,by=.5)
y <- dt(x,df=5)
plot(x,y)
y <- dt(x,df=20)
plot(x,y)

To calculate cumulative probabilities of the t distribution (using pt)
pt(3,df=10)
1-pt(3,df=5)
x = c(-3,-6,-2,-1)
pt((mean(x)-2)/sd(x),df=10)

To calculate quantiles of the t distribution (using qt)
qt(0.05,df=10)

v <- c(0.005,.025,.05, 0.5)
qt(v,df=27)

To generate random numbers from the t distribution (using rt)
rt(3,df=5)
##########################################
To plot graph (using dnorm)
dnorm(0)

dnorm(0)*sqrt(2*pi)

x <- seq(-10,10,by=0.5)
y <- dnorm(x)
plot(x,y)
y <- dnorm(x,mean=1,sd=0.5)
plot(x,y)
##########################################
To plot graph (using pnorm)

x  <- seq(-5, 2, 1)
y1 <- pnorm(x)
y2 <- pnorm(x,1,4)
plot(x,y1,type="l",col="green")
plot(x,y2,type="l",col="blue")

##########################################
To plot both y1 and y2 in one graph

x  <- seq(-1, 1, 0.5)
y1 <- pnorm(x)
y2 <- pnorm(x, 1, 2)

matplot(x, cbind(y1,y2),type="l",col=c("blue","red"),lty=c(1,1))
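An alternative to matplot() is to draw the first curve with plot() and add the second with lines(). This is only a sketch: the finer grid, the axis label and the legend texts are choices made for this example.

```r
x  <- seq(-5, 5, by = 0.1)
y1 <- pnorm(x)                      # standard normal CDF
y2 <- pnorm(x, mean = 1, sd = 2)    # CDF with mean 1 and sd 2

# Draw the first curve, then add the second to the same axes
plot(x, y1, type = "l", col = "blue", ylab = "cumulative probability")
lines(x, y2, col = "red")
legend("topleft", legend = c("mean 0, sd 1", "mean 1, sd 2"),
       col = c("blue", "red"), lty = 1)
```

lines() only adds to an existing plot, so plot() has to be called first.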

##########################################
Chi square calculation (dchisq, pchisq, qchisq, rchisq)

x <- seq(-10,20,by=.5)
y <- dchisq(x,df=5)
plot(x,y)
y <- dchisq(x,df=10)
plot(x,y)
----------------------------
pchisq(2,df=10)

x = c(2,4,5,6)
pchisq(x,df=20)
----------------------------
qchisq(0.05,df=5)

y <- c(0.005,.025,.05)
qchisq(y,df=20)
----------------------------
rchisq(3,df=10)
##########################################
Binomial calculation (dbinom, pbinom, qbinom, rbinom)
x <- seq(0,20,by=1)
y <- dbinom(x,20,0.2)
plot(x,y)

pbinom(10,20,0.5)

qbinom(0.5,25,0.5)

rbinom(5,20,0.5)
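The link between dbinom and pbinom can be checked numerically: the cumulative probability is the sum of the point probabilities. A small sketch, with an arbitrary seed and sample size for the simulation step:

```r
n <- 20
p <- 0.5

# P(X <= 10) from the cumulative function
pbinom(10, n, p)

# The same probability as a sum of point probabilities for 0..10
sum(dbinom(0:10, n, p))

# All point probabilities together add up to 1
sum(dbinom(0:n, n, p))

# The mean of many simulated draws should be close to n * p = 10
set.seed(42)
mean(rbinom(10000, n, p))
```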
##########################################
#Use of ggplot2 (a graphics package for R that implements the grammar of graphics)
library(ggplot2)

ggplot(ToothGrowth, aes(x=as.factor(dose), y=len, color=supp)) +
        geom_boxplot(position=position_dodge(0.5))+
        geom_jitter(position=position_jitterdodge(jitter.width=0.1, dodge.width=0.5)) +
        xlab("dose")
-------------------------------------------------------------------------------
#Loading excel file data to make data frame 
#Option 1: gdata (Perl must be installed for read.xls to work)
install.packages("gdata")
library(gdata)
help(read.xls)
data = read.xls("data.xls")
#Option 2: XLConnect (Java must be installed)
install.packages("XLConnect")
library(XLConnect)
wk = loadWorkbook("data.xls")
df = readWorksheet(wk, sheet="Sheet1")

#Loading text file data 
mydata = read.table("mydata.txt")
mydata
**For the loading code above to work, the data file must be in the current working directory (or be given with its full path). The code below checks and sets the working directory.
getwd()
setwd( "C:/Users/Seema/Desktop")
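To try read.table() without depending on a particular folder, the following sketch writes a small made-up table to a temporary file and reads it back. The column names and values are invented for the example; tempfile(), write.table(), file.exists() and unlink() are base R functions.

```r
# Write a small made-up table to a temporary file
path <- tempfile(fileext = ".txt")
sample_data <- data.frame(id = 1:3, taste = c("sweet", "sour", "bitter"))
write.table(sample_data, path, sep = "\t", row.names = FALSE, quote = FALSE)

# file.exists() confirms the path before reading
file.exists(path)

# header = TRUE treats the first line as column names
mydata <- read.table(path, header = TRUE, sep = "\t", stringsAsFactors = FALSE)
mydata
str(mydata)

unlink(path)   # remove the temporary file
```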
-------------------------------------------------------------------------------
Limma: Linear Models for Microarray Data
