Sunday, December 20, 2015

Bioinformatics (1): BLAST

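BLAST types (query vs. database): blastn (DNA vs. DNA), blastp (protein vs. protein), blastx (translated DNA vs. protein), tblastn (protein vs. translated DNA), tblastx (translated DNA vs. translated DNA), rpsblast (protein vs. conserved-domain profiles), MOLE-BLAST
Multiple alignment: MUSCLE
Tree method: Fast Minimum Evolution, Neighbor Joining
Databases: nr, nt, refseq
BLAST can be invoked with many different parameters; the commands in this post show some common combinations.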

#Installing BLAST
curl -O
tar xzf blast-2.2.24-ia32-linux.tar.gz
cp blast-2.2.24/bin/* /usr/local/bin
cp -r blast-2.2.24/data /usr/local/blast-data

#Obtaining fasta files from NCBI
curl -O
curl -O

#Uncompressing fasta files, formatting and blasting
ls -l *.faa.gz
gunzip *.faa.gz
less species1.faa
formatdb -i species1.faa -o T -p T
formatdb -i species2.faa -o T -p T
head species1.faa > sample_species1.fa
blastall -i sample_species1.fa -d species2.faa -p blastp -o homology.txt
less homology.txt
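
Note that head copies only the first ten lines, so the last sequence in the sample may be cut short. A minimal sketch of an alternative that keeps the first N complete FASTA records is shown here. The helper name first_n_records and the toy sequences are made up for illustration and are not part of BLAST.

```shell
# first_n_records N FILE: print the first N complete FASTA records of FILE.
# Hypothetical helper, not a BLAST tool.
first_n_records() {
  # Count header lines; stop before printing record N+1.
  awk -v n="$1" '/^>/ { seen++ } seen > n { exit } { print }' "$2"
}

# Toy input, invented for illustration only.
toy=$(mktemp)
cat > "$toy" <<'EOF'
>toy_protein_1
MKTAYIAKQR
QISFVKSHFS
>toy_protein_2
MSDNELLKRA
>toy_protein_3
MAHHHHHHVG
EOF

# Keep the first two records, each with its full sequence.
first_n_records 2 "$toy"
```

With the files from this post, the call would look like: first_n_records 5 species1.faa > sample_species1.fa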

# -K 20 (keep the 20 best hits per region), -m 8 (tab-separated output), -a 8 (8 processors)
formatdb -p F -i "$SUBJECT"
blastall -p blastn -i "$QUERY" -d "$SUBJECT" -K 20 -m 8 -a 8 > homology.txt
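
With -m 8, each hit is one tab-separated line with twelve columns: query ID, subject ID, percent identity, alignment length, mismatches, gap openings, query start, query end, subject start, subject end, E-value and bit score. A rough sketch of filtering such a file with awk follows. The helper name filter_hits, the thresholds and the toy rows are assumptions for illustration, not real results.

```shell
# filter_hits FILE MIN_IDENTITY MAX_EVALUE
# Print hits whose percent identity (column 3) is at least MIN_IDENTITY
# and whose E-value (column 11) is at most MAX_EVALUE.
# Hypothetical helper, not a BLAST tool.
filter_hits() {
  awk -F'\t' -v id="$2" -v ev="$3" '$3 + 0 >= id + 0 && $11 + 0 <= ev + 0' "$1"
}

# Toy tabular rows, invented for illustration only.
hits=$(mktemp)
printf 'q1\ts1\t98.50\t200\t3\t0\t1\t200\t5\t204\t1e-100\t370\n' > "$hits"
printf 'q1\ts2\t72.00\t150\t42\t2\t10\t159\t30\t179\t2e-08\t60.2\n' >> "$hits"
printf 'q2\ts3\t91.30\t46\t4\t0\t1\t46\t100\t145\t0.5\t30.1\n' >> "$hits"

# Keep hits with at least 90% identity and an E-value of 1e-5 or lower.
filter_hits "$hits" 90 1e-5
```

On real output the call would be: filter_hits homology.txt 90 1e-5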

#BLAST+ syntax; Mtb.fa is the database here (it must already be formatted as a BLAST database)
blastp -db Mtb.fa -query fasta_file
blastp -db path_to_file/yeast.aa -query query1

#Downloading all protein sequences for an organism from NCBI
#Example: M. tuberculosis
#From the Related information panel, select Gene or Protein
#Change the display format from Summary to FASTA

#The link contains all NCBI databases (e.g. nr, nt, est, gss, htgs, pat, refseq, wgs)
#The NCBI FTP site has all of these databases available for download
#Download all 'nr' database subfiles in one command
wget '*.tar.gz'
cat nr.*.tar.gz | tar -zxvi -f - -C 
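
The pipe above relies on tar's -i (ignore zeros) option to read several concatenated archives as one stream. An alternative sketch extracts each archive separately into one directory. The helper name extract_all and the directory names are hypothetical, and the toy archives only stand in for the real nr volumes.

```shell
# extract_all DEST ARCHIVE...: unpack every listed .tar.gz into DEST.
# Hypothetical helper; stops at the first archive that fails to extract.
extract_all() {
  dest=$1
  shift
  mkdir -p "$dest"
  for archive in "$@"; do
    tar -xzf "$archive" -C "$dest" || return 1
  done
}

# Toy archives, invented for illustration only.
work=$(mktemp -d)
echo "volume 0" > "$work/toy.00.txt"
echo "volume 1" > "$work/toy.01.txt"
tar -czf "$work/toy.00.tar.gz" -C "$work" toy.00.txt
tar -czf "$work/toy.01.tar.gz" -C "$work" toy.01.txt

# Unpack both archives into one (hypothetical) database directory.
extract_all "$work/db" "$work"/toy.*.tar.gz
ls "$work/db"
```

For the nr download this would be called as: extract_all some_directory nr.*.tar.gz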

#A BLAST search against a database requires at least the -query and -db options.
blastn -db nt -query nt.fsa -out results.out

#Create a custom database from a multi-FASTA file of sequences
makeblastdb -in mydb.fsa -dbtype nucl -parse_seqids
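
The two steps can be combined into a small end-to-end run: build a custom nucleotide database, then search it. This is only a sketch, assuming BLAST+ is on the PATH. The toy sequences, the file names and the E-value cutoff are invented for illustration. -outfmt 6 requests tabular output, the BLAST+ counterpart of -m 8.

```shell
# Work in a scratch directory so no existing files are touched.
blastwork=$(mktemp -d)

# Toy subject sequences, invented for illustration only.
cat > "$blastwork/mydb.fsa" <<'EOF'
>toy_subject_1
ACGTACGTTAGCTAGCTAGGATCCGATCGATCGTACGATCGATCGTAGCTAGCTAGCATCG
>toy_subject_2
TTGACCGATCGGGCTATATAGCGCGATATCGCGCTATAGCGCGCTATATCGCGATATAGCG
EOF

# Toy query: a fragment of toy_subject_1.
cat > "$blastwork/query.fsa" <<'EOF'
>toy_query
ACGTACGTTAGCTAGCTAGGATCCGATCGATCGTACGATCG
EOF

# Run only if both BLAST+ programs are installed.
if command -v makeblastdb >/dev/null 2>&1 && command -v blastn >/dev/null 2>&1; then
  makeblastdb -in "$blastwork/mydb.fsa" -dbtype nucl -parse_seqids -out "$blastwork/mydb" &&
  blastn -db "$blastwork/mydb" -query "$blastwork/query.fsa" \
         -outfmt 6 -evalue 1e-5 -out "$blastwork/results.tsv" &&
  cat "$blastwork/results.tsv"
else
  echo "BLAST+ not found on PATH; skipping the search" >&2
fi
```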
