Sunday, December 20, 2015

Bioinformatics (1): BLAST

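BLAST types: blastn (DNA query vs. DNA database), blastp (protein vs. protein), blastx (translated DNA query vs. protein database), tblastn (protein query vs. translated DNA database), tblastx (translated DNA query vs. translated DNA database), rpsblast (protein query vs. conserved-domain profile database), MOLE-BLAST
Multiple alignment: MUSCLE
Tree method: Fast Minimum Evolution, Neighbor Joining
Database: nr, nt, refseq
BLAST can be invoked with many different combinations of parameters; the commands below show some common ones.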
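
The choice of program follows from whether the query and the database are nucleotide or protein sequences. A minimal sketch of that mapping; `pick_blast` is a hypothetical helper name, not a BLAST command, and it covers only the four basic pairings.

```shell
# pick_blast QUERY_TYPE DB_TYPE   (each argument is "nucl" or "prot")
# Hypothetical helper: prints the BLAST program that fits the pair.
pick_blast() {
    case "$1:$2" in
        nucl:nucl) echo "blastn" ;;   # tblastx also fits if both sides should be translated
        prot:prot) echo "blastp" ;;
        nucl:prot) echo "blastx" ;;   # the query is translated in six frames
        prot:nucl) echo "tblastn" ;;  # the database is translated in six frames
        *) echo "unknown sequence types: $1 $2" >&2; return 1 ;;
    esac
}
```

For example, `pick_blast nucl prot` prints `blastx`.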

#Installing BLAST
curl -O ftp://ftp.ncbi.nih.gov/blast/executables/release/2.2.24/blast-2.2.24-ia32-linux.tar.gz
tar xzf blast-2.2.24-ia32-linux.tar.gz
cp blast-2.2.24/bin/* /usr/local/bin
cp -r blast-2.2.24/data /usr/local/blast-data
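
After copying the binaries, it can be worth confirming that they are reachable on PATH. A minimal sketch; `missing_tools` is a hypothetical helper name and the tool names passed to it are up to you.

```shell
# missing_tools NAME...: print each command name that is not found on PATH.
# Prints nothing when every tool is available.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}
```

For the legacy package installed above, `missing_tools blastall formatdb` should print nothing once the install has worked.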

#Obtaining fasta files from NCBI
curl -O ftp://ftp.ncbi.nih.gov/refseq/species1.faa.gz
curl -O ftp://ftp.ncbi.nih.gov/refseq/species2.faa.gz

#Uncompressing fasta files, formatting and blasting
ls -l *.faa.gz
gunzip *.faa.gz
less species1.faa
formatdb -i  species1.faa -o T -p T
formatdb -i  species2.faa -o T -p T
head species1.faa > sample_species1.fa
blastall -i sample_species1.fa -d species2.faa -p blastp -o homology.txt
less homology.txt
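
Note that `head` cuts by line count, so the sample can end in the middle of a sequence. A sketch of an alternative that takes whole FASTA records instead; `fasta_head` is a hypothetical helper name and it assumes a standard `awk`.

```shell
# fasta_head N FILE: print the first N complete FASTA records of FILE.
# A record is a '>' header line plus every sequence line up to the next header.
fasta_head() {
    awk -v n="$1" '
        /^>/     { seen++ }   # count headers as they appear
        seen > n { exit }     # stop at header number N+1
        seen > 0 { print }    # print header and sequence lines
    ' "$2"
}
```

Usage: `fasta_head 5 species1.faa > sample_species1.fa`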

# -b 20 (at most 20 alignments per query), -K 20 (keep at most 20 best hits per query region), -m 8 (tab-separated output), -a 8 (8 processors)
QUERY=species1.fasta
SUBJECT=species2.fasta
formatdb -p F -i "$SUBJECT"
blastall -p blastn -i "$QUERY" -d "$SUBJECT" -b 20 -K 20 -m 8 -a 8 > homology.txt
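
The `-m 8` output has 12 tab-separated columns, which makes it easy to post-process. A sketch that keeps only the best-scoring hit per query; `best_hits` is a hypothetical helper name, the 1e-5 default cutoff is an arbitrary assumption, and the `g` sort key assumes a `sort` with general-numeric support (GNU or BSD).

```shell
# best_hits FILE [MAX_EVALUE]: from -m 8 tabular output, print the single
# highest-bit-score hit per query among hits with e-value <= MAX_EVALUE.
# Columns: 1 query, 2 subject, 3 %identity, 4 alignment length, 5 mismatches,
#          6 gap openings, 7 q.start, 8 q.end, 9 s.start, 10 s.end,
#          11 e-value, 12 bit score
best_hits() {
    tab=$(printf '\t')
    awk -F'\t' -v max="${2:-1e-5}" '$11 + 0 <= max + 0' "$1" |  # e-value filter
        sort -t "$tab" -k1,1 -k12,12gr |                        # per query, best score first
        awk -F'\t' '!seen[$1]++'                                # first line of each query
}
```

Usage: `best_hits homology.txt 1e-10 > best_homology.txt`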

#BLAST+ syntax: Mtb.fa is the (already formatted) protein database, fasta_file is the query
blastp -db Mtb.fa -query fasta_file
blastp -db path_to_file/yeast.aa -query query1

#Downloading all protein sequences for an organism from NCBI
http://www.ncbi.nlm.nih.gov/genome/
#For M. tuberculosis
http://www.ncbi.nlm.nih.gov/genome/166
#From related information panel, select gene or protein
http://www.ncbi.nlm.nih.gov/protein?LinkName=genome_protein&from_uid=166
#From SUMMARY, select FASTA

#This FTP directory holds the preformatted NCBI BLAST databases (e.g. nr, nt, est, gss, htgs, pat, refseq, wgs)
ftp://ftp.ncbi.nlm.nih.gov/blast/db/
#Every database there can be downloaded
#Download all 'nr' database volumes in one command, then extract them (-C takes the target directory)
wget 'ftp://ftp.ncbi.nlm.nih.gov/blast/db/nr.*.tar.gz'
cat nr.*.tar.gz | tar -zxvi -f - -C 
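
An alternative to piping all archives through one `tar` is to unpack them one at a time, which makes it clearer which file failed if a download was incomplete. A sketch only; `extract_all` is a hypothetical helper name and the destination directory is whatever you choose.

```shell
# extract_all DEST ARCHIVE...: unpack each .tar.gz archive into directory DEST.
# Stops at the first archive that fails and reports its name.
extract_all() {
    dest=$1
    shift
    mkdir -p "$dest" || return 1
    for archive in "$@"; do
        tar -xzf "$archive" -C "$dest" || { echo "failed: $archive" >&2; return 1; }
    done
}
```

Usage: `extract_all blastdb nr.*.tar.gz`, where `blastdb` is a placeholder directory name.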

#A BLAST search against a database requires at least the -query and -db options.
blastn -db nt -query nt.fsa -out results.out

#Create a custom database from a multi-FASTA file of sequences
makeblastdb -in mydb.fsa -dbtype nucl -parse_seqids
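
With `-parse_seqids`, the sequence identifiers in the FASTA headers are indexed, so repeated identifiers are likely to make the build fail. A sketch of a pre-check, assuming the identifier is the header text between `>` and the first whitespace; `dup_ids` is a hypothetical helper name.

```shell
# dup_ids FILE: print every sequence identifier that occurs more than once.
# Each duplicated identifier is printed a single time, on its second occurrence.
dup_ids() {
    awk '/^>/ { id = substr($1, 2); if (++count[id] == 2) print id }' "$1"
}
```

Usage: `dup_ids mydb.fsa` prints nothing when all identifiers are unique.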
