Wednesday, July 12, 2017

Gynecological/female ailments: Information and hypotheses.........

Excess androgen is associated with PCOS (polycystic ovarian syndrome) and insulin resistance
Torsion of cysts can prevent blood flow to the ovary
Dermoid cysts are large, and often contain hair, teeth, bone
Cautery - the act of coagulating blood and destroying tissue with a hot iron or caustic agent or by freezing
It's an approach to deal with polycystic ovaries if medications can't heal them.
Pseudocyesis - false pregnancy
Women with this condition miss menstruation, and have bloated abdomens. Their hormone levels may rise, and their breasts may secrete colostrum.
A high hormone level can be caused by ovarian cysts, tumors, depression, etc.
Female athletes commonly suffer from: amenorrhea (being without a period for three months), decreased bone mineral density, and low energy availability (including disordered eating)
Hypothesis: inflammation is a cause of female reproductive issues. Pollution, chemical additives, cosmetics, fragrances, etc. are the suspected culprits

Monday, September 26, 2016

Bone diseases: Information and hypotheses.........

The SOST gene acts to lower bone mass; its deficiency induces lifelong bone gain.
As some old people, who normally have osteoporosis (low bone mass), have bone growth here and there, it might be due to dysregulation of the SOST gene
Bone overgrowth diseases: sclerosteosis, van Buchem disease, and autosomal dominant craniodiaphyseal dysplasia

Hormonal drugs can meddle with bones, rendering them porous.

Monday, July 25, 2016

Breast cancer: Information and hypotheses.........

Breast tissue is made of adipose cells and lymphatic glands.
The suspected gene BRCA1 is on chromosome 17 and BRCA2 is on chromosome 13
ER (estrogen) -positive cancer:
Luminal A cell lines: MCF-7 and T47D
Metastatic: MDA-MB-231

Common endocrine-disrupting agents: aluminum, parabens, triclosan, phthalates, perfumes
#A mammogram detects tumors in breast tissue. However, its radiation is not risk-free. Accumulated radiation doses can themselves initiate mutations.

#Persistent infection and allergen exposure can cause lymph glands to swell. Sensing danger, the aromatase enzyme will be overexpressed, producing estrogen, which can cause hyperplasia...

#It's good to know that some tumors disappear undetected and untreated. That means that if the source of inflammation or abuse is removed, the signalling is likely to correct itself and the immune attack is likely to subside. The proteasome complex might degrade the offending proteins.

#As prevention is always better than therapy, one must not abuse the body through a bad lifestyle.
A healthy lifestyle includes simple, minimally processed, balanced food, lower exposure to chemicals (pesticides, cosmetics, cleaning agents, food additives), physical activity, vitamin D from the sun, and a stress-free life.

Thursday, July 21, 2016

Cancer types and cell lines..........

Cancer is a heterogeneous disease. Above all, therapeutic success is unpredictable.
It's the result of inflammation...caused by perturbed proteases, wreaking havoc with the normal functionality of the body. Pesticides and household endocrine disruptors are increasing the risk of cancer.
Personalized medicine is required for treatment, as the genetics of each individual are different.
Poor prognosis, metastasis, high relapse rate make cancer deadly
Mapping the mutations, genes and their pathways can reveal a lot about cancer.
Diagnosis: Ultrasound, colonoscopy, mammography..
Current therapeutic strategies include: surgery (mastectomies), chemo, radiotherapy, molecular targeted therapy

TCGA: The Cancer Genome Atlas consortium
MURINE......
26L5:  murine colon carcinoma
B16BL6:  murine melanoma,
murine Lewis lung carcinoma
HUMAN....
A375 : human melanoma
A498: Renal carcinoma
A549:  human lung adenocarcinoma
AMC-HN-4:  malignant human head and neck
BT474: human breast
ChaGo: human bronchogenic
CNE1:  nasopharyngeal carcinoma
DU145:  hormone-resistant prostate cancer
GBM : human glioblastoma
HCC: Hepatocellular carcinoma
HeLa: human cervix adenocarcinoma
Hep-G2:human liver
HT-1080:  human fibrosarcoma
KATO-III: human gastric
LNCaP: hormone-sensitive prostate cancer
MCF-7:  human breast cancer ERα+
mCRPC: metastatic castration resistant prostate cancer
PBMC: human peripheral blood mononuclear cell
PC-3:  human prostate cancer (androgen-independent)
SW620: human colon
U87MG:  human glioblastoma
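Normal cell lines (control)
CH-liver
HCT116 (note: a human colorectal carcinoma line, not a normal line)
HS27: fibroblast
HT29 (note: a human colorectal adenocarcinoma line, not a normal line)
SW480 cells (note: a human colorectal adenocarcinoma line, not a normal line)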

Human genes associated with different cancers/cancer-associated genes:
Colon: BCL9L, RBM10, CTCF, and KLF5
Cervical adenocarcinoma:
Breast cancer: BRCA1 and BRCA2
Ovarian cancer: BRCA1 and BRCA2
Well-known cancer pathways
Wnt pathway
Canonical
Wnt binds to its receptor Frizzled, and potential co-receptor LRP-5/6
It suppresses GSK-3β phosphorylation of β-catenin.
β-Catenin accumulates and moves into the nucleus
It binds to LEF/TCF transcription factors, which activate Wnt target genes.
Non-canonical
Wnt signals through Frizzled and Dishevelled, with receptor tyrosine kinases such as ROR2 or RYK acting as co-receptors, independently of β-catenin

Tuesday, May 3, 2016

Allergens: Types, sources.......

GENERAL
#######PLANTS#####
Ole e: Olea europaea (Common olive)
Sin a: Sinapis alba (White mustard)
2S albumin: Ricinus communis (Castor bean)
Pectate lyase: Cryptomeria japonica (Japanese cedar) (Cupressus japonica)
Expansin-B1: Zea mays (Maize)
Superoxide dismutase: Olea europaea (Common olive)
Small rubber particle protein: Hevea brasiliensis (Para rubber tree)
Exopolygalacturonase: Platanus acerifolia (London plane tree)
Major pollen allergen Bet v 1-A: Betula pendula (European white birch) (Betula verrucosa)
Profilin-2: Phleum pratense (Common timothy)
Pectinesterase 1: Olea europaea (Common olive)
Non-specific lipid-transfer protein:  Ambrosia artemisiifolia (Short ragweed)
Profilin-1 : Phleum pratense (Common timothy)
Pectate lyase 1: Ambrosia artemisiifolia (Short ragweed)
Pectate lyase 2: Ambrosia artemisiifolia (Short ragweed)
Bet v 1-L: Betula pendula (European white birch) (Betula verrucosa)
Amb a 3: Ambrosia artemisiifolia var. elatior (Short ragweed)
Pectinesterase 2: Olea europaea (Common olive)
Phl p 5b: Phleum pratense (Common timothy)
Polygalacturonase: Cryptomeria japonica (Japanese cedar)
Expansin-B11: Zea mays (Maize)
Lol p 1: Lolium perenne (Perennial ryegrass)
Profilin-4: Corylus avellana (European hazel) (Corylus maxima)
Actinidain: Actinidia deliciosa (Kiwi)
Polygalacturonase: Juniperus ashei (Ozark white cedar)
Esterase: Hevea brasiliensis (Para rubber tree)
Protein DOWNSTREAM OF FLC: Arabidopsis thaliana (Mouse-ear cress)
Major allergen Api g 1: Apium graveolens (Celery)
Alpha-amylase inhibitor BMAI-1: Hordeum vulgare (Barley)
Superoxide dismutase [Cu-Zn]: Olea europaea (Common olive)
Lactoylglutathione lyase: Oryza sativa subsp. japonica (Rice)
Profilin-1: Zea mays (Maize)
Ambrosia artemisiifolia (Short ragweed)
Bra j 1-E: Brassica juncea (Indian mustard) (Sinapis juncea)
Glucan endo-1,3-beta-glucosidase: Prunus avium (Cherry)
Non-specific lipid-transfer protein: Apium graveolens (Celery)
Dau c 1: Daucus carota (Wild carrot)
Pollen allergen KBG 41: Poa pratensis (Kentucky bluegrass)
Lol p 5a: Lolium perenne (Perennial ryegrass)
Profilin-2-5: Olea europaea (Common olive)
######FUNGI#####
60S acidic ribosomal protein P2: Alternaria alternata (Alternaria rot fungus)
Alcohol dehydrogenase 1: Candida albicans (Yeast)
Enolase:  Cladosporium herbarum
Glucoamylase: Trichophyton mentagrophytes
Cla h 7: Cladosporium herbarum
Ribonuclease mitogillin: Aspergillus fumigatus
Fructose-bisphosphate aldolase: Candida albicans (strain SC5314 / ATCC MYA-2876) (Yeast)
60S acidic ribosomal protein P2: Cladosporium herbarum
Enolase: Alternaria alternata (Alternaria rot fungus)

######NEMATODE#####
Polyprotein ABA-1: Ascaris suum (Pig roundworm)
Major allergen Ani s 1: Anisakis simplex (Herring worm)
######ARTHROPODS#####
Pilosulin-3a: Myrmecia pilosula (Jack jumper ant) (Australian jumper ant)
Peptidase 1: Psoroptes ovis (Sheep scab mite)
Hyaluronidase A: Vespula vulgaris (Yellow jacket) (Wasp)
Eur m 3: Euroglyphus maynei (Mayne's house dust mite)
Peptidase 1: Dermatophagoides pteronyssinus (European house dust mite)
Mite group 2 allergen Lep d: Lepidoglyphus destructor (Storage mite)
Peptidase 1: Dermatophagoides farinae (American house dust mite)
Mite group 2 allergen Der p 2: Dermatophagoides pteronyssinus (European house dust mite)
Phospholipase A1: Solenopsis invicta (Red imported fire ant)
Melittin: Apis mellifera (Honeybee)
Pilosulin-1: Myrmecia pilosula (Jack jumper ant) (Australian jumper ant)
Hyaluronidase: Apis mellifera (Honeybee)
Aspartic protease Bla g 2: Blattella germanica (German cockroach) (Blatta germanica)
Peptidase 1: Euroglyphus maynei (Mayne's house dust mite)
Phospholipase A1 1: Dolichovespula maculata (Bald-faced hornet)
Venom allergen 3: Solenopsis invicta (Red imported fire ant)
Der p 3: Dermatophagoides pteronyssinus (European house dust mite)
Der f 3: Dermatophagoides farinae (American house dust mite)
Arginine kinase AK: Penaeus monodon (Giant tiger prawn)
Venom dipeptidyl peptidase 4: Apis mellifera (Honeybee)
Phospholipase A1: Vespula maculifrons (Eastern yellow jacket) (Wasp)
######FISH#####
Parvalbumin beta: Gadus morhua subsp. callarias (Baltic cod) 
######BIRDS#####
Ovalbumin: Gallus gallus (Chicken)
Ovotransferrin: Gallus gallus (Chicken)
Lysozyme C: Gallus gallus (Chicken)
Ovomucoid: Gallus gallus (Chicken)
######MAMMALS#####
Minor allergen Can f 2: Canis lupus familiaris (Dog) (Canis familiaris)
Major allergen I polypeptide chain: Felis catus (Cat)
Allergen Fel d 4: Felis catus (Cat) (Felis silvestris catus)
Major urinary protein: Rattus norvegicus (Rat)
Allergen Bos d 2: Bos taurus (Bovine)
Protein S100-A7: Bos taurus (Bovine)
Latherin : Equus caballus (Horse)
Major allergen Equ c 1: Equus caballus (Horse)
-----------------
SPECIFIC
#Cashew, Pistachio
Vicilin-like protein, 2s albumin, Ana o 2, 11S globulin
#Almond, peach
pru1, Pru du, Non-specific lipid-transfer protein
#Tomato
Profilin, pectate lyase
#Peanut
Conglutin-7, Defensin, Ara h, Profilin, Non-specific lipid-transfer protein
#Avocado
Endochitinase
#Kiwi
Actinidain, Cysteine proteinase inhibitor, Thaumatin-like protein, Act d
Kiwellin, Kirola, Non-specific lipid-transfer protein, Endochitinase, Bet v
#Persimmon
Expansin, Non-specific lipid-transfer protein
#Celery
Non-specific lipid-transfer protein, Chlorophyll a-b binding protein, Api g, Profilin
#Kidney bean
Pathogenesis-related protein 1
Pectate lyase
#Egg
Ovalbumin
Ovotransferrin
Lysozyme C
Ovomucoid
Serum albumin
#Shrimp, lobster
Tropomyosin
Arginine kinase
Pen a
Lit v
Sarcoplasmic calcium-binding protein
#Mussel
Tropomyosin
Endo-beta-1,4-glucanase
#Fish
Alpha-enolase
Beta-enolase
Parvalbumin beta
Fructose-bisphosphate aldolase A
#Octopus 
Arginine kinase
#Silk moth
SCP-related protein
Arginine kinase
Apolipoprotein of lipid transfer
#Rubber
Patatin

MY SCRIPT (2): Unique genes finding, their analysis, wrapper..

#!/bin/bash
#Code to find out unique genes
#Run as: sh unique_genes_finding.sh |& tee all_isolate_gene_profile

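#Create the output directory if it is missing; otherwise the cp steps below would write a plain file named unique_genes
mkdir -p /home/pseema/denovo_analysis/result_files/unique_genes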
#find /home/pseema/denovo_analysis/result_files/*.only_header
while read strain;
do
while read isolate;
do

echo "#################Starting $isolate..####################"
#Extract all columns except column1
awk '{$1=""; print $0}' /home/pseema/denovo_analysis/result_files/$isolate.only_header > /home/pseema/denovo_analysis/result_files/$isolate.only_protein_name
echo "****Total number of proteins in $isolate: ******"
cat /home/pseema/denovo_analysis/result_files/$isolate.only_protein_name | wc -l
awk '!/hypothetical/' /home/pseema/denovo_analysis/result_files/$isolate.only_protein_name  >  /home/pseema/denovo_analysis/result_files/$isolate.only_functional_proteins
echo "******Number of non-hypothetical proteins in $isolate: *****"
cat  /home/pseema/denovo_analysis/result_files/$isolate.only_functional_proteins | wc -l
sort -u  /home/pseema/denovo_analysis/result_files/$isolate.only_functional_proteins > /home/pseema/denovo_analysis/result_files/$isolate.only_functional_proteins_sorted

#Shows common proteins to file 1 and file2 (option -12 or -21 can be used to achieve it)
echo "**Proteins common to $strain and $isolate: **"
comm -12  /home/pseema/denovo_analysis/result_files/$strain.only_functional_proteins_sorted  /home/pseema/denovo_analysis/result_files/$isolate.only_functional_proteins_sorted > /home/pseema/denovo_analysis/result_files/in_both.$strain.$isolate
cat /home/pseema/denovo_analysis/result_files/in_both.$strain.$isolate | wc -l
cat /home/pseema/denovo_analysis/result_files/in_both.$strain.$isolate
cp /home/pseema/denovo_analysis/result_files/in_both.$strain.$isolate  /home/pseema/denovo_analysis/result_files/unique_genes
echo "**Proteins common to $strain and $isolate done**"

#These proteins occur only in $strain (only column1)
echo "**Proteins unique to $strain: **"
comm -23  /home/pseema/denovo_analysis/result_files/$strain.only_functional_proteins_sorted  /home/pseema/denovo_analysis/result_files/$isolate.only_functional_proteins_sorted > /home/pseema/denovo_analysis/result_files/not_in.$isolate
cat /home/pseema/denovo_analysis/result_files/not_in.$isolate | wc -l
cat /home/pseema/denovo_analysis/result_files/not_in.$isolate
cp /home/pseema/denovo_analysis/result_files/not_in.$isolate  /home/pseema/denovo_analysis/result_files/unique_genes
echo "Unique protein search for $strain done"


#These proteins occur only in $isolate (only column2)
echo "**Proteins unique to $isolate: **"
comm -13  /home/pseema/denovo_analysis/result_files/$strain.only_functional_proteins_sorted  /home/pseema/denovo_analysis/result_files/$isolate.only_functional_proteins_sorted > /home/pseema/denovo_analysis/result_files/only_in.$isolate
cat /home/pseema/denovo_analysis/result_files/only_in.$isolate | wc -l
cat /home/pseema/denovo_analysis/result_files/only_in.$isolate
cp /home/pseema/denovo_analysis/result_files/only_in.$isolate  /home/pseema/denovo_analysis/result_files/unique_genes
echo "Unique protein search for $isolate done"

echo "********$isolate done********"
done < /home/pseema/denovo_analysis/input_files/isolate_list
#done < /home/pseema/denovo_analysis/input_files/IO_isolates
#done < /home/pseema/denovo_analysis/input_files/EAS_isolates
#done < /home/pseema/denovo_analysis/input_files/EAI_isolates
#done < /home/pseema/denovo_analysis/input_files/EAM_isolates

done < /home/pseema/denovo_analysis/input_files/strain_list
#done < /home/pseema/denovo_analysis/input_files/IO_isolates
#done < /home/pseema/denovo_analysis/input_files/EAS_isolates
#done < /home/pseema/denovo_analysis/input_files/EAI_isolates
#done < /home/pseema/denovo_analysis/input_files/EAM_isolates
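
The three comm calls in the script above (-12, -23, -13) amount to set operations on two protein lists. A minimal Python sketch of the same comparison, assuming one protein name per entry. The function name compare_protein_sets and the sample protein names are hypothetical and only for illustration; they are not part of the original scripts.

```python
def compare_protein_sets(strain_proteins, isolate_proteins):
    """Mirror comm -12 / -23 / -13 on two collections of protein names."""
    strain = set(strain_proteins)
    isolate = set(isolate_proteins)
    return {
        "in_both": sorted(strain & isolate),          # like comm -12
        "only_in_strain": sorted(strain - isolate),   # like comm -23
        "only_in_isolate": sorted(isolate - strain),  # like comm -13
    }


if __name__ == "__main__":
    # Made-up protein names, purely illustrative
    strain = ["DNA gyrase subunit A", "catalase", "protein X"]
    isolate = ["catalase", "DNA gyrase subunit A", "protein Y"]
    for label, names in compare_protein_sets(strain, isolate).items():
        print(label, len(names), names)
```

Unlike comm, the set version does not need the inputs to be sorted first, but it also drops duplicate names, which matches what sort -u does in the script.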
-----------------------------------------------------
#!/bin/bash
#Code to analyze data for unique genes
#Execute as:  sh unique_genes_analysis.sh |& tee all_isolate_gene_analysis
#mkdir /home/pseema/denovo_analysis/result_files/unique_genes
#find *.matches_comm_12 |  wc -l
cat `find /home/pseema/denovo_analysis/result_files/unique_genes/in_both.*` > /home/pseema/denovo_analysis/result_files/unique_genes/all_isolates_common
echo "Common protein pool when the isolates were compared to each other..."
#cat /home/pseema/denovo_analysis/result_files/unique_genes/all_isolates_common | wc -l
uniq /home/pseema/denovo_analysis/result_files/unique_genes/all_isolates_common > /home/pseema/denovo_analysis/result_files/unique_genes/all_isolates_common_uniq
cat /home/pseema/denovo_analysis/result_files/unique_genes/all_isolates_common_uniq | wc -l
awk '!NF || !seen[$0]++' /home/pseema/denovo_analysis/result_files/unique_genes/all_isolates_common_uniq > /home/pseema/denovo_analysis/result_files/unique_genes/all_isolates_common_reduced
echo "Unique proteins in the common protein pool..."
cat /home/pseema/denovo_analysis/result_files/unique_genes/all_isolates_common_reduced | wc -l

#find *.matches_comm_23 |  wc -l
cat `find /home/pseema/denovo_analysis/result_files/unique_genes/not_in.*` > /home/pseema/denovo_analysis/result_files/unique_genes/all_isolates_only_column1
cat /home/pseema/denovo_analysis/result_files/unique_genes/all_isolates_only_column1 | wc -l
uniq /home/pseema/denovo_analysis/result_files/unique_genes/all_isolates_only_column1 > /home/pseema/denovo_analysis/result_files/unique_genes/all_isolates_only_column1_uniq
cat /home/pseema/denovo_analysis/result_files/unique_genes/all_isolates_only_column1_uniq | wc -l
awk '!NF || !seen[$0]++' /home/pseema/denovo_analysis/result_files/unique_genes/all_isolates_only_column1_uniq > /home/pseema/denovo_analysis/result_files/unique_genes/all_isolates_only_column1_uniq_reduced
cat /home/pseema/denovo_analysis/result_files/unique_genes/all_isolates_only_column1_uniq_reduced  | wc -l
#find *.matches_comm_13 |  wc -l
cat `find /home/pseema/denovo_analysis/result_files/unique_genes/only_in.*`> /home/pseema/denovo_analysis/result_files/unique_genes/all_isolates_only_column2
cat /home/pseema/denovo_analysis/result_files/unique_genes/all_isolates_only_column2 | wc -l
uniq /home/pseema/denovo_analysis/result_files/unique_genes/all_isolates_only_column2  > /home/pseema/denovo_analysis/result_files/unique_genes/all_isolates_only_column2_uniq
cat /home/pseema/denovo_analysis/result_files/unique_genes/all_isolates_only_column2_uniq | wc -l
awk '!NF || !seen[$0]++' /home/pseema/denovo_analysis/result_files/unique_genes/all_isolates_only_column2_uniq > /home/pseema/denovo_analysis/result_files/unique_genes/all_isolates_only_column2_uniq_reduced
cat /home/pseema/denovo_analysis/result_files/unique_genes/all_isolates_only_column2_uniq_reduced  | wc -l

#Find lines matching a given pattern
awk '/Proteins unique to/'  all_isolate_gene_profile > /home/pseema/denovo_analysis/result_files/unique_genes/pattern_files

#Find lines next to a given pattern
awk 'f{print;f=0} /Proteins unique to/{f=1}' all_isolate_gene_profile > /home/pseema/denovo_analysis/result_files/unique_genes/next_lines

#Paste these two files side by side
paste -d' ' /home/pseema/denovo_analysis/result_files/unique_genes/pattern_files /home/pseema/denovo_analysis/result_files/unique_genes/next_lines > /home/pseema/denovo_analysis/result_files/unique_genes/isolate_diff_unique_genes

#Extract only column 4
awk '{print $4}' /home/pseema/denovo_analysis/result_files/unique_genes/isolate_diff_unique_genes > /home/pseema/denovo_analysis/result_files/unique_genes/isolate_diff_unique_genes_only_isolate
#find difference between two consecutive lines in the generated file
#Extract only odd number lines
awk 'NR%2==1' /home/pseema/denovo_analysis/result_files/unique_genes/isolate_diff_unique_genes_only_isolate  > /home/pseema/denovo_analysis/result_files/unique_genes/isolate_diff_unique_genes_only_isolate_only_odd

#Extract only even number lines
awk 'NR%2==0' /home/pseema/denovo_analysis/result_files/unique_genes/isolate_diff_unique_genes_only_isolate  > /home/pseema/denovo_analysis/result_files/unique_genes/isolate_diff_unique_genes_only_isolate_only_even

#Paste the extracted columns side by side
paste -d' ' /home/pseema/denovo_analysis/result_files/unique_genes/isolate_diff_unique_genes_only_isolate_only_odd /home/pseema/denovo_analysis/result_files/unique_genes/isolate_diff_unique_genes_only_isolate_only_even > /home/pseema/denovo_analysis/result_files/unique_genes/merged_columns_isolates

#Find difference between two consecutive lines in the generated file
#Extract only odd number lines
awk 'NR%2==1' /home/pseema/denovo_analysis/result_files/unique_genes/next_lines  > /home/pseema/denovo_analysis/result_files/unique_genes/only_odd

#Extract only even number lines
awk 'NR%2==0' /home/pseema/denovo_analysis/result_files/unique_genes/next_lines  > /home/pseema/denovo_analysis/result_files/unique_genes/only_even

#Paste the extracted columns side by side
paste -d' ' /home/pseema/denovo_analysis/result_files/unique_genes/only_odd /home/pseema/denovo_analysis/result_files/unique_genes/only_even > /home/pseema/denovo_analysis/result_files/unique_genes/merged_columns
#Find difference between two columns of the file
awk 'NF > 0 { print $0 "\t" ($1 - $2) }' /home/pseema/denovo_analysis/result_files/unique_genes/merged_columns > /home/pseema/denovo_analysis/result_files/unique_genes/diff_columns

#Paste the extracted columns side by side
paste -d' ' /home/pseema/denovo_analysis/result_files/unique_genes/merged_columns_isolates /home/pseema/denovo_analysis/result_files/unique_genes/diff_columns > /home/pseema/denovo_analysis/result_files/unique_genes/isolate_gene_diff

#Print content between two patterns
echo "*****Isolate-specific unique protein*****"
awk '/Proteins unique to/ {flag=1;next} /Unique protein search/{flag=0} flag {print}' all_isolate_gene_profile && awk '/Unique protein search for/' all_isolate_gene_profile

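#To find the common genes in all the files
echo "The core genes are......"
#Note: each pass overwrites indispensable_genes, so the file ends up holding only the
#de-duplicated protein list of the last isolate, not the intersection across all isolates
while read -r isolate
do
awk '!NF || !seen[$0]++' /home/pseema/denovo_analysis/result_files/$isolate.only_functional_proteins  > /home/pseema/denovo_analysis/result_files/unique_genes/indispensable_genes
done < /home/pseema/denovo_analysis/input_files/isolate_list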

#To find the shared genes in all the files (it checks from folder to folder to find the shared genes)
echo "The shared genes are......"
#To get rid of backup files
#find . -name '*~' -exec rm {} \;
#The files are passed to awk directly (not through cat) so that FILENAME is set
awk 'END {
  for (R in rec) {
    n = split(rec[R], t, "|")
    if (n > 1)
      dup[n] = dup[n] ? dup[n] RS sprintf("\t%-20s -->\t%s", rec[R], R) : \
        sprintf("\t%-20s -->\t%s", rec[R], R)
    }
  for (D in dup) {
    printf "records found in %d files:\n\n", D
    printf "%s\n\n", dup[D]
    }
  }
{
  #Record every file in which this line occurs ("|" separates file names, since the paths contain "/")
  rec[$0] = rec[$0] ? rec[$0] "|" FILENAME : FILENAME
  }' /home/pseema/denovo_analysis/result_files/*.only_functional_proteins_sorted
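
The core-gene and shared-gene steps above can also be sketched in Python: a protein is "core" if it occurs in every isolate and "shared" if it occurs in more than one. This is only a sketch under the assumption that each isolate's proteins are already loaded into a list; the function name shared_and_core and the isolate labels are hypothetical.

```python
from collections import defaultdict


def shared_and_core(protein_lists):
    """protein_lists maps an isolate name to an iterable of protein names.

    Returns (core, shared): core is the sorted list of proteins present in
    every isolate; shared maps each protein found in more than one isolate
    to the sorted list of isolates that carry it.
    """
    seen_in = defaultdict(set)
    for isolate, proteins in protein_lists.items():
        for protein in proteins:
            seen_in[protein].add(isolate)
    total = len(protein_lists)
    core = sorted(p for p, isolates in seen_in.items() if len(isolates) == total)
    shared = {p: sorted(isolates) for p, isolates in seen_in.items()
              if len(isolates) > 1}
    return core, shared


if __name__ == "__main__":
    # Made-up isolate labels and protein names, purely illustrative
    data = {
        "isolate_1": ["catalase", "protein X"],
        "isolate_2": ["catalase", "protein Y"],
        "isolate_3": ["catalase", "protein X"],
    }
    core, shared = shared_and_core(data)
    print("core:", core)
    for protein, isolates in shared.items():
        print(protein, "found in", len(isolates), "isolates:", isolates)
```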
  -----------------------------------------
#!/bin/bash
#Wrapper to call all related scripts
#Code to find out unique genes
sh unique_genes_finding.sh |& tee all_isolate_gene_profile
#sh unique_genes_finding.sh |& tee IO_isolate_gene_profile
#sh unique_genes_finding.sh |& tee EAS_isolate_gene_profile
#sh unique_genes_finding.sh |& tee EAI_isolate_gene_profile
#sh unique_genes_finding.sh |& tee EAM_isolate_gene_profile

#Code to analyze data for unique genes
sh unique_genes_analysis.sh |& tee all_isolate_gene_analysis

Monday, May 2, 2016

Tools to learn and work to do......

Alignment...
Alignment of sequencing reads to a reference genome is a core step in the analysis workflows for many high-throughput sequencing assays, including ChIP-Seq, RNA-seq, ribosome profiling and others.
Bowtie  uses an extremely economical data structure called the FM index to store the reference genome sequence and allows it to be searched rapidly. 
TopHat uses Bowtie as an alignment ‘engine’ 
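
To get a feel for why an FM index allows rapid searching, here is a toy Python sketch of the idea: build the Burrows-Wheeler transform of a short sequence and count pattern occurrences by backward search. It is not Bowtie's implementation (Bowtie uses a compressed index with sampled checkpoints); the function names bwt, build_fm_index and count_occurrences are made up for this illustration.

```python
def bwt(text):
    """Burrows-Wheeler transform via sorted rotations (fine for toy inputs)."""
    text += "$"  # sentinel; sorts before A, C, G, T
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    return "".join(rotation[-1] for rotation in rotations)


def build_fm_index(text):
    """Return (first_occ, occ, n) for the BWT of text."""
    last = bwt(text)
    counts = {}
    for ch in last:
        counts[ch] = counts.get(ch, 0) + 1
    # first_occ[c]: number of characters in the text smaller than c
    first_occ, total = {}, 0
    for ch in sorted(counts):
        first_occ[ch] = total
        total += counts[ch]
    # occ[c][i]: number of occurrences of c in last[:i]
    occ = {ch: [0] * (len(last) + 1) for ch in counts}
    for i, ch in enumerate(last):
        for c in occ:
            occ[c][i + 1] = occ[c][i] + (1 if c == ch else 0)
    return first_occ, occ, len(last)


def count_occurrences(pattern, index):
    """Backward search: one range update per pattern character."""
    first_occ, occ, n = index
    lo, hi = 0, n  # half-open range of rows in the sorted rotations
    for ch in reversed(pattern):
        if ch not in first_occ:
            return 0
        lo = first_occ[ch] + occ[ch][lo]
        hi = first_occ[ch] + occ[ch][hi]
        if lo >= hi:
            return 0
    return hi - lo


if __name__ == "__main__":
    index = build_fm_index("ACGTACGTAC")  # toy reference sequence
    for read in ("ACG", "GTA", "TTT"):
        print(read, count_occurrences(read, index))
```

The search cost depends on the length of the read, not on the length of the reference, which is the property that makes this kind of index attractive for aligning millions of short reads.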

Mauve?
#To run the Mauve GUI from within Terminal 
#Add directory with executables to Mauve path
cd Mauve/
ls
cd mauve_2.3.1/
./Mauve 

File
Align with progressive Mauve
Select the executable folder (by navigation)
Mauve Console starts running (1-2 minutes for two full genomes)
Viewing the alignment
Zoom in: Ctrl + Up
Scroll display left: Ctrl + Left
Scroll display right: Ctrl + Right
Large left scroll: Shift + Ctrl + Left
Large right scroll: Shift + Ctrl + Right
Tool ---------> Export ---------> Export SNPs   

Indel determination..
What's the logic used to pull information from a VCF file?
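
One common answer, offered here as an assumption rather than as what any particular tool does: read the tab-separated REF and ALT columns of each VCF record and compare their lengths. If ALT is longer than REF it is an insertion, if shorter a deletion, and if both are one base long a SNP. A minimal Python sketch (function names are hypothetical; symbolic alleles such as <DEL> or * are not handled):

```python
def classify_variant(ref, alt):
    """Classify one REF/ALT pair by comparing allele lengths."""
    if len(ref) == 1 and len(alt) == 1:
        return "SNP"
    if len(alt) > len(ref):
        return "insertion"
    if len(alt) < len(ref):
        return "deletion"
    return "MNP"  # same length, more than one base


def parse_vcf_lines(lines):
    """Yield (chrom, pos, ref, alt, type) for every ALT allele in the records."""
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip meta-information, header and blank lines
        fields = line.rstrip("\n").split("\t")
        chrom, pos, _id, ref, alts = fields[:5]
        for alt in alts.split(","):  # multi-allelic sites list ALTs with commas
            yield chrom, int(pos), ref, alt, classify_variant(ref, alt)


if __name__ == "__main__":
    # Made-up records, purely illustrative
    example = [
        "##fileformat=VCFv4.2",
        "#CHROM\tPOS\tID\tREF\tALT",
        "chr1\t100\t.\tA\tG",
        "chr1\t200\t.\tA\tATG",
        "chr1\t300\t.\tCTG\tC",
    ]
    for record in parse_vcf_lines(example):
        print(record)
```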


RPS-BLAST
#Reverse Position-Specific BLAST (RPS-BLAST), used at the command line
#extract just these *.smp files from the large archive (cdd.tar.gz).
#run the formatrpsdb tool to build a database:
formatrpsdb -t Sigma.v001 -i Sigma.pn -o T -f 9.82 -n Sigma -S 100.0
#creates the eight files i.e. Sigma.aux, Sigma.loo, Sigma.phr, Sigma.pin, Sigma.psd, Sigma.psi, Sigma.psq and Sigma.rps which together make up the database.
#Compare
rpsblast -i rpoD.faa -d Sigma -e 0.00001
rpsblast -i rpoD.faa -d Sigma -e 0.00001 -o rpoD.txt
rpsblast -i rpoD.faa -d Sigma -e 0.00001 -m 7 -o rpoD.xml
#If comparing with Pfam database
rpsblast -i rpoD.faa -d Pfam -e 0.00001
#comparing entire genome with the Sigma database made earlier.
rpsblast -i NC_003197.faa -d Sigma -e 0.00001 -o NC_003197.txt
rpsblast -i NC_003197.faa -d Sigma -e 0.00001 -m 7 -o NC_003197.xml

#Analyzing RPS-BLAST output with Biopython
#For the smaller xml file
from Bio.Blast import NCBIXML

with open("rpoD.xml") as handle:
    for record in NCBIXML.parse(handle):
        print("QUERY: %s" % record.query)
        for align in record.alignments:
            print(" MATCH: %s..." % align.title[:60])
            for hsp in align.hsps:
                print(" HSP, e=%f, from position %i to %i"
                      % (hsp.expect, hsp.query_start, hsp.query_end))
                #Print short alignments in full, truncate long ones
                if hsp.align_length < 60:
                    print(" Query: %s" % hsp.query)
                    print(" Match: %s" % hsp.match)
                    print(" Sbjct: %s" % hsp.sbjct)
                else:
                    print(" Query: %s..." % hsp.query[:57])
                    print(" Match: %s..." % hsp.match[:57])
                    print(" Sbjct: %s..." % hsp.sbjct[:57])
print("Done")


#For the large xml file
from Bio.Blast import NCBIXML

with open("NC_003197.xml") as handle:
    for record in NCBIXML.parse(handle):
        #We want to ignore any queries with no search results:
        if record.alignments:
            print("QUERY: %s..." % record.query[:60])
            for align in record.alignments:
                for hsp in align.hsps:
                    print(" %s HSP, e=%f, from position %i to %i"
                          % (align.hit_id, hsp.expect, hsp.query_start, hsp.query_end))
print("Done")
That should give you the following output - note there is only 



#Running RPS-BLAST from Biopython
#Adjust the file locations to match your own:
import io
import subprocess

from Bio.Blast import NCBIXML

rpsblast_db = "C:\\Blast\\cdd\\Sigma"
rpsblast_exe = "C:\\Blast\\bin\\rpsblast.exe"

query_filename = "rpoD.faa"
#query_filename = "NC_003197.faa"

E_VALUE_THRESH = 0.00001 #Adjust the expectation cut-off here

#Bio.Blast.NCBIStandalone is no longer available in recent Biopython releases,
#so the legacy rpsblast executable is called directly with the same flags as on
#the command line above (-m 7 requests XML output)
result = subprocess.run(
    [rpsblast_exe, "-i", query_filename, "-d", rpsblast_db,
     "-e", str(E_VALUE_THRESH), "-m", "7"],
    capture_output=True, check=True)

for record in NCBIXML.parse(io.BytesIO(result.stdout)):
    #We want to ignore any queries with no search results:
    if record.alignments:
        print("QUERY: %s..." % record.query[:60])
        for align in record.alignments:
            for hsp in align.hsps:
                print(" %s HSP, e=%f, from position %i to %i"
                      % (align.hit_id, hsp.expect, hsp.query_start, hsp.query_end))
                assert hsp.expect <= E_VALUE_THRESH
print("Done")