Friday, February 5, 2016

Bio (16): Biostatistics for biomedical use

Feature selection is used to identify the most discriminating features for biomarker discovery, medical diagnosis, and gene selection.
Random Forest (RF) is an ensemble classifier built from multiple decision trees. It applies bagging to construct the ensemble and adds randomization to the growth of each tree. RF is suitable for high-dimensional, small-sample datasets.
Support Vector Machine (SVM) is a classifier that separates classes with a maximum-margin hyperplane
A Parzen window based distribution estimates the probability density function (pdf) with a non-parametric approach

Each data point contributes equally to the pdf
Uniform distribution
Normal distribution (bell curve); the pdf is a Gaussian function here
Variance is the square of the standard deviation
Probability (p) = k/n, where k of the n data points fall inside the window
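
The Parzen window notes above can be sketched in a few lines of Python. This is only a minimal illustration, assuming 1-D data; the function name `parzen_density`, the window width `h`, and the sample values are made up for the example. With the uniform (box) kernel the estimate reduces to (k/n)/h, and with the Gaussian kernel every data point adds a small bell curve.

```python
import math

def parzen_density(x, samples, h, kernel="gaussian"):
    """Estimate the pdf at x from 1-D samples with a Parzen window of width h."""
    n = len(samples)
    total = 0.0
    for xi in samples:
        u = (x - xi) / h
        if kernel == "uniform":
            # Box kernel: counts the sample if it lies in the window centred on x.
            total += 1.0 if abs(u) <= 0.5 else 0.0
        else:
            # Gaussian kernel with unit variance (the bell curve).
            total += math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)
    # Every sample has the same weight 1/n; dividing by h turns the
    # probability k/n into a density.
    return total / (n * h)

# Hypothetical data: two of the five samples fall within 0.5 of x = 2.0.
samples = [1.0, 1.2, 1.9, 2.1, 3.5]
print(parzen_density(2.0, samples, h=1.0, kernel="uniform"))  # (2/5) / 1.0
print(parzen_density(2.0, samples, h=1.0))
```

A smaller `h` gives a spikier estimate and a larger `h` a smoother one, so the window width is the main parameter to tune.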
Artificial neural network (ANN)
ANN helps model complex relations between input and output. It finds patterns in data (e.g. protein catabolic rate, optical character recognition (OCR))
An ANN architecture can have many layers, e.g. layer 1 (3 nodes), layer 2 (4 nodes), layer 3 (2 nodes)...
Transfer function = sum over all inputs of (weight * input)
There are many activation functions
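
As a rough illustration of the transfer and activation functions, here is a forward pass through a 3-4-2 network like the one in the layer example above. The weights and inputs are invented for the sketch, sigmoid is just one of the many possible activation functions, and no bias term is used because the notes do not mention one.

```python
import math

def transfer(weights, inputs):
    """Transfer function: sum of weight * input over all inputs."""
    return sum(w * x for w, x in zip(weights, inputs))

def sigmoid(z):
    """One common activation function; squashes z into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def layer(weight_rows, inputs, activation=sigmoid):
    """Each row of weights feeds one node of the layer."""
    return [activation(transfer(row, inputs)) for row in weight_rows]

# Hypothetical weights: 3 inputs -> 4 hidden nodes -> 2 output nodes.
hidden_weights = [
    [0.2, -0.5, 0.1],
    [0.7, 0.3, -0.2],
    [-0.4, 0.6, 0.9],
    [0.05, -0.1, 0.3],
]
output_weights = [
    [0.3, -0.8, 0.5, 0.1],
    [-0.6, 0.2, 0.4, 0.9],
]

inputs = [1.0, 0.5, -1.5]
hidden = layer(hidden_weights, inputs)
outputs = layer(output_weights, hidden)
print(outputs)  # two values, each between 0 and 1
```

Swapping `sigmoid` for another function such as `math.tanh` changes only the `activation` argument.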
Deep learning is about making data analysis sophisticated enough to derive high-level traits, such as personality, from the data
The lowest E means the smallest difference between the desired and actual values (training iterations tend to minimize the error E)
Genetic algorithm (GA) is a more random (stochastic) search
If E(w) is wavy (many local minima), use GA
If E(w) has a steep descent, use back propagation or anything else based on gradient descent
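
The difference between the two cases can be seen on a single weight w. The sketch below is only an illustration under assumptions: both error curves are invented, the gradient is taken numerically instead of by back propagation, and `genetic_search` is a very small GA-style search (selection, averaging crossover, Gaussian mutation), not a full genetic algorithm.

```python
import math
import random

def smooth_error(w):
    """A single bowl: gradient descent works well here."""
    return (w - 3.0) ** 2

def wavy_error(w):
    """Many local minima; the global minimum is E = 0 at w = 0."""
    return 0.1 * w * w + 1.0 - math.cos(3.0 * w)

def gradient_descent(error, w, learning_rate=0.01, steps=500, eps=1e-6):
    """Repeatedly step against a numerical (central-difference) gradient."""
    for _ in range(steps):
        grad = (error(w + eps) - error(w - eps)) / (2.0 * eps)
        w -= learning_rate * grad
    return w

def genetic_search(error, low, high, pop_size=20, generations=50, seed=0):
    """Keep the better half each generation and refill it with mutated children."""
    rng = random.Random(seed)
    population = [rng.uniform(low, high) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=error)
        survivors = population[: pop_size // 2]
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = rng.sample(survivors, 2)
            # Crossover (average of two parents) plus random mutation.
            children.append(0.5 * (a + b) + rng.gauss(0.0, 0.3))
        population = survivors + children
    return min(population, key=error)

w_smooth = gradient_descent(smooth_error, 0.0, learning_rate=0.1, steps=200)
w_gd = gradient_descent(wavy_error, 4.0)        # settles in a nearby local minimum
w_ga = genetic_search(wavy_error, -5.0, 5.0)    # samples the whole range
print(w_smooth, wavy_error(w_gd), wavy_error(w_ga))
```

On the wavy curve, gradient descent started at w = 4 stays in the dip next to its starting point, while the random population can land in a lower one.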
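Clustering can be crisp (each point belongs to exactly one cluster) or fuzzy (each point has a degree of membership in every cluster)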
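
To make the crisp/fuzzy distinction concrete, here is a small sketch that assigns one 1-D point to fixed cluster centres in both ways. The centres and the point are made up, and the fuzzy memberships use the fuzzy c-means membership formula with fuzzifier m = 2, which is one common choice and an assumption here.

```python
def crisp_membership(point, centres):
    """Crisp: membership 1 for the nearest centre, 0 for all the others."""
    distances = [abs(point - c) for c in centres]
    nearest = distances.index(min(distances))
    return [1.0 if i == nearest else 0.0 for i in range(len(centres))]

def fuzzy_membership(point, centres, m=2.0):
    """Fuzzy: degrees of membership in every cluster that sum to 1."""
    distances = [abs(point - c) for c in centres]
    if 0.0 in distances:
        # The point sits exactly on a centre, so it belongs fully to it.
        return [1.0 if d == 0.0 else 0.0 for d in distances]
    power = 2.0 / (m - 1.0)
    return [
        1.0 / sum((d_i / d_k) ** power for d_k in distances)
        for d_i in distances
    ]

centres = [0.0, 10.0]   # hypothetical cluster centres
print(crisp_membership(2.0, centres))
print(fuzzy_membership(2.0, centres))
print(fuzzy_membership(5.0, centres))
```

A point halfway between two centres gets equal fuzzy membership in both, while the crisp version has to pick one.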
