Decision Tree (Classification)

This post gives an overview of decision trees using the Wisconsin Breast Cancer Dataset from the UCI Machine Learning Repository. Decision trees segment the predictor space into regions using splitting rules that can be visualized as a tree. In classification trees, each observation is assigned to the most commonly occurring class among the training observations in its region.
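As a rough illustration of the idea, the sketch below fits a classification tree and predicts the majority class of each region. It rests on assumptions that are not from the post: it uses `biopsy` from the MASS package as a stand-in for the Wisconsin Breast Cancer data, the `rpart` package as the tree implementation, and an arbitrary seed.

```r
# Minimal sketch. Assumptions: MASS::biopsy as the data, rpart as the tree library.
library(rpart)

data("biopsy", package = "MASS")
bc <- na.omit(biopsy[, -1])  # drop the ID column and rows with missing values

set.seed(1)  # arbitrary seed, only for reproducibility
idx <- sample(seq_len(nrow(bc)), size = round(0.75 * nrow(bc)))
train <- bc[idx, ]
test <- bc[-idx, ]

# Grow a classification tree: each split partitions the predictor space
tree_fit <- rpart(class ~ ., data = train, method = "class")

# Each test observation gets the most common class of the leaf it falls into
tree_pred <- predict(tree_fit, newdata = test, type = "class")
tree_acc <- mean(tree_pred == test$class)
table(predicted = tree_pred, actual = test$class)
```

Printing `tree_fit` lists the splitting rules, which is the text form of the tree diagram.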
Wisconsin …

K-Nearest Neighbor (Classification)

This post gives an overview of k-Nearest Neighbor using the Wisconsin Breast Cancer Dataset from the UCI Machine Learning Repository. k-Nearest Neighbor is a non-parametric, instance-based learning algorithm that "memorizes" the training data and classifies a new instance from the k most similar training instances.
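The mechanism can be sketched in a few lines of base R. The helper `knn_predict` below is hypothetical (it is not from the post), and it assumes Euclidean distance, a majority vote, and `biopsy` from the MASS package as a stand-in for the Wisconsin Breast Cancer data.

```r
# Minimal sketch. knn_predict is a hypothetical helper, not the post's code.
knn_predict <- function(train_x, train_y, new_x, k = 5) {
  train_x <- as.matrix(train_x)
  new_x <- as.matrix(new_x)
  preds <- apply(new_x, 1, function(row) {
    # Euclidean distance from this instance to every training instance
    d <- sqrt(rowSums(sweep(train_x, 2, row)^2))
    nearest <- order(d)[seq_len(k)]
    # Majority vote among the k nearest neighbours (ties go to the first level)
    votes <- table(train_y[nearest])
    names(votes)[which.max(votes)]
  })
  factor(unname(preds), levels = levels(train_y))
}

data("biopsy", package = "MASS")  # assumed stand-in data set
bc <- na.omit(biopsy[, -1])       # drop the ID column and incomplete rows

set.seed(1)  # arbitrary seed
idx <- sample(seq_len(nrow(bc)), size = round(0.75 * nrow(bc)))
predictors <- setdiff(names(bc), "class")

knn_pred <- knn_predict(bc[idx, predictors], bc$class[idx],
                        bc[-idx, predictors], k = 5)
knn_acc <- mean(knn_pred == bc$class[-idx])
```

All nine predictors in this data set share a 1 to 10 scale, so the sketch skips standardization; with predictors on different scales, scaling before computing distances is usually advisable.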
Wisconsin Breast Cancer Data …

Naive Bayes (Classification)

This post gives an overview of Naive Bayes using the Wisconsin Breast Cancer Dataset from the UCI Machine Learning Repository. The Naive Bayes algorithm is based on Bayes' theorem and assumes that the effect of each variable on a given class is independent of the other attributes.
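To make the independence assumption concrete, here is a small base R sketch in which the class score is the log prior plus a sum of per-feature log densities. It assumes a Gaussian density for every feature (one common variant, not necessarily the one the post uses) and `biopsy` from the MASS package as a stand-in for the Wisconsin Breast Cancer data.

```r
# Minimal sketch of Gaussian naive Bayes. Assumptions: MASS::biopsy, normal densities.
data("biopsy", package = "MASS")
bc <- na.omit(biopsy[, -1])  # drop the ID column and incomplete rows

set.seed(1)  # arbitrary seed
idx <- sample(seq_len(nrow(bc)), size = round(0.75 * nrow(bc)))
train <- bc[idx, ]
test <- bc[-idx, ]

predictors <- setdiff(names(train), "class")
classes <- levels(train$class)

# Per-class prior, feature means and feature standard deviations
priors <- prop.table(table(train$class))
means <- sapply(classes, function(cl) colMeans(train[train$class == cl, predictors]))
sds <- sapply(classes, function(cl) apply(train[train$class == cl, predictors], 2, sd))

# Independence assumption: the joint log likelihood is a sum over features
x <- as.matrix(test[, predictors])
log_post <- sapply(classes, function(cl) {
  ll <- matrix(dnorm(x,
                     mean = rep(means[, cl], each = nrow(x)),
                     sd = rep(sds[, cl], each = nrow(x)),
                     log = TRUE),
               nrow = nrow(x))
  log(priors[[cl]]) + rowSums(ll)
})

# Predict the class with the larger (unnormalized) log posterior
nb_pred <- factor(classes[apply(log_post, 1, which.max)], levels = classes)
nb_acc <- mean(nb_pred == test$class)
```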
Wisconsin Breast Cancer Data …

Principal Component Analysis (PCA)

This post gives an overview of Principal Component Analysis (PCA) using the Chronic Kidney Disease dataset from the UCI Machine Learning Repository. PCA is a dimensionality reduction technique that uses an orthogonal transformation to convert correlated predictors into a set of uncorrelated predictors called principal components.
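The claim that the components are uncorrelated can be checked directly. The sketch below is only illustrative: it uses the built-in `mtcars` data as a stand-in, since the Chronic Kidney Disease data does not ship with R, and `prcomp` from the stats package.

```r
# Minimal sketch. Assumption: mtcars stands in for the Chronic Kidney Disease data.
pca <- prcomp(mtcars, center = TRUE, scale. = TRUE)

# Proportion of total variance carried by each principal component
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
round(cumsum(var_explained), 3)

# The original predictors are correlated, the component scores are not
max_cor_original <- max(abs(cor(mtcars)[upper.tri(cor(mtcars))]))
max_cor_scores <- max(abs(cor(pca$x)[upper.tri(cor(pca$x))]))

# The loadings form an orthogonal transformation: t(R) %*% R is the identity
orthogonality_error <- max(abs(crossprod(pca$rotation) - diag(ncol(mtcars))))
```

Keeping only the first few columns of `pca$x` gives the reduced-dimension representation.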
Chronic Kidney Disease Data, collected from a …

Random Forest (Classification)

This post gives an overview of Random Forests using the Wisconsin Breast Cancer Dataset from the UCI Machine Learning Repository. Random Forests are an ensemble learning method that trains a number of individual decision trees and combines their predictions, thereby reducing model variance and improving performance.
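The ensemble idea can be sketched with bagged trees: each tree is trained on a bootstrap sample and the trees vote. Note that this is a simplification and not a full random forest, which additionally samples a random subset of predictors at each split (as the randomForest package does). The sketch assumes `rpart` for the trees, `biopsy` from the MASS package as a stand-in for the data, and an arbitrary choice of 25 trees.

```r
# Minimal sketch of bagged trees, a simplified relative of a random forest.
# Assumptions: MASS::biopsy as the data, rpart for the trees, 25 trees.
library(rpart)

data("biopsy", package = "MASS")
bc <- na.omit(biopsy[, -1])  # drop the ID column and incomplete rows

set.seed(1)  # arbitrary seed
idx <- sample(seq_len(nrow(bc)), size = round(0.75 * nrow(bc)))
train <- bc[idx, ]
test <- bc[-idx, ]

n_trees <- 25
trees <- lapply(seq_len(n_trees), function(i) {
  # Each tree sees a different bootstrap resample of the training data
  boot <- train[sample(nrow(train), replace = TRUE), ]
  rpart(class ~ ., data = boot, method = "class")
})

# One column of predicted labels per tree
votes <- sapply(trees, function(fit) {
  as.character(predict(fit, newdata = test, type = "class"))
})

# Majority vote across trees averages out the variability of single trees
bag_pred <- factor(apply(votes, 1, function(v) names(which.max(table(v)))),
                   levels = levels(test$class))
bag_acc <- mean(bag_pred == test$class)
```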
Wisconsin Breast Cancer Data, collected from …

Support Vector Machines (SVM)

This post gives an overview of Support Vector Machines (SVM) using the Wisconsin Breast Cancer Dataset from the UCI Machine Learning Repository. SVMs generalize the maximal margin classifier, which separates classes with a hyperplane, to non-linear class boundaries.
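The separating hyperplane can be illustrated with a linear soft-margin SVM trained by subgradient descent on the hinge loss. This is only a sketch under assumptions: `fit_linear_svm` is a hypothetical helper, the step size, penalty, and iteration count are arbitrary, and `biopsy` from the MASS package stands in for the data. It covers the linear case only; non-linear boundaries would need a kernel, as provided by dedicated SVM libraries.

```r
# Minimal sketch of a linear soft-margin SVM. fit_linear_svm is a hypothetical helper.
# Minimizes lambda/2 * ||w||^2 + mean(max(0, 1 - y * (x %*% w + b))), y in {-1, +1}.
fit_linear_svm <- function(x, y, lambda = 0.01, n_iter = 500, lr = 0.1) {
  w <- rep(0, ncol(x))
  b <- 0
  for (i in seq_len(n_iter)) {
    margin <- as.vector(y * (x %*% w + b))
    viol <- margin < 1  # points inside the margin or misclassified
    grad_w <- lambda * w - colSums(x[viol, , drop = FALSE] * y[viol]) / nrow(x)
    grad_b <- -sum(y[viol]) / nrow(x)
    w <- w - lr * grad_w
    b <- b - lr * grad_b
  }
  list(w = w, b = b)
}

data("biopsy", package = "MASS")  # assumed stand-in data set
bc <- na.omit(biopsy[, -1])       # drop the ID column and incomplete rows

set.seed(1)  # arbitrary seed
idx <- sample(seq_len(nrow(bc)), size = round(0.75 * nrow(bc)))
predictors <- setdiff(names(bc), "class")

# Standardize with training statistics, then reuse them for the test set
x_train <- scale(as.matrix(bc[idx, predictors]))
x_test <- scale(as.matrix(bc[-idx, predictors]),
                center = attr(x_train, "scaled:center"),
                scale = attr(x_train, "scaled:scale"))
y_train <- ifelse(bc$class[idx] == "malignant", 1, -1)

svm_fit <- fit_linear_svm(x_train, y_train)

# The sign of the decision function says which side of the hyperplane a point is on
score <- as.vector(x_test %*% svm_fit$w + svm_fit$b)
svm_pred <- factor(ifelse(score > 0, "malignant", "benign"),
                   levels = levels(bc$class))
svm_acc <- mean(svm_pred == bc$class[-idx])
```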
Wisconsin Breast Cancer Data, collected from the University of Wisconsin Hospitals, Madison from Dr. William …

Linear Regression


Import packages

library(stats)


Split Data into Training (75%) and Test (25%) Sets

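n <- nrow(mtcars)
# Draw 75% of the row indices at random, without replacement
index <- sample(1:n, size = round(0.75*n), replace = FALSE)
train <- mtcars[index, ]  # rows selected for training
test <- mtcars[-index, ]  # remaining rows form the test set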
paste("Observations in training data: ", nrow(train), sep = "")
## [1] "Observations in training data: 24"
paste("Observations in …

Logistic Regression


Import packages

library(stats)


Split data into training and test sets

set.seed(90)
n <- nrow(CO2)
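# Draw 75% of the row indices at random, without replacement
index <- sample(1:n, size = round(0.75*n), replace = FALSE)
train <- CO2[index, ]  # rows selected for training
test <- CO2[-index, ]  # remaining rows form the test set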


Logistic model predicting Treatment from CO2 train dataset

class(CO2$Treatment)
## [1] "factor"
log_mod_train <- glm(Treatment …

Partition a Data Frame into Training and Test Data


Import packages

library(caret)


Randomly Split Data into Training (75%) and Test (25%) Sets

n <- nrow(iris)
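# Draw 75% of the row indices at random, without replacement
index <- sample(1:n, size = round(0.75*n), replace = FALSE)
train <- iris[index, ]  # rows selected for training
test <- iris[-index, ]  # remaining rows form the test set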
paste("Observations in training data: ", nrow(train), sep = "")
## [1] "Observations in training data: 112"
paste("Observations …

Stepwise Regression (Forward, Backward, Both)


Import packages

library(olsrr)
library(MASS) # stepAIC function


Stepwise Regression using olsrr package

Forward Stepwise Regression

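# Full model with all predictors; forward selection picks from these candidates
mod_forward <- lm(mpg ~ ., data = mtcars)
# Note: newer olsrr releases name this function ols_step_forward_p()
step_forward <- ols_step_forward(mod_forward)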
## We are selecting variables based on p value...
## 1 variable(s) added....
## 1 variable(s) added...
## 1 variable(s) added...
## No more variables satisfy the condition …