Data Science with R

Bigram Analysis

For this bigram analysis, The Time Machine, a science fiction work by H.G. Wells, was taken from Project Gutenberg, which offers over 56,000 free e-books. The gutenbergr package downloads and processes public domain works from Project Gutenberg. All other works from Project Gutenberg can be retrieved …
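
As a minimal sketch of what a bigram count involves, the base R snippet below pairs each word with the word that follows it and tallies the pairs. The sample sentence is made up for illustration and stands in for the downloaded novel; the snippet does not use gutenbergr and is not the post's own code.

```r
# Hypothetical sample text (an assumption; the post works with the full novel).
sample_text <- "the time traveller smiled and the time traveller spoke"

# Split on whitespace to get the individual words.
words <- strsplit(tolower(sample_text), "\\s+")[[1]]

# Pair each word with its successor to form bigrams.
bigrams <- paste(head(words, -1), tail(words, -1))

# Count the bigrams and sort by frequency, most common first.
bigram_counts <- sort(table(bigrams), decreasing = TRUE)
bigram_counts
```

A text of n words yields n - 1 bigrams, so the nine-word sample produces eight pairs, two of which occur twice.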


Topic Modeling (Latent Dirichlet Allocation) using Project Gutenberg

For this topic modeling analysis, three works were analyzed: Pride and Prejudice by Jane Austen, The Fall of the House of Usher by Edgar Allan Poe and The War of the Worlds by H.G. Wells. Since each work has a distinct genre and explores specific themes, these three works …


Word Cloud

For this word cloud analysis, Great Expectations by Charles Dickens was studied. Great Expectations charts the personal development of Pip, an orphan, exploring universal themes like guilt, persistence and social advancement, and historical constructs like wealth, poverty, morality, good versus evil, and Victorian social structures.

This post will provide a brief bigram …


Convert String to Upper and Lower Case

Convert String to Upper Case

string <- "lower case"
up <- toupper(string)
up

## [1] "LOWER CASE"

Convert String to Lower Case

string <- "UPPER CASE"
low <- tolower(string)
low

## [1] "upper case"
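
Two related conversions can be sketched in base R. casefold() is a base R wrapper that does either conversion depending on its upper argument. capitalize_first() is a hypothetical helper written for this example (it is not a base R function) that upper-cases only the first character.

```r
# casefold() converts to lower case by default, or to upper case with upper = TRUE.
casefold("lower case", upper = TRUE)

## [1] "LOWER CASE"

# Hypothetical helper (an assumption, not part of base R):
# upper-case the first character and leave the rest unchanged.
capitalize_first <- function(x) {
  paste0(toupper(substring(x, 1, 1)), substring(x, 2))
}
capitalize_first("lower case")

## [1] "Lower case"
```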


Split by Character/Separator

Creating a sample dataframe

LatLong <- c("40.841885, -73.856621",
             "40.675026, -73.944855", 
             "40.726253, -73.806710",
             "40.725375, -73.789845", 
             "40.845456, -73.876555")
Location <- c("Bronx", "Brooklyn", 
              "Manhattan", "Queens", "Staten Island")
geoData <- data.frame(LatLong, Location)
geoData

##                 LatLong      Location
## 1 40.841885, -73.856621         Bronx
## 2 40 …
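
The excerpt stops before the split itself. One possible way to do it, sketched here with base R's strsplit(), is shown below. The two-row data frame reuses the first two rows of the sample data, and the Lat and Long column names are assumptions made for this example, not necessarily what the full post uses.

```r
# First two rows of the sample data, kept short for the example.
geoData <- data.frame(
  LatLong = c("40.841885, -73.856621", "40.675026, -73.944855"),
  Location = c("Bronx", "Brooklyn"),
  stringsAsFactors = FALSE
)

# Split each "lat, long" string on the literal comma-plus-space separator.
parts <- strsplit(geoData$LatLong, ", ", fixed = TRUE)

# Extract the first and second piece of every split and convert to numeric.
# Lat and Long are hypothetical column names chosen for this sketch.
geoData$Lat <- as.numeric(sapply(parts, `[`, 1))
geoData$Long <- as.numeric(sapply(parts, `[`, 2))
geoData
```

strsplit() returns a list with one character vector per input string, which is why the pieces are pulled out with sapply() before being stored as columns.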


Substitute a Pattern in String

Substitute first instance of a pattern in a text

text <- "Apples and oranges are fruits"
sub("p", "b", text) # replace first instance of letter p with b

## [1] "Abples and oranges are fruits"

Substitute all instances of a pattern in a text

gsub("p", "b", text) # replace all instances of …
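
One detail worth keeping in mind with sub() and gsub(): the pattern is treated as a regular expression unless fixed = TRUE is set. The sketch below illustrates the difference with a made-up version string; the variable names are assumptions for this example only.

```r
# Hypothetical input chosen to contain a regex metacharacter.
version <- "v1.2.3"

# As a regular expression, "." matches any single character.
regex_result <- gsub(".", "-", version)
regex_result

## [1] "------"

# fixed = TRUE treats the pattern as a literal string.
fixed_result <- gsub(".", "-", version, fixed = TRUE)
fixed_result

## [1] "v1-2-3"

# Escaping the dot gives the same result while staying in regex mode.
escaped_result <- gsub("\\.", "-", version)
escaped_result

## [1] "v1-2-3"
```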


Tokenize String

Import packages

library(tidytext)
library(dplyr)

Create data for analysis

text <- "Dplyr provides the ability to process and wrangle data, facilitating convenient data transformations through functions like arrange, select and mutate."
data <- data.frame(count = 5, text)
data$text <- as.character(data$text)

Tokenize text

tokenize <- data %>% unnest_tokens(word, text …
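
To make the idea of one-token-per-row concrete, here is a rough base R approximation. It is a swapped-in technique, not the tidytext implementation: it lower-cases the text, strips punctuation and splits on whitespace, which is close to the default behaviour of unnest_tokens() for simple English sentences but will differ on edge cases. The sentence is a shortened version of the sample text.

```r
# Shortened sample sentence (an assumption made to keep the example small).
text <- "Dplyr provides the ability to process and wrangle data."

# Lower-case the text and remove punctuation.
clean <- gsub("[[:punct:]]", "", tolower(text))

# Split on runs of whitespace to get one token per element.
tokens <- strsplit(clean, "\\s+")[[1]]

# Build a one-row-per-token data frame; the count column is repeated for each token.
tokenized <- data.frame(count = 5, word = tokens, stringsAsFactors = FALSE)
tokenized
```

The result has nine rows, one per word, with the other column carried along on every row.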
