Bigram Analysis


For this bigram analysis, The Time Machine, a science fiction work by H.G. Wells, was retrieved from Project Gutenberg, which offers over 56,000 free e-books. The gutenbergr package downloads and processes public domain works from Project Gutenberg. All other works from Project Gutenberg can be retrieved …
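
As a rough illustration of what a bigram count involves, here is a minimal base R sketch. It does not download the novel; `sample_text` is a made-up stand-in, and the original post may well use tidytext or another package for this step.

```r
# Hypothetical sample text; the original post works with the full novel.
sample_text <- "The Time Traveller smiled. The Time Machine stood in the laboratory."

# Lowercase, then split on anything that is not a letter or apostrophe.
words <- strsplit(tolower(sample_text), "[^a-z']+")[[1]]
words <- words[words != ""]

# Pair each word with its successor to form bigrams.
bigrams <- paste(head(words, -1), tail(words, -1))

# Count bigrams and sort by frequency, most common first.
bigram_counts <- sort(table(bigrams), decreasing = TRUE)
head(bigram_counts, 3)
```

Note that this simple approach pairs words across sentence boundaries (for example "smiled the"), which a more careful analysis might want to avoid.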


Word Cloud


For this word cloud analysis, Great Expectations by Charles Dickens was studied. Great Expectations charts the personal development of Pip, an orphan, exploring universal themes like guilt, persistence, and social advancement, along with historical constructs like wealth, poverty, morality, good versus evil, and Victorian social structures.
This post will provide a brief bigram …
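
A word cloud is usually drawn from a table of word frequencies. The sketch below builds such a table in base R; `sample_text` and the tiny `stop_words` vector are made up for illustration and are not taken from the original post.

```r
# Hypothetical sample sentence standing in for the novel's text.
sample_text <- "Pip has great expectations, and great expectations change Pip."

# Lowercase and split on runs of non-letters.
words <- strsplit(tolower(sample_text), "[^a-z]+")[[1]]
words <- words[words != ""]

# Drop a tiny, illustrative stop-word list (real analyses use a fuller one).
stop_words <- c("and", "has", "the", "a", "of")
words <- words[!words %in% stop_words]

# Tabulate the remaining words and sort by descending frequency.
word_freq <- as.data.frame(table(word = words), stringsAsFactors = FALSE)
word_freq <- word_freq[order(-word_freq$Freq), ]
word_freq
```

A data frame like `word_freq` could then be handed to a plotting function, for instance `wordcloud(words = word_freq$word, freq = word_freq$Freq)` from the wordcloud package, assuming that package is installed.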

Convert String to Upper and Lower Case


Convert String to Upper Case

string <- "lower case"
up <- toupper(string)
up
## [1] "LOWER CASE"


Convert String to Lower Case

string <- "UPPER CASE"
low <- tolower(string)
low
## [1] "upper case"

Split by Character/Separator


Creating a sample dataframe

LatLong <- c("40.841885, -73.856621",
             "40.675026, -73.944855", 
             "40.726253, -73.806710",
             "40.725375, -73.789845", 
             "40.845456, -73.876555")
Location <- c("Bronx", "Brooklyn", 
              "Manhattan", "Queens", "Staten Island")
geoData <- data.frame(LatLong, Location)
geoData
##                 LatLong      Location
## 1 40.841885, -73.856621         Bronx
## 2 40 …
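
The excerpt ends before the split itself, so here is one possible way to do it with base R's `strsplit`. The column names `Lat` and `Long` are assumptions for this sketch, and the original post may use a different function.

```r
# Rebuild a two-row version of the sample data frame from the post.
geoData <- data.frame(
  LatLong = c("40.841885, -73.856621", "40.675026, -73.944855"),
  Location = c("Bronx", "Brooklyn"),
  stringsAsFactors = FALSE
)

# Split each string on the comma plus any following whitespace.
parts <- strsplit(geoData$LatLong, ",\\s*")

# Take the first and second pieces and convert them to numbers.
geoData$Lat <- as.numeric(sapply(parts, `[`, 1))
geoData$Long <- as.numeric(sapply(parts, `[`, 2))
geoData
```

Splitting on `",\\s*"` rather than a bare comma avoids leading spaces in the longitude values before the numeric conversion.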

Substitute a Pattern in String


Substitute first instance of a pattern in a text

text <- "Apples and oranges are fruits"
sub("p", "b", text) # replace first instance of letter p with b
## [1] "Abples and oranges are fruits"


Substitute all instances of a pattern in a text

gsub("p", "b", text) # replace all instances of …
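
One thing to keep in mind with `sub` and `gsub` is that the pattern is a regular expression by default, so characters like `.` have special meaning. The sketch below, using a made-up `price` string, shows how `fixed = TRUE` makes the pattern literal.

```r
price <- "Total: $5.00 (approx.)"

# As a regular expression, "." matches any character,
# so the very first character is replaced.
regex_first <- sub(".", "!", price)
regex_first
## [1] "!otal: $5.00 (approx.)"

# fixed = TRUE treats the pattern as a literal period.
literal_first <- sub(".", "!", price, fixed = TRUE)
literal_first
## [1] "Total: $5!00 (approx.)"

literal_all <- gsub(".", "!", price, fixed = TRUE)
literal_all
## [1] "Total: $5!00 (approx!)"
```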

Tokenize String


Import packages

library(tidytext)
library(dplyr)


Create data for analysis

text <- "Dplyr provides the ability to process and wrangle data, facilitating convenient data transformations through functions like arrange, select and mutate."
# Keep text as character (not factor) so it can be tokenized
data <- data.frame(count = 5, text, stringsAsFactors = FALSE)


Tokenize text

tokenize <- data %>% unnest_tokens(word, text …
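
For comparison, word-level tokenization can be roughly approximated in base R without any packages. This is only a sketch with a shortened sample sentence; it is not what `unnest_tokens` does internally, though like that function's defaults it lowercases the text and drops punctuation.

```r
text <- "Dplyr provides the ability to process and wrangle data."

# Lowercase, then split on runs of characters that are
# neither alphanumeric nor apostrophes.
tokens <- strsplit(tolower(text), "[^[:alnum:]']+")[[1]]
tokens <- tokens[tokens != ""]

# One row per token, mirroring a one-word-per-row layout.
tokens_df <- data.frame(word = tokens, stringsAsFactors = FALSE)
tokens_df
```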