Modeling
First-name frequency analysis:
# only needs to be run once
#install.packages(c("ggplot2", "dplyr", "reshape", "hexbin", "maps", "mapproj", "RColorBrewer", "scales"))
library(dplyr)
library(ggplot2)
library(reshape)
#----
options(stringsAsFactors = FALSE)
bnames <- read.csv("data/bnames.csv.bz2")
births <- read.csv("data/births.csv")
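`read.csv()` opens the bzip2-compressed `bnames.csv.bz2` directly, because the underlying `file()` connection detects compression when reading. Below is a minimal round-trip sketch that uses a tiny made-up data frame and a temporary file instead of the real data:

```r
# Tiny made-up stand-in for bnames (illustration only, not real data)
toy <- data.frame(year = c(1880, 1881), name = c("John", "John"),
                  prop = c(0.0815, 0.0805), sex = c("boy", "boy"))

# Write it as a bzip2-compressed CSV to a temporary file
path <- tempfile(fileext = ".csv.bz2")
con <- bzfile(path, "w")
write.csv(toy, con, row.names = FALSE)
close(con)

# read.csv() decompresses the file transparently while reading
back <- read.csv(path, stringsAsFactors = FALSE)
str(back)
```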
head(bnames) ## look at the first rows as a sample
tail(bnames) ## look at the last rows as a sample
Charles <- bnames[bnames$name == "Charles", ] ## keep only the rows for the name Charles
qplot(year, prop, data = Charles, geom = "line") ## plot
John <- bnames[bnames$name == "John", ] ## keep only the rows for the name John
qplot(year, prop, data = John, geom = "line") ## plot
Iyana <- bnames[bnames$name == "Iyana", ] ## keep only the rows for the name Iyana
qplot(year, prop, data = Iyana, geom = "line") ## plot
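The same filter-and-plot step repeats for every name. The sketch below wraps the base-R filtering in a hypothetical helper, `name_rows()`, and also orders the rows by year. It assumes `bnames` has the `year`, `name`, `sex` and `prop` columns used above and runs the helper on a few made-up rows:

```r
# Hypothetical helper: all rows for one name, ordered by year
name_rows <- function(df, nm) {
  out <- df[df$name == nm, ]
  out[order(out$year), ]
}

# Made-up stand-in for bnames (illustration only)
toy <- data.frame(
  year = c(1881, 1880, 1880, 1881),
  name = c("Charles", "Charles", "John", "John"),
  sex  = c("boy", "boy", "boy", "boy"),
  prop = c(0.0180, 0.0186, 0.0815, 0.0805)
)

charles <- name_rows(toy, "Charles")
charles$year  # 1880 1881
```

With the real data, the call would look like `qplot(year, prop, data = name_rows(bnames, "Charles"), geom = "line")`.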
michael <- bnames[bnames$name == "Michael", ] ## rows for the name Michael
qplot(year, prop, data = michael, geom = "line")
qplot(year, prop, data = michael, geom = "point")
## one line per sex
qplot(year, prop, data = michael, geom = "line",
      color = sex)
## compare two related names: one line per sex and name combination
michaels <- bnames[bnames$name == "Michael" |
                   bnames$name == "Michelle", ]
qplot(year, prop, data = michaels, geom = "line",
      color = interaction(sex, name))
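A chain of `|` comparisons gets long as more names are added. `%in%` expresses the same row filter. The check below uses a made-up vector of names:

```r
# Made-up name column (illustration only)
nms <- c("Michael", "Michelle", "John", "Michael", "Mary")

keep_or <- nms == "Michael" | nms == "Michelle"
keep_in <- nms %in% c("Michael", "Michelle")

identical(keep_or, keep_in)  # TRUE
```

With the real data, the filter would read `bnames[bnames$name %in% c("Michael", "Michelle"), ]`. The two forms differ only for missing names. There, `==` gives `NA`, which becomes an all-`NA` row when subsetting, while `%in%` returns `FALSE`.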
Crime analysis:
options(stringsAsFactors = FALSE)
wages <- read.csv("data/wages.csv")
..
crime <- read.csv("data/crime.csv")
..
as_tibble(crime) ## print the data as a tibble
mod <- lm(tc2009 ~ low, data = crime)
mod
names(mod)
summary(mod)
predict(mod)
resid(mod)
coef(mod) ## same as coefficients(mod)
qplot(low, predict(mod), data = crime, geom = "line")
qplot(low, tc2009, data = crime) + geom_smooth(method = lm)
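`predict(mod)` returns the fitted values and `resid(mod)` returns the residuals. By construction, the two add up to the observed response. The sketch below uses simulated data in place of `crime`. The values are made up; only the column names `low` and `tc2009` come from the text:

```r
set.seed(1)
# Simulated stand-in for the crime data (illustration only)
toy <- data.frame(low = runif(30, 5, 25))
toy$tc2009 <- 1500 + 120 * toy$low + rnorm(30, sd = 300)

mod <- lm(tc2009 ~ low, data = toy)

# fitted value + residual reproduces each observation
recon <- predict(mod) + resid(mod)
all.equal(unname(recon), toy$tc2009)  # TRUE

# with an intercept in the model, the residuals sum to (numerically) zero
sum(resid(mod))
```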
Data sources: Link1 Link2 Link3
Linear regression
##---- linear regression
library(ggplot2)
library(dplyr)
options(stringsAsFactors = FALSE)
wages <- read.csv("data/wages.csv")
crime <- read.csv("data/crime.csv")
# Estimating a function
mod <- lm(tc2009 ~ low, data = crime)
tc2009 ~ low
class(tc2009 ~ low)
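Typed on its own, `tc2009 ~ low` is not evaluated. R stores it as an object of class `formula`, and `lm()` later matches its variables against the columns of `data`. A few base-R functions show what the object holds:

```r
f <- tc2009 ~ low

class(f)     # "formula"
all.vars(f)  # "tc2009" "low"
f[[2]]       # left-hand side: tc2009
f[[3]]       # right-hand side: low
```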
mod
names(mod)
summary(mod)
predict(mod)
resid(mod)
coef(mod)
coefficients(mod)
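With a single predictor, the least-squares coefficients reported by `coef()` have a closed form: slope = cov(x, y) / var(x) and intercept = mean(y) - slope * mean(x). The sketch below checks this against `lm()` on simulated data with made-up values:

```r
set.seed(42)
# Simulated stand-in for crime$low and crime$tc2009 (illustration only)
x <- runif(50, 5, 25)
y <- 1500 + 120 * x + rnorm(50, sd = 300)

# Closed-form least-squares estimates for y = b0 + b1 * x
b1 <- cov(x, y) / var(x)
b0 <- mean(y) - b1 * mean(x)

fit <- lm(y ~ x)
all.equal(unname(coef(fit)), c(b0, b1))  # TRUE
```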
qplot(low, predict(mod), data = crime, geom = "line")
qplot(low, tc2009, data = crime) + geom_smooth(method = lm)
qplot(low, tc2009, data = crime) +
  geom_smooth(se = FALSE, method = lm)
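With the default `se = TRUE`, `geom_smooth(method = lm)` shades a confidence band around the fitted line, and `se = FALSE` removes it. A pointwise confidence interval of the same kind can be computed with `predict()` and `interval = "confidence"`. Both `predict()` and `geom_smooth()` default to a level of 0.95. The sketch below assumes this matches the band ggplot2 draws and uses simulated data:

```r
set.seed(7)
# Simulated stand-in for the crime data (illustration only)
toy <- data.frame(low = runif(40, 5, 25))
toy$tc2009 <- 1500 + 120 * toy$low + rnorm(40, sd = 300)
mod <- lm(tc2009 ~ low, data = toy)

# Fitted line and its 95% confidence band on a grid of 'low' values
grid <- data.frame(low = seq(5, 25, by = 5))
band <- predict(mod, newdata = grid, interval = "confidence", level = 0.95)
band  # matrix with columns fit, lwr, upr
```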
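lm(tc2009 ~ 1 + low, data = crime) ## intercept written out explicitly
lm(tc2009 ~ low, data = crime)     ## same model: the intercept is included by default
lm(tc2009 ~ low - 1, data = crime) ## no intercept: the line goes through the origin
lm(tc2009 ~ 0 + low, data = crime) ## the same no-intercept model, other notation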
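Without an intercept (`- 1` or `0 +`), the line is forced through the origin, and its single coefficient becomes sum(x * y) / sum(x^2). The sketch below compares the model forms on simulated data:

```r
set.seed(3)
# Simulated data (illustration only)
x <- runif(30, 1, 10)
y <- 4 + 2 * x + rnorm(30)

with_int <- coef(lm(y ~ 1 + x))   # same as lm(y ~ x): intercept and slope
no_int_a <- coef(lm(y ~ x - 1))
no_int_b <- coef(lm(y ~ 0 + x))

# Both no-intercept spellings give the same fit ...
all.equal(no_int_a, no_int_b)                       # TRUE
# ... whose slope is the regression-through-the-origin formula
all.equal(unname(no_int_a), sum(x * y) / sum(x^2))  # TRUE
```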
hmod <- lm(earn ~ height, data = wages)
coef(hmod)
qplot(height, earn, data = wages, alpha = I(1/4)) +
  geom_smooth(se = FALSE, method = lm) + theme_bw()
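The slope in `coef(hmod)` is the difference in predicted earnings between two heights one unit apart. The sketch below checks this on simulated data. The values are made up; only the column names `height` and `earn` come from the text:

```r
set.seed(11)
# Simulated stand-in for wages (illustration only, values made up)
toy <- data.frame(height = runif(100, 60, 75))
toy$earn <- -60000 + 1500 * toy$height + rnorm(100, sd = 15000)
hmod <- lm(earn ~ height, data = toy)

# Predicted earnings at two heights one unit apart
p <- predict(hmod, newdata = data.frame(height = c(65, 66)))
diff(p)                 # equals the slope ...
coef(hmod)[["height"]]  # ... reported by coef()
```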
earn ~ height
mod <- lm(earn ~ height, data = wages)
summary(mod)
plot(mod)
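`summary(mod)` prints the coefficient table and R², and `plot(mod)` draws diagnostic plots such as residuals against fitted values. The numbers from the summary can also be extracted in code. For a simple regression with an intercept and one predictor, R² equals the squared correlation between x and y. The sketch below uses simulated data:

```r
set.seed(5)
# Simulated stand-in for wages (illustration only)
height <- runif(80, 60, 75)
earn <- -60000 + 1500 * height + rnorm(80, sd = 15000)
mod <- lm(earn ~ height)

s <- summary(mod)
s$coefficients                               # estimate, std. error, t value, p value
s$r.squared                                  # share of variance explained
all.equal(s$r.squared, cor(height, earn)^2)  # TRUE
```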
We look for the function that best fits the data:
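For `lm()`, "best fit" means least squares: the coefficients minimize the residual sum of squares (RSS). The sketch below minimizes the RSS numerically with `optim()` and compares the result with `lm()` on simulated data. It is a check of the idea, not how `lm()` computes its fit (it uses a QR decomposition):

```r
set.seed(9)
# Simulated data (illustration only)
x <- runif(50, 0, 10)
y <- 2 + 3 * x + rnorm(50)

# Residual sum of squares for a line with intercept b[1] and slope b[2]
rss <- function(b) sum((y - b[1] - b[2] * x)^2)
# Its exact gradient, so BFGS does not rely on finite differences
rss_grad <- function(b) {
  r <- y - b[1] - b[2] * x
  c(-2 * sum(r), -2 * sum(r * x))
}

opt <- optim(c(0, 0), rss, gr = rss_grad, method = "BFGS")
opt$par          # numerical minimizer of the RSS
coef(lm(y ~ x))  # least-squares fit from lm()
```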