2+2
## [1] 4
The sample mean is the arithmetic mean of the observations \(x_1, \ldots, x_n\): \[ \bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i \] where \(n\) is the sample size.
mean(c(38, 100, 64, 43, 63, 59, 107, 52, 86, 77))
## [1] 68.9
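As a quick check of the formula, the sketch below computes the mean by hand for the same ten values and compares it with `mean()`. The names `x`, `n` and `xbar` are chosen here for illustration only.

```r
# The ten values used in the mean() call above
x <- c(38, 100, 64, 43, 63, 59, 107, 52, 86, 77)

n <- length(x)        # sample size
xbar <- sum(x) / n    # (1/n) times the sum of the observations

xbar
isTRUE(all.equal(xbar, mean(x)))  # TRUE if both calculations agree
```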
setwd("C:/Users/sks/Dropbox/sks/math1024/rmarkdown") # Change this to the folder that holds your data files.
ffood <- read.csv("servicetime.csv", header = TRUE) # csv stands for comma-separated values.
# header = TRUE says that the first row of the file contains the column names.
ffood
## AM PM
## 1 38 45
## 2 100 62
## 3 64 52
## 4 43 72
## 5 63 81
## 6 59 88
## 7 107 64
## 8 52 75
## 9 86 59
## 10 77 70
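If changing the working directory is inconvenient, a file can also be read through its full path. The sketch below is only an illustration: it writes the ten rows printed above to a temporary file, so that it runs without `servicetime.csv`, and reads them back. The names `ffood_copy`, `tmp_file` and `ffood_back` are made up for this example.

```r
# Rebuild the data frame from the values printed above
ffood_copy <- data.frame(
  AM = c(38, 100, 64, 43, 63, 59, 107, 52, 86, 77),
  PM = c(45, 62, 52, 72, 81, 88, 64, 75, 59, 70)
)

# Write it to a temporary csv file; this stands in for servicetime.csv
tmp_file <- tempfile(fileext = ".csv")
write.csv(ffood_copy, tmp_file, row.names = FALSE)

# Read it back through its full path; no setwd() is needed
ffood_back <- read.csv(tmp_file, header = TRUE)
str(ffood_back)
colMeans(ffood_back)  # mean of each column
```

In practice `tmp_file` would be replaced by the full path to the data file, for example one built with `file.path()`.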
head(ffood) # Shows the first six rows of the data set
## AM PM
## 1 38 45
## 2 100 62
## 3 64 52
## 4 43 72
## 5 63 81
## 6 59 88
tail(ffood) # Shows the last six rows of the data set
## AM PM
## 5 63 81
## 6 59 88
## 7 107 64
## 8 52 75
## 9 86 59
## 10 77 70
summary(ffood)
## AM PM
## Min. : 38.00 Min. :45.00
## 1st Qu.: 53.75 1st Qu.:59.75
## Median : 63.50 Median :67.00
## Mean : 68.90 Mean :66.80
## 3rd Qu.: 83.75 3rd Qu.:74.25
## Max. :107.00 Max. :88.00
names(ffood) ## Prints the column names of the argument data frame.
## [1] "AM" "PM"
ffood$AM # Prints the AM values
## [1] 38 100 64 43 63 59 107 52 86 77
ffood[,1] # Gets the first column and all rows.
## [1] 38 100 64 43 63 59 107 52 86 77
ffood[1:2, ] ## Gets the first two rows and all columns.
## AM PM
## 1 38 45
## 2 100 62
ffood[1, 2] ## Gets the first row second column entry
## [1] 45
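The same entries can be selected by column name or by a logical condition, which tends to be more robust than numeric positions if the column order changes. A minimal sketch, assuming the data frame is rebuilt by hand from the values printed above (the name `ff` is hypothetical):

```r
ff <- data.frame(
  AM = c(38, 100, 64, 43, 63, 59, 107, 52, 86, 77),
  PM = c(45, 62, 52, 72, 81, 88, 64, 75, 59, 70)
)

ff[["AM"]]               # same vector as ff$AM and ff[, 1]
ff[1, "PM"]              # first row, PM column
ff[ff$AM > 80, ]         # rows where the AM time exceeds 80
ff[, 1, drop = FALSE]    # keeps the result as a one-column data frame
```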
cfail <- scan("compfail.txt")
summary(cfail)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.00 1.00 3.00 3.75 5.00 17.00
var(cfail)
## [1] 11.43204
table(cfail)
## cfail
## 0 1 2 3 4 5 6 7 8 9 10 11 12 13 17
## 12 16 21 12 11 8 7 2 4 2 3 2 2 1 1
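Because the frequency table lists every distinct value with its count, the data vector can be rebuilt from it (up to ordering). The sketch below does this with `rep()`, so that the summaries above can be reproduced without `compfail.txt`, and it reads the mode off the table. The names `values`, `counts`, `cfail2` and `tab` are introduced here for illustration only.

```r
# Distinct values and their frequencies, copied from the table output
values <- c(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 17)
counts <- c(12, 16, 21, 12, 11, 8, 7, 2, 4, 2, 3, 2, 2, 1, 1)

# Repeat each value as many times as it was observed
cfail2 <- rep(values, times = counts)

length(cfail2)   # number of observations
mean(cfail2)     # compare with the Mean in summary(cfail)
var(cfail2)      # compare with var(cfail)

# The mode is the value with the largest frequency
tab <- table(cfail2)
names(tab)[which.max(tab)]
```

Note that `names(tab)` returns character strings, so the mode comes back as text; `as.numeric()` converts it if a number is needed.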
hist(cfail)
The table command shows that the mode is 2 failures per week, which occurs 21 times. The histogram shows a strongly right-skewed distribution. We can draw a boxplot of the data with the boxplot command.
boxplot(cfail)
The thick black line in the middle of the box is the median, and the lower and upper edges of the box are the first and third quartiles. The two short horizontal lines at the ends of the dashed lines (the whiskers) mark the most extreme observations that lie within 1.5 times the interquartile range of the box edges. Suspected outliers are shown as circles beyond the whiskers. Like the histogram, the boxplot shows positive (right) skewness. We now read and explore the other data sets.
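Before moving on, the quantities a boxplot draws can be inspected numerically with `boxplot.stats()`, which returns the whisker ends, the box edges (hinges), the median and the points flagged as outliers. This sketch assumes the data are rebuilt from the frequency table printed earlier, under the hypothetical name `cfail2`; `bs`, `box_width` and `upper_fence` are also illustrative names.

```r
cfail2 <- rep(c(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 17),
              times = c(12, 16, 21, 12, 11, 8, 7, 2, 4, 2, 3, 2, 2, 1, 1))

bs <- boxplot.stats(cfail2)   # uses the default coefficient of 1.5
bs$stats   # lower whisker, lower hinge, median, upper hinge, upper whisker
bs$out     # observations beyond the whiskers, drawn as circles

# The upper fence: points above it are flagged as suspected outliers
box_width <- bs$stats[4] - bs$stats[2]
upper_fence <- bs$stats[4] + 1.5 * box_width
upper_fence
```

The hinges used by `boxplot.stats()` can differ slightly from the quartiles reported by `summary()` for some sample sizes, since the two are computed by different rules.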
wgain <- read.table("wtgain.txt", header = TRUE)
head(wgain)
## student initial final
## 1 1 77.56423 76.20346
## 2 2 49.89512 50.34871
## 3 3 60.78133 61.68851
## 4 4 52.16308 53.97745
## 5 5 68.03880 70.30676
## 6 6 47.17357 48.08075
summary(wgain)
## student initial final
## Min. : 1.00 Min. :42.64 Min. : 43.54
## 1st Qu.:17.75 1st Qu.:53.86 1st Qu.: 54.32
## Median :34.50 Median :60.78 Median : 60.78
## Mean :34.50 Mean :61.72 Mean : 62.59
## 3rd Qu.:51.25 3rd Qu.:68.04 3rd Qu.: 68.49
## Max. :68.00 Max. :99.79 Max. :101.60
summary(wgain$final-wgain$initial)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -2.2680 0.4536 0.9072 0.8672 1.3608 3.6287
gain <- wgain$final - wgain$initial
summary(gain)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -2.2680 0.4536 0.9072 0.8672 1.3608 3.6287
hist(gain)
boxplot(gain)
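An alternative is to store the gain as a new column of the data frame, which keeps it alongside the original measurements. The sketch below uses only the six rows shown by `head(wgain)`, so its numbers will not match the full-data summary; `wg6` is a hypothetical name.

```r
# First six rows of wgain, copied from the head() output above
wg6 <- data.frame(
  student = 1:6,
  initial = c(77.56423, 49.89512, 60.78133, 52.16308, 68.03880, 47.17357),
  final   = c(76.20346, 50.34871, 61.68851, 53.97745, 70.30676, 48.08075)
)

wg6$gain <- wg6$final - wg6$initial   # new column: final minus initial
wg6
sum(wg6$gain > 0)                     # number of these six students who gained
```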
guess <- read.table("guess.txt", header = TRUE, sep = ",")
guess
## group P1 P2 P3 P4 P5 P6 P7 P8 P9 P10 mae gsize sex
## 1 1 14 -6 5 19 0 -5 -9 -1 -7 -7 7.3 3 F
## 2 2 8 0 5 0 5 -1 -8 -1 -8 0 3.6 4 F
## 3 3 6 -5 6 3 1 -8 -18 1 -9 -6 6.3 4 M
## 4 4 10 -7 3 3 2 -2 -13 6 -7 -7 6.0 2 X
## 5 5 11 -3 4 2 -1 0 -17 0 -14 3 5.5 2 F
## 6 6 13 -3 3 5 -2 -8 -9 -1 -7 0 5.1 3 F
## 7 7 9 -4 3 0 4 -13 -15 6 -7 5 6.6 4 M
## 8 8 11 0 2 8 3 3 -15 1 -7 0 5.0 4 M
## 9 9 6 -2 2 8 3 -8 -7 -1 1 -2 4.0 4 F
## 10 10 11 2 3 11 1 -8 -14 -2 -1 0 5.3 4 F
summary(guess)
## group P1 P2 P3
## Min. : 1.00 Min. : 6.00 Min. :-7.00 Min. :2.00
## 1st Qu.: 3.25 1st Qu.: 8.25 1st Qu.:-4.75 1st Qu.:3.00
## Median : 5.50 Median :10.50 Median :-3.00 Median :3.00
## Mean : 5.50 Mean : 9.90 Mean :-2.80 Mean :3.60
## 3rd Qu.: 7.75 3rd Qu.:11.00 3rd Qu.:-0.50 3rd Qu.:4.75
## Max. :10.00 Max. :14.00 Max. : 2.00 Max. :6.00
## P4 P5 P6 P7
## Min. : 0.00 Min. :-2.00 Min. :-13.00 Min. :-18.0
## 1st Qu.: 2.25 1st Qu.: 0.25 1st Qu.: -8.00 1st Qu.:-15.0
## Median : 4.00 Median : 1.50 Median : -6.50 Median :-13.5
## Mean : 5.90 Mean : 1.60 Mean : -5.00 Mean :-12.5
## 3rd Qu.: 8.00 3rd Qu.: 3.00 3rd Qu.: -1.25 3rd Qu.: -9.0
## Max. :19.00 Max. : 5.00 Max. : 3.00 Max. : -7.0
## P8 P9 P10 mae
## Min. :-2.0 Min. :-14.00 Min. :-7.0 Min. :3.600
## 1st Qu.:-1.0 1st Qu.: -7.75 1st Qu.:-5.0 1st Qu.:5.025
## Median :-0.5 Median : -7.00 Median : 0.0 Median :5.400
## Mean : 0.8 Mean : -6.60 Mean :-1.4 Mean :5.470
## 3rd Qu.: 1.0 3rd Qu.: -7.00 3rd Qu.: 0.0 3rd Qu.:6.225
## Max. : 6.0 Max. : 1.00 Max. : 5.0 Max. :7.300
## gsize sex
## Min. :2.0 F :5
## 1st Qu.:3.0 F :1
## Median :4.0 M :3
## Mean :3.4 X :1
## 3rd Qu.:4.0
## Max. :4.0
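The `sex` column is summarised with two separate rows labelled `F`. One possible cause, which is an assumption here and cannot be confirmed from the output alone, is stray whitespace in the file, so that `"F"` and `"F "` are treated as different values. The sketch below uses a made-up vector `sex_raw`, with the trailing space placed on an arbitrarily chosen entry, to show how `trimws()` merges such values.

```r
# Hypothetical values: one "F" carries a trailing space
sex_raw <- c("F", "F", "M", "X", "F ", "F", "M", "M", "F", "F")
table(sex_raw)                 # "F" and "F " are counted separately

sex_clean <- trimws(sex_raw)   # remove leading and trailing whitespace
table(sex_clean)
```

If whitespace does turn out to be the cause, `read.table()` also accepts `strip.white = TRUE` (used when `sep` is specified) to trim fields while reading.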
A <- guess[, 2:11] ## Stores only the score columns, 2 to 11.
A <- abs(A) ## Calculates the absolute value of each entry.
newmae <- apply(A, 1, FUN = mean) ## Computes the row means; the value 1 tells apply to work over rows.
# Use ?apply to see what it does.
A ## prints A on screen
## P1 P2 P3 P4 P5 P6 P7 P8 P9 P10
## 1 14 6 5 19 0 5 9 1 7 7
## 2 8 0 5 0 5 1 8 1 8 0
## 3 6 5 6 3 1 8 18 1 9 6
## 4 10 7 3 3 2 2 13 6 7 7
## 5 11 3 4 2 1 0 17 0 14 3
## 6 13 3 3 5 2 8 9 1 7 0
## 7 9 4 3 0 4 13 15 6 7 5
## 8 11 0 2 8 3 3 15 1 7 0
## 9 6 2 2 8 3 8 7 1 1 2
## 10 11 2 3 11 1 8 14 2 1 0
newmae ## prints newmae on screen
## [1] 7.3 3.6 6.3 6.0 5.5 5.1 6.6 5.0 4.0 5.3
newmae - guess$mae # check that all values are zero
## [1] 0 0 0 0 0 0 0 0 0 0
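The same row means can be obtained with `rowMeans()`, and `all.equal()` is a safer check than looking for exact zeros, because floating-point arithmetic can leave tiny non-zero differences. A minimal sketch using only the first three groups printed above; the names `scores` and `mae3` are illustrative.

```r
# Scores P1..P10 for groups 1 to 3, copied from the printed data
scores <- rbind(
  c(14, -6, 5, 19, 0, -5,  -9, -1, -7, -7),
  c( 8,  0, 5,  0, 5, -1,  -8, -1, -8,  0),
  c( 6, -5, 6,  3, 1, -8, -18,  1, -9, -6)
)

mae3 <- rowMeans(abs(scores))   # mean absolute error for each row
mae3

# Compare with the mae values recorded for these groups
isTRUE(all.equal(mae3, c(7.3, 3.6, 6.3)))
```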