Mean and median vs. summary

Date: 2014-10-14 11:48:01

Tags: r knitr rstudio mean

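I am currently taking the Reproducible Research course on Coursera, and one of the questions asks for the mean and median of the number of steps taken per day. I have calculated these, but when I check them with the summary function, the Mean and Median that summary reports are different. I am running this through knitr.

Why is this happening? **Edit:** below is all of my script so far, including a link to the raw data: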

## Download the data

You have to change https to http to get this to work in knitr.

```r
# Download the zip archive only if it is not already present locally
target_url <- "http://d396qusza40orc.cloudfront.net/repdata%2Fdata%2Factivity.zip"
target_localfile <- "ActivityMonitoringData.zip"
if (!file.exists(target_localfile)) {
  download.file(target_url, destfile = target_localfile)
}
```
Unzip the file into the ./extract directory

```r
unzip(target_localfile, exdir = "extract", overwrite = TRUE)
```
List the extracted files

```r
list.files("./extract")
## [1] "activity.csv"
```
Load the extracted data into R

```r
activity.csv <- read.csv("./extract/activity.csv", header = TRUE)
# Keep only the rows that have no missing values in any column
activity1 <- activity.csv[complete.cases(activity.csv), ]
str(activity1)
## 'data.frame':    15264 obs. of  3 variables:
##  $ steps   : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ date    : Factor w/ 61 levels "2012-10-01","2012-10-02",..: 2 2 2 2 2 2 2 2 2 2 ...
##  $ interval: int  0 5 10 15 20 25 30 35 40 45 ...
```
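To make the complete.cases() step concrete, here is a small sketch on hypothetical toy data (the values are made up and only mimic the three columns of activity.csv). It shows which rows are kept, and that a day whose rows are all NA disappears completely rather than being counted as zero steps.

```r
# Hypothetical toy data with the same three columns as activity.csv
toy <- data.frame(
  steps    = c(0L, NA, 5L, 10L, NA, NA),
  date     = c("2012-10-01", "2012-10-01", "2012-10-02",
               "2012-10-02", "2012-10-03", "2012-10-03"),
  interval = c(0L, 5L, 0L, 5L, 0L, 5L)
)

# complete.cases() is TRUE for rows that have no NA in any column
keep <- complete.cases(toy)
toy_complete <- toy[keep, ]

# Every row for 2012-10-03 has NA steps, so that day drops out entirely
print(toy_complete)
```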
Use a histogram to view the number of steps taken each day

```r
histData <- aggregate(steps ~ date, data = activity1, sum)
h <- hist(histData$steps,  # Save histogram as object
          breaks = 11,  # "Suggests" 11 bins
          freq = TRUE,
          col = "thistle1",
          main = "Histogram of Activity",
          xlab = "Number of daily steps")
```
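The aggregate(steps ~ date, ..., sum) call collapses the interval-level rows into one total per day, and mean(), median() and summary() are then applied to those daily totals. A minimal sketch of that step on hypothetical toy data (not taken from activity.csv):

```r
# Hypothetical interval-level data: several rows per day
toy <- data.frame(
  steps = c(0L, 20L, 5L, 10L, 7L),
  date  = c("2012-10-01", "2012-10-01", "2012-10-02", "2012-10-02", "2012-10-03")
)

# One row per date, with steps summed within each date
daily <- aggregate(steps ~ date, data = toy, FUN = sum)
print(daily)

# The statistics in the question are computed over these daily totals
daily_mean   <- mean(daily$steps)    # (20 + 15 + 7) / 3 = 14
daily_median <- median(daily$steps)  # middle value of 7, 15, 20 is 15
```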


Obtain the Mean and Median of the daily steps

```r
steps <- histData$steps
mean(steps)
## [1] 10766
median(steps)
## [1] 10765
summary(histData$steps)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##      41    8840   10800   10800   13300   21200
summary(steps)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##      41    8840   10800   10800   13300   21200
```
```r
sessionInfo()
## R version 3.1.1 (2014-07-10)
## Platform: i386-w64-mingw32/i386 (32-bit)
## 
## locale:
## [1] LC_COLLATE=English_Australia.1252  LC_CTYPE=English_Australia.1252   
## [3] LC_MONETARY=English_Australia.1252 LC_NUMERIC=C                      
## [5] LC_TIME=English_Australia.1252    
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] knitr_1.6
## 
## loaded via a namespace (and not attached):
## [1] evaluate_0.5.5 formatR_1.0    stringr_0.6.2  tools_3.1.1
```

1 answer

Answer (score: 6)

Actually, the answers are correct; they are just being printed with different rounding. You are setting the digits option somewhere.
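One way to check this is to look at getOption("digits") (R's default is 7) and compare how the same number prints under different settings. The sketch below assumes, hypothetically, that an earlier call like options(digits = 5) is the culprit; any setting of 5 or lower prints the mean as 10766, exactly as in the question. The value 10766.1886792 is the mean reported further down in this answer.

```r
# Mean of the daily steps, as printed with options(digits = 12)
x <- 10766.1886792

# Current global setting (7 unless something has changed it)
getOption("digits")

# Hypothetical: lower the setting the way an earlier options() call might have
old <- options(digits = 5)
print(x)              # [1] 10766
shown_5 <- format(x)  # uses the lowered setting
options(old)          # restore the previous setting

# The same value with R's default of 7 significant digits
shown_7 <- format(x, digits = 7)
```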

Put this at the top of your script:

```r
options(digits = 12)
```

and you will get:

```r
mean(steps)
# [1] 10766.1886792
median(steps)
# [1] 10765
summary(steps)
#      Min.    1st Qu.     Median       Mean    3rd Qu.       Max. 
#   41.0000  8841.0000 10765.0000 10766.1887 13294.0000 21194.0000 
```

Note that summary uses max(3, getOption("digits") - 3) as the number of significant digits to show, so it rounds a little more than print does (10766.1887 instead of 10766.1886792).
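The rule in this note can be checked by hand. The sketch below is a stand-in, not summary() itself: summary_digits() is a hypothetical helper that applies the quoted max(3, digits - 3) formula, and signif() is assumed to mimic summary()'s rounding in R 3.1.1. It uses the statistics from the digits = 12 output. With a digits setting of 5 it gives 10800 for both the median and the mean, matching the summary() output in the question. Newer R versions apply the rounding only when printing, so what summary() displays there may look different.

```r
# Hypothetical helper: summary()'s default 'digits' argument, as quoted above
summary_digits <- function(opt_digits = getOption("digits")) {
  max(3, opt_digits - 3)
}

# Daily-step statistics copied from the options(digits = 12) output
stats <- c(Min = 41, `1st Qu.` = 8841, Median = 10765,
           Mean = 10766.1886792, `3rd Qu.` = 13294, Max = 21194)

# Assumption: each statistic is rounded to summary_digits() significant digits
rounded_5  <- signif(stats, summary_digits(5))   # digits = 5  -> 3 significant digits
rounded_12 <- signif(stats, summary_digits(12))  # digits = 12 -> 9 significant digits
print(rounded_5)
print(rounded_12)
```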