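I'm having trouble converting a factor to a Date; the conversion produces NA values that I don't want.
The data for my question can be found here: (https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2Factivity.zip)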
x <- read.csv("activity.csv")
head(x)
steps date interval
1 NA 2012-10-01 0
2 NA 2012-10-01 5
3 NA 2012-10-01 10
4 NA 2012-10-01 15
5 NA 2012-10-01 20
6 NA 2012-10-01 25
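Goal: I'm trying to find the mean number of steps taken per day. First, I need to bin the values so that each data point corresponds to the total for a given day.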
x$Day <- as.Date(cut(x$date, breaks = "day"))
Error in cut.default(x$date, breaks = "day") : 'x' must be numeric
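Just to confirm, using the class function:
class(x[,2])
[1] "factor"
This is odd, because from head(x) above the column looks like a Date. Anyway, in order to bin the values by day with the cut function, I need to change the date column to class "Date".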
x[,2] <- as.Date(x[,2], format="%Y/%m/%d")
class(x[,2])
[1] "Date"
OK, so in theory I should now be able to bin the values.
x$Day <- as.Date(cut(x$date, breaks = "day"))
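Error in seq.int(0, to0 - from, by) : 'to' cannot be NA, NaN or infinite
In addition: Warning messages:
1: In min.default(c(NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, :
  no non-missing arguments to min; returning Inf
2: In max.default(c(NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, :
  no non-missing arguments to max; returning -Inf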
head(is.na(x))
steps date interval
[1,] TRUE TRUE FALSE
[2,] TRUE TRUE FALSE
[3,] TRUE TRUE FALSE
[4,] TRUE TRUE FALSE
[5,] TRUE TRUE FALSE
[6,] TRUE TRUE FALSE
If I compare this with the output from before running x[,2] <- as.Date(x[,2], format="%Y/%m/%d"):
head(is.na(x))
steps date interval
[1,] TRUE FALSE FALSE
[2,] TRUE FALSE FALSE
[3,] TRUE FALSE FALSE
[4,] TRUE FALSE FALSE
[5,] TRUE FALSE FALSE
[6,] TRUE FALSE FALSE
I'm not sure what is going on here. I expected this to work, because I got the idea from the following tutorial (http://blog.mollietaylor.com/2013/08/plot-weekly-or-monthly-totals-in-r.html?m=1)
sessionInfo()
R version 3.0.3 (2014-03-06)
Platform: x86_64-w64-mingw32/x64 (64-bit)
locale:
[1] LC_COLLATE=English_Canada.1252
[2] LC_CTYPE=English_Canada.1252
[3] LC_MONETARY=English_Canada.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_Canada.1252
attached base packages:
[1] stats graphics grDevices utils
[5] datasets methods base
other attached packages:
[1] scales_0.2.4 ggplot2_1.0.0
loaded via a namespace (and not attached):
[1] colorspace_1.2-4 digest_0.6.4
[3] grid_3.0.3 gtable_0.1.2
[5] MASS_7.3-29 munsell_0.4.2
[7] plyr_1.8.1 proto_0.3-10
[9] Rcpp_0.11.1 reshape2_1.4
[11] stringr_0.6.2 tools_3.0.3
Answer 0: (score: 1)
Just to illustrate, these all produce the same output (except, of course, for the class of the date column):
x <- read.csv("~/Downloads/activity.csv")
# Date is a factor
r1 <- aggregate(steps~date,data = x,FUN = mean)
x1 <- read.csv("~/Downloads/activity.csv",stringsAsFactors = FALSE)
# Date is a character
r2 <- aggregate(steps~date,data = x1,FUN = mean)
x2 <- x
x2$date <- as.Date(as.character(x$date))
# Date is a date
r3 <- aggregate(steps~date,data = x2,FUN = mean)
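To check the claim without downloading the file, here is a minimal sketch that uses a small made-up data frame in place of activity.csv. The values are invented for illustration and are not taken from the dataset; the factor column is created explicitly so the example behaves the same regardless of the stringsAsFactors default.

```r
# Hypothetical stand-in for activity.csv; all values are invented for illustration
x <- data.frame(
  steps    = c(NA, 10, 20, 30, 50),
  date     = factor(c("2012-10-01", "2012-10-02", "2012-10-02",
                      "2012-10-03", "2012-10-03")),
  interval = c(0, 0, 5, 0, 5)
)

# date as a factor
r1 <- aggregate(steps ~ date, data = x, FUN = mean)

# date as a character vector
x1 <- x
x1$date <- as.character(x$date)
r2 <- aggregate(steps ~ date, data = x1, FUN = mean)

# date as a Date
x2 <- x
x2$date <- as.Date(as.character(x$date))
r3 <- aggregate(steps ~ date, data = x2, FUN = mean)

# The per-day means agree; only the class of the date column differs
stopifnot(isTRUE(all.equal(r1$steps, r2$steps)),
          isTRUE(all.equal(r2$steps, r3$steps)))
sapply(list(r1, r2, r3), function(r) class(r$date))
```

Note that the formula interface of aggregate drops rows with NA steps by default (na.action = na.omit), so a day whose steps are all NA does not appear in the result.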
Answer 1: (score: 0)
my_data <- read.csv(your_file, stringsAsFactors = FALSE)
# Convert 'my_data$date' to Date format
my_data$date <- as.Date(my_data$date)
This should work...
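The NA values in the question appear to come from the format string rather than from the factor itself: the dates are written with hyphens (2012-10-01), while format="%Y/%m/%d" expects slashes, and as.Date returns NA for strings that do not match the format. The sketch below illustrates this on invented values (not taken from the dataset) and then computes the daily totals and their mean, which is the stated goal.

```r
# Hypothetical data; the steps values are invented for illustration
my_data <- data.frame(
  steps = c(NA, 100, 250, 50),
  date  = factor(c("2012-10-01", "2012-10-02", "2012-10-02", "2012-10-03"))
)

# Slashes in the format do not match the hyphens in the data: every value is NA
bad <- as.Date(my_data$date, format = "%Y/%m/%d")

# A format that matches the data parses cleanly
good <- as.Date(as.character(my_data$date), format = "%Y-%m-%d")
stopifnot(all(is.na(bad)), !any(is.na(good)))

# With a valid Date column, cut() by day works as the question intended
my_data$date <- good
my_data$Day <- as.Date(cut(my_data$date, breaks = "day"))
stopifnot(all(my_data$Day == my_data$date))

# Total steps per day (rows with NA steps are dropped), then the mean of the totals
daily <- aggregate(steps ~ Day, data = my_data, FUN = sum)
mean_daily_steps <- mean(daily$steps)
stopifnot(mean_daily_steps == 200)
```

Because the data is already recorded at day resolution, the cut step is not strictly needed here; aggregating directly on the date column gives the same totals.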