I am trying to subset a data frame by date and compute the mean of the size column for each subset.
http://i.gyazo.com/27df6d87ca9222c7c982983661b770f9.png
To do this, I used the ddply function from the plyr package:
library('plyr')
ddply(df, .(date), summarize, mean = mean(size))
But I get the following error:
Error: 'names' attribute [11] must be the same length as the vector [1]
Can anyone help me?
Thanks!
Edit: Sorry! I am very new to R.
date host petition resource protocol result size date2
1995-07-01 06:00:01 199.72.81.55 GET /history/apollo/ HTTP/1.0 200 6245 1995-07-01
1995-07-01 06:00:06 unicomp6.unicomp.net GET /shuttle/countdown/ HTTP/1.0 200 3985 1995-07-01
1995-07-01 06:00:09 199.120.110.21 GET /shuttle/missions/sts-73/mission-sts-73.html HTTP/1.0 200 4085 1995-07-01
1995-07-01 06:00:11 burger.letters.com GET /shuttle/countdown/liftoff.html HTTP/1.0 304 0 1995-07-01
1995-07-01 06:00:11 199.120.110.21 GET /shuttle/missions/sts-73/sts-73-patch-small.gif HTTP/1.0 200 4179 1995-07-01
1995-07-01 06:00:12 burger.letters.com GET /images/NASA-logosmall.gif HTTP/1.0 304 0 1995-07-01
df <- read.table(header = TRUE, text = "date host petition resource protocol result size date2
'1995-07-01 06:00:01' 199.72.81.55 GET /history/apollo/ HTTP/1.0 200 6245 1995-07-01
'1995-07-01 06:00:06' unicomp6.unicomp.net GET /shuttle/countdown/ HTTP/1.0 200 3985 1995-07-01
'1995-07-01 06:00:09' 199.120.110.21 GET /shuttle/missions/sts-73/mission-sts-73.html HTTP/1.0 200 4085 1995-07-01
'1995-07-01 06:00:11' burger.letters.com GET /shuttle/countdown/liftoff.html HTTP/1.0 304 0 1995-07-01
'1995-07-01 06:00:11' 199.120.110.21 GET /shuttle/missions/sts-73/sts-73-patch-small.gif HTTP/1.0 200 4179 1995-07-01
'1995-07-01 06:00:12' burger.letters.com GET /images/NASA-logosmall.gif HTTP/1.0 304 0 1995-07-01",
colClasses = c('POSIXct','character','character','character',
'character','numeric','numeric','Date'))
I also tried converting date2 to character and then applying ddply:
tmp <- df$date2
df$date2 <- as.character(df$date2)
class(df$date2)
[1] "character"
mean_con <- ddply(df, .(date2), summarize, mean = mean(size))
Error: 'names' attribute [11] must be the same length as the vector [1]
Here is str(df):
> str(df)
'data.frame': 1891715 obs. of 8 variables:
$ date : POSIXlt, format: "1995-07-01 06:00:01" "1995-07-01 06:00:06" "1995-07-01 06:00:09" ...
$ host : chr "199.72.81.55" "unicomp6.unicomp.net" "199.120.110.21" "burger.letters.com" ...
$ petition: chr "GET" "GET" "GET" "GET" ...
$ resource: chr "/history/apollo/" "/shuttle/countdown/" "/shuttle/missions/sts-73/mission-sts-73.html" "/shuttle/countdown/liftoff.html" ...
$ protocol: chr "HTTP/1.0" "HTTP/1.0" "HTTP/1.0" "HTTP/1.0" ...
$ result : Factor w/ 9 levels "","200","302",..: 2 2 2 4 2 4 2 2 2 2 ...
$ size : int 6245 3985 4085 0 4179 0 0 3985 3985 7074 ...
$ date2 : chr "1995-07-01" "1995-07-01" "1995-07-01" "1995-07-01" ...
>
So date2 is chr ...
OK, I restarted RStudio and now it is working... sorry, everyone!
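For anyone who hits the same error: the str(df) output above shows the date column stored as POSIXlt, which is a list-based class. A likely cause, though not confirmed in this post, is that ddply cannot split a data frame containing a POSIXlt column, even when grouping by a different column. The following is a minimal sketch under that assumption. It uses a small stand-in data frame built from the sample rows, converts date to POSIXct, and then computes the mean size per date2. The fallback to base R's aggregate is only there so the sketch runs when plyr is not installed.

```r
# Stand-in data built from the six sample rows in the question.
df <- data.frame(
  size  = c(6245, 3985, 4085, 0, 4179, 0),
  date2 = rep("1995-07-01", 6),
  stringsAsFactors = FALSE
)

# Assumption: the real `date` column is POSIXlt, as str(df) reports.
# Assigning with `$<-` keeps the POSIXlt class in the data frame.
df$date <- as.POSIXlt(
  c("1995-07-01 06:00:01", "1995-07-01 06:00:06", "1995-07-01 06:00:09",
    "1995-07-01 06:00:11", "1995-07-01 06:00:11", "1995-07-01 06:00:12"),
  tz = "UTC"
)
print(class(df$date))

# Convert the list-based POSIXlt column to the atomic POSIXct class.
df$date <- as.POSIXct(df$date)

if (requireNamespace("plyr", quietly = TRUE)) {
  # Same call as in the question, grouped by the date-only column.
  mean_con <- plyr::ddply(df, "date2", plyr::summarise, mean = mean(size))
} else {
  # Base R equivalent, used only if plyr is unavailable.
  mean_con <- aggregate(size ~ date2, data = df, FUN = mean)
  names(mean_con)[2] <- "mean"
}
print(mean_con)
```

If the conversion is the fix, running class(df$date) on the real data before and after should show the change from POSIXlt to POSIXct.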