R: subset by date and compute the mean

Time: 2015-04-21 17:30:06

Tags: r

I am trying to subset a data frame by date and compute the mean of the size column for each subset.

http://i.gyazo.com/27df6d87ca9222c7c982983661b770f9.png

To do this, I used the ddply function from the plyr package:

library('plyr')
ddply(df, .(date), summarize, mean = mean(size))

But I get the following error:

Error: 'names' attribute [11] must be the same length as the vector [1]

Can anyone help me?

Thanks!

Edit: Sorry! I am very new to R. Here is a sample of the data:

               date                 host petition                                        resource protocol result size      date2
1995-07-01 06:00:01         199.72.81.55      GET                                /history/apollo/ HTTP/1.0    200 6245 1995-07-01
1995-07-01 06:00:06 unicomp6.unicomp.net      GET                             /shuttle/countdown/ HTTP/1.0    200 3985 1995-07-01
1995-07-01 06:00:09       199.120.110.21      GET    /shuttle/missions/sts-73/mission-sts-73.html HTTP/1.0    200 4085 1995-07-01
1995-07-01 06:00:11   burger.letters.com      GET                 /shuttle/countdown/liftoff.html HTTP/1.0    304    0 1995-07-01  
1995-07-01 06:00:11       199.120.110.21      GET /shuttle/missions/sts-73/sts-73-patch-small.gif HTTP/1.0    200 4179 1995-07-01
1995-07-01 06:00:12   burger.letters.com      GET                      /images/NASA-logosmall.gif HTTP/1.0    304    0 1995-07-01

df <- read.table(header = TRUE, text = "date                 host petition                                        resource protocol result size      date2
'1995-07-01 06:00:01'         199.72.81.55      GET                                /history/apollo/ HTTP/1.0    200 6245 1995-07-01
'1995-07-01 06:00:06' unicomp6.unicomp.net      GET                             /shuttle/countdown/ HTTP/1.0    200 3985 1995-07-01
'1995-07-01 06:00:09'       199.120.110.21      GET    /shuttle/missions/sts-73/mission-sts-73.html HTTP/1.0    200 4085 1995-07-01
'1995-07-01 06:00:11'   burger.letters.com      GET                 /shuttle/countdown/liftoff.html HTTP/1.0    304    0 1995-07-01  
'1995-07-01 06:00:11'       199.120.110.21      GET /shuttle/missions/sts-73/sts-73-patch-small.gif HTTP/1.0    200 4179 1995-07-01
'1995-07-01 06:00:12'   burger.letters.com      GET                      /images/NASA-logosmall.gif HTTP/1.0    304    0 1995-07-01",
                 colClasses = c('POSIXct','character','character','character',
                                'character','numeric','numeric','Date'))

I also tried converting date2 to character and then applying ddply:

tmp <- df$date2
df$date2 <- as.character(df$date2)
class(df$date2)
 [1] "character"
mean_con <- ddply(df, .(date2), summarize,  mean = mean(size))
 Error: 'names' attribute [11] must be the same length as the vector [1]
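
As a package-free cross-check of the per-day mean, the same grouping can be sketched in base R. This is a minimal sketch, not the asker's method: it assumes only the date2 and size columns matter, and it rebuilds a small data frame from the six sample rows shown above rather than the full 1,891,715-row log.

```r
# Six sample rows from the question; only the columns needed for grouping.
df <- data.frame(
  date2 = rep("1995-07-01", 6),
  size  = c(6245, 3985, 4085, 0, 4179, 0),
  stringsAsFactors = FALSE
)

# Mean of `size` within each distinct value of `date2`.
daily_mean <- aggregate(size ~ date2, data = df, FUN = mean)
names(daily_mean)[2] <- "mean"   # rename the aggregated column
print(daily_mean)

# tapply returns the same numbers as a named vector keyed by date2.
by_day <- tapply(df$size, df$date2, mean)
print(by_day)
```

With these six rows there is a single group, whose mean is 18494 / 6. Because the formula only names size and date2, the other columns of the data frame are never touched.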

Here is str(df):

> str(df)
'data.frame':   1891715 obs. of  8 variables:
 $ date    : POSIXlt, format: "1995-07-01 06:00:01" "1995-07-01 06:00:06"         "1995-07-01 06:00:09" ...
 $ host    : chr  "199.72.81.55" "unicomp6.unicomp.net" "199.120.110.21"    "burger.letters.com" ...
 $ petition: chr  "GET" "GET" "GET" "GET" ...   
 $ resource: chr  "/history/apollo/" "/shuttle/countdown/"  "/shuttle/missions/sts-73/mission-sts-73.html" "/shuttle/countdown/liftoff.html"    ...
 $ protocol: chr  "HTTP/1.0" "HTTP/1.0" "HTTP/1.0" "HTTP/1.0" ...
 $ result  : Factor w/ 9 levels "","200","302",..: 2 2 2 4 2 4 2 2 2 2 ...
 $ size    : int  6245 3985 4085 0 4179 0 0 3985 3985 7074 ...
 $ date2   : chr  "1995-07-01" "1995-07-01" "1995-07-01" "1995-07-01" ...  
 > 

So date2 is chr ...
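
One thing worth checking in the str(df) output is that the date column is POSIXlt. A POSIXlt value is a list of components (seconds, minutes, hours, and so on) rather than an atomic vector, and a names/length mismatch like the one in the error is a plausible symptom of ddply splitting a data frame that contains such a column, even when grouping by a different column. The sketch below is an assumption about the cause, not a confirmed diagnosis: it rebuilds a toy data frame from the sample rows, converts date to POSIXct, and then groups by date2. The "UTC" time zone is also an assumption, since the question does not state one. The ddply call only runs if plyr is installed.

```r
# Toy data frame built from the six sample rows in the question.
df <- data.frame(
  date2 = rep("1995-07-01", 6),
  size  = c(6245, 3985, 4085, 0, 4179, 0),
  stringsAsFactors = FALSE
)

# Assigning with `$<-` keeps the POSIXlt class, mirroring str(df) above.
# The "UTC" time zone is an assumption for illustration.
df$date <- as.POSIXlt(
  c("1995-07-01 06:00:01", "1995-07-01 06:00:06", "1995-07-01 06:00:09",
    "1995-07-01 06:00:11", "1995-07-01 06:00:11", "1995-07-01 06:00:12"),
  tz = "UTC"
)
was_lt <- inherits(df$date, "POSIXlt")

# Convert the list-based POSIXlt column to the atomic POSIXct class.
df$date <- as.POSIXct(df$date)

# Group by date2 with plyr if it is available, otherwise fall back to base R.
if (requireNamespace("plyr", quietly = TRUE)) {
  mean_con <- plyr::ddply(df, "date2", plyr::summarize, mean = mean(size))
} else {
  mean_con <- aggregate(size ~ date2, data = df, FUN = mean)
  names(mean_con)[2] <- "mean"
}
print(mean_con)
```

If the error disappears after the conversion, the POSIXlt column was the likely culprit; if it persists, the cause lies elsewhere, for example in stale objects left over in the session.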

OK, I restarted RStudio and now it works... Thanks to all!

0 answers:

No answers