Error when aggregating a factor variable: argument is not numeric or logical

Time: 2016-06-26 04:44:20

Tags: r aggregate

Here is the str() output for my dataset:

'data.frame':   9995 obs. of  10 variables:
 $ Count           : int  1 2 3 4 5 6 7 8 9 10 ...
 $ Gates    : Factor w/ 5 levels "B6","B9","I1",..: 3 3 4 4 3 4 4 4 4 4 ...
 $ Entry_Date           : Date, format: "0006-10-20" "0006-10-20" "0006-10-20" ...
 $ Entry_Time           : Factor w/ 950 levels "00:01:00","00:04:00",..: 347 366 450 550 563 700 701 350 460 506 ...
 $ Exit_Date          : Date, format: "0006-10-20" "0006-10-20" "0006-10-20" ...
 $ Exit_Time          : Factor w/ 1012 levels "00:00:00","00:01:00",..: 618 556 637 694 770 936 948 590 640 655 ...
 $ Type_of_entry    : Factor w/ 3 levels "Manual","Pass",..: 3 3 3 3 3 3 3 3 3 3 ...
 $ weekday     : Factor w/ 7 levels "Friday","Monday",..: 2 2 2 2 2 2 2 6 6 6 ...
 $ Ticket.Loss: Factor w/ 2 levels "N","Y": 1 1 1 1 1 2 2 1 1 1 ...
 $ Duration  : Factor w/ 501 levels "00:01:00","00:02:00",..: 223 142 139 96 159 188 199 192 132 101 ...

I am using the following call:

W <- aggregate(Duration ~ Gates, data=parking, FUN=mean)

But I get the following warning:

  

Warning messages: 1: In mean.default(X[[i]], ...) : argument is not numeric or logical: returning NA

2 answers:

Answer 0: (score: 2)

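Duration is a factor whose levels are strings that look like durations, "00:01:00" and so on. mean() cannot average a factor, which is why it warns and returns NA.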
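To see the cause in isolation, here is a minimal sketch on a made-up three-element factor (dur is a hypothetical stand-in for parking$Duration, not the OP's data). It also shows why calling as.numeric() directly on the factor is not a fix: that returns the internal level codes rather than the times.

```r
# Hypothetical toy factor standing in for parking$Duration
dur <- factor(c("00:01:00", "00:03:00", "00:02:00"))

# A factor is not numeric, so mean() cannot average it
numeric_check <- is.numeric(dur)          # FALSE

# mean() on a factor returns NA and emits the warning from the question
avg <- suppressWarnings(mean(dur))        # NA

# as.numeric() gives the level codes (positions in the sorted levels),
# not the durations themselves
codes <- as.numeric(dur)                  # 1 3 2
```

So the strings have to be parsed into a time or numeric representation before they are averaged.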

The chron package handles strings like these.

library(chron)
aggregate(chron(times=Duration) ~ Gates, data=parking, FUN=mean)

This gives the mean time for each level of Gates.
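If you would rather not depend on a package, the same idea can be sketched in base R by converting the strings to seconds, averaging, and formatting the result back. This is only an illustration under assumptions: parking_demo, to_seconds and to_hms are hypothetical names introduced here, the data is made up, and every value is assumed to have exactly the form "HH:MM:SS".

```r
# Hypothetical toy data standing in for the question's `parking` data frame
parking_demo <- data.frame(
  Gates    = factor(c("B6", "B6", "I1", "I1")),
  Duration = factor(c("00:10:00", "00:20:00", "01:00:00", "02:00:00"))
)

# Helper (assumed name): "HH:MM:SS" strings or factors -> seconds as numeric
to_seconds <- function(x) {
  parts <- strsplit(as.character(x), ":", fixed = TRUE)
  vapply(parts, function(p) sum(as.numeric(p) * c(3600, 60, 1)), numeric(1))
}

# Helper (assumed name): seconds -> "HH:MM:SS", rounded to whole seconds
to_hms <- function(s) {
  s <- as.integer(round(s))
  sprintf("%02d:%02d:%02d", s %/% 3600L, (s %% 3600L) %/% 60L, s %% 60L)
}

parking_demo$secs <- to_seconds(parking_demo$Duration)

# Mean duration in seconds for each gate
res <- aggregate(secs ~ Gates, data = parking_demo, FUN = mean)
res$mean_duration <- to_hms(res$secs)
res
#   Gates secs mean_duration
# 1    B6  900      00:15:00
# 2    I1 5400      01:30:00
```

Working in plain seconds keeps the aggregation step purely numeric, so any other summary (median, max, and so on) can be swapped in for mean without further changes.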

See also: convert character to time in R

Answer 1: (score: 0)

If the OP's column holds actual clock times, we can use as.POSIXct to convert it to a 'DateTime' class:

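# Parse the "HH:MM:SS" strings as date-times (the current date is attached)
parking$Duration <- as.POSIXct(parking$Duration, format = "%H:%M:%S")
# Average per gate, then keep only the time-of-day part of each mean
transform(aggregate(Duration ~ Gates, data = parking, FUN = mean),
          Duration = format(Duration, "%H:%M:%S"))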
#  Gates Duration
#1    B6 11:08:34
#2    B9 11:07:31
#3    I1 11:07:10

Note: no external packages are used.

Data

set.seed(24)
parking <- data.frame(Gates = sample(c("B6", "B9", "I1"), 20, replace=TRUE),
  Duration = format(seq(Sys.time(), length.out=20, by = "1 min") , "%H:%M:%S"))