Question

Below is the str() output of my dataset.

'data.frame':   9995 obs. of  10 variables:
 $ Count           : int  1 2 3 4 5 6 7 8 9 10 ...
 $ Gates    : Factor w/ 5 levels "B6","B9","I1",..: 3 3 4 4 3 4 4 4 4 4 ...
 $ Entry_Date           : Date, format: "0006-10-20" "0006-10-20" "0006-10-20" ...
 $ Entry_Time           : Factor w/ 950 levels "00:01:00","00:04:00",..: 347 366 450 550 563 700 701 350 460 506 ...
 $ Exit_Date          : Date, format: "0006-10-20" "0006-10-20" "0006-10-20" ...
 $ Exit_Time          : Factor w/ 1012 levels "00:00:00","00:01:00",..: 618 556 637 694 770 936 948 590 640 655 ...
 $ Type_of_entry    : Factor w/ 3 levels "Manual","Pass",..: 3 3 3 3 3 3 3 3 3 3 ...
 $ weekday     : Factor w/ 7 levels "Friday","Monday",..: 2 2 2 2 2 2 2 6 6 6 ...
 $ Ticket.Loss: Factor w/ 2 levels "N","Y": 1 1 1 1 1 2 2 1 1 1 ...
 $ Duration  : Factor w/ 501 levels "00:01:00","00:02:00",..: 223 142 139 96 159 188 199 192 132 101 ...

I am using the following call:

W <- aggregate(Duration ~ Gates, data=parking, FUN=mean)

But I get the following warning:

Warning message:
1: In mean.default(X[[i]], ...) : argument is not numeric or logical: returning NA

Answer 1

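Duration is a factor whose levels are strings that look like durations, such as "00:01:00". mean() cannot average a factor, which is why it warns and returns NA.

The chron package is designed for strings like these.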
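The warning can be reproduced on a small made-up factor. This is only a sketch: the vector `d` is hypothetical and stands in for `parking$Duration`.

```r
# Hypothetical toy data standing in for parking$Duration
d <- factor(c("00:01:00", "00:03:00", "00:02:00"))

# mean() has no method for factors, so mean.default() warns and returns NA
m <- suppressWarnings(mean(d))
is.na(m)

# as.numeric() on a factor returns the internal level codes,
# not the durations, so it is not a fix either
codes <- as.numeric(d)
codes
```

The factor has to be converted to a real time or numeric representation before averaging.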

library(chron)
aggregate(chron(times=Duration) ~ Gates, data=parking, FUN=mean)

This gives the mean time for each level of Gates.

See also: convert character to time in R

Answer 2

If the Duration column in the OP's dataset holds actual clock times, we can use as.POSIXct to convert it to the 'DateTime' class:

parking$Duration <- as.POSIXct(parking$Duration, format = "%H:%M:%S")
transform(aggregate(Duration ~ Gates, data = parking, FUN = mean), 
                               Duration = sub("\\S+\\s+", "", Duration))
#  Gates Duration
#1    B6 11:08:34
#2    B9 11:07:31
#3    I1 11:07:10

Note: No external packages are used.

Data

set.seed(24)
parking <- data.frame(Gates = sample(c("B6", "B9", "I1"), 20, replace=TRUE),
  Duration = format(seq(Sys.time(), length.out=20, by = "1 min") , "%H:%M:%S"))
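If Duration is an elapsed time rather than a clock time, another base R option is to convert the "HH:MM:SS" strings to plain numbers before aggregating. The following is a minimal sketch under that assumption; `parking_toy` and the helper `to_minutes` are hypothetical names introduced here for illustration, not part of the original data or of any package.

```r
# Hypothetical toy data; the real parking data frame has many more rows
parking_toy <- data.frame(
  Gates    = factor(c("B6", "B6", "I1", "I1")),
  Duration = factor(c("00:10:00", "00:20:00", "01:00:00", "02:00:00"))
)

# Hypothetical helper: convert "HH:MM:SS" text (or a factor of it) to minutes
to_minutes <- function(x) {
  parts <- strsplit(as.character(x), ":", fixed = TRUE)
  # hours * 60 + minutes + seconds / 60
  vapply(parts, function(p) sum(as.numeric(p) * c(60, 1, 1 / 60)), numeric(1))
}

parking_toy$Minutes <- to_minutes(parking_toy$Duration)

# Mean duration in minutes per gate
res <- aggregate(Minutes ~ Gates, data = parking_toy, FUN = mean)
res
```

Working in numeric minutes avoids attaching today's date to the values, and it also handles durations longer than 24 hours.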
