Question

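I would like to find a more concise way to do the following:

I have a data.frame P (5000 rows x 4 cols) and would like to find the median values in columns 2, 3 and 4 when the timestamp in column 1 falls into a set range determined by the vector TimeStamp (in seconds).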

dput(TimeStamp)
c(18, 138, 438, 678, 798, 1278, 1578, 1878, 2178)


dput(head(P))
structure(list(Time = c(0, 5, 100, 200, 500, 1200), SkinTemp = c(27.781, 
27.78, 27.779, 27.779, 27.778, 27.777), HeartRate = c(70, 70, 
70, 70, 70, 70), RespirationRate = c(10, 10, 10, 10, 10, 10)), .Names = c("Time", 
"SkinTemp", "HeartRate", "RespirationRate"), row.names = c(NA, 
6L), class = "data.frame")

e.g.

for x<i<y in P[,1]
     find median of all values in P[,2], P[,3] and P[,4]
     Put median values into a new matrix with headers SkinTemp, HeartRate and RespirationRate
end

Answer 1

You could try:

aggregate(P[,-1], list(Time=findInterval(P$Time, TimeStamp)), median)
#  Time SkinTemp HeartRate RespirationRate
#1    0  27.7805        70              10
#2    1  27.7790        70              10
#3    2  27.7790        70              10
#4    3  27.7780        70              10
#5    5  27.7770        70              10

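You want to split the Time values according to the interval they fall into. There is an R function that does exactly this: findInterval. So we compute the interval index for each Time value, then use aggregate to group the other columns by that index and take the median of each group.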
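To make the grouping step visible on its own, here is a minimal sketch that rebuilds the six rows of head(P) from the question and runs the two steps separately. The variable names `bin` and `res` are only illustrative and are not part of the original answer.

```r
# Sample data copied from the question (only the six rows of head(P)).
TimeStamp <- c(18, 138, 438, 678, 798, 1278, 1578, 1878, 2178)
P <- data.frame(
  Time            = c(0, 5, 100, 200, 500, 1200),
  SkinTemp        = c(27.781, 27.78, 27.779, 27.779, 27.778, 27.777),
  HeartRate       = rep(70, 6),
  RespirationRate = rep(10, 6)
)

# findInterval() returns, for each Time, the number of breakpoints <= it.
# 0 means "before the first TimeStamp"; intervals are closed on the left.
bin <- findInterval(P$Time, TimeStamp)
print(bin)
# 0 0 1 2 3 5

# Median of every non-Time column within each interval index.
res <- aggregate(P[, -1], by = list(Time = bin), FUN = median)
print(res)
```

Interval indices with no observations (here 4, and 6 to 9) simply do not appear in the result.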

Answer 2

Another option is to use the cut function:

P$new <- cut(P$Time, breaks = c(-Inf, TimeStamp, Inf))
aggregate(. ~ new, P, median)

#             new   Time SkinTemp HeartRate RespirationRate
#1      (-Inf,18]    2.5  27.7805        70              10
#2       (18,138]  100.0  27.7790        70              10
#3      (138,438]  200.0  27.7790        70              10
#4      (438,678]  500.0  27.7780        70              10
#5 (798,1.28e+03] 1200.0  27.7770        70              10
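One difference worth noting: by default cut builds intervals that are closed on the right, (a, b], whereas findInterval uses intervals closed on the left, [a, b), so a Time that equals a breakpoint lands in different groups. The sketch below is one way to make cut agree with findInterval, again using only the six sample rows from the question; `grp` and `res` are illustrative names, and `dig.lab = 4` is added just to keep the labels out of scientific notation.

```r
# Sample data copied from the question (only the six rows of head(P)).
TimeStamp <- c(18, 138, 438, 678, 798, 1278, 1578, 1878, 2178)
P <- data.frame(
  Time            = c(0, 5, 100, 200, 500, 1200),
  SkinTemp        = c(27.781, 27.78, 27.779, 27.779, 27.778, 27.777),
  HeartRate       = rep(70, 6),
  RespirationRate = rep(10, 6)
)

# right = FALSE gives left-closed intervals [a, b), like findInterval().
# dig.lab = 4 prints 1278 in full instead of 1.28e+03.
grp <- cut(P$Time, breaks = c(-Inf, TimeStamp, Inf),
           right = FALSE, dig.lab = 4)

# Group by the factor without adding a column to P, so Time is not aggregated.
res <- aggregate(P[, -1], by = list(Interval = grp), FUN = median)
print(res)
```

Keeping the grouping factor outside P also avoids the extra median Time column that appears in the output above.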

Vectorised version of a for/while loop
