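I would like a more concise way to do the following:
I have a data.frame P (5000 rows x 4 cols) and want to find the median of columns 2, 3 and 4 whenever the timestamp in column 1 falls within the ranges defined by the vector TimeStamp (in seconds).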
dput(TimeStamp)
c(18, 138, 438, 678, 798, 1278, 1578, 1878, 2178)
dput(head(P))
structure(list(Time = c(0, 5, 100, 200, 500, 1200), SkinTemp = c(27.781,
27.78, 27.779, 27.779, 27.778, 27.777), HeartRate = c(70, 70,
70, 70, 70, 70), RespirationRate = c(10, 10, 10, 10, 10, 10)), .Names = c("Time",
"SkinTemp", "HeartRate", "RespirationRate"), row.names = c(NA,
6L), class = "data.frame")
e.g.
for x<i<y in P[,1]
find median of all values in P[,2], P[,3] and P[,4]
Put median values into a new matrix with headers SkinTemp, HeartRate and RespirationRate
end
Answer 0 (score: 4)
You can try:
aggregate(P[,-1],list(Time=findInterval(P$Time,TimeStamp)),median)
# Time SkinTemp HeartRate RespirationRate
#1 0 27.7805 70 10
#2 1 27.7790 70 10
#3 2 27.7790 70 10
#4 3 27.7780 70 10
#5 5 27.7770 70 10
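You want to split the Time values according to the interval each one falls into. R has a function that does this: findInterval. So we compute the interval index of each Time value, then use aggregate to group the other columns by that index and take the median of each group. In the output above, the Time column therefore holds the interval index (0 means before the first value of TimeStamp), not a timestamp.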
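To make the two steps easier to follow, here is a minimal sketch that runs them separately on the six sample rows from the question. The names idx and res, and the grouping column name Interval, are only illustrative choices and do not appear in the original answer.

```r
# Sample data copied from the question's dput output
TimeStamp <- c(18, 138, 438, 678, 798, 1278, 1578, 1878, 2178)
P <- data.frame(
  Time            = c(0, 5, 100, 200, 500, 1200),
  SkinTemp        = c(27.781, 27.78, 27.779, 27.779, 27.778, 27.777),
  HeartRate       = rep(70, 6),
  RespirationRate = rep(10, 6)
)

# Step 1: index of the interval each Time value falls into.
# 0 means "before the first break"; intervals are closed on the left,
# i.e. index i covers TimeStamp[i] <= Time < TimeStamp[i + 1].
idx <- findInterval(P$Time, TimeStamp)
idx
# 0 0 1 2 3 5

# Step 2: median of every non-Time column within each interval.
# "Interval" is an illustrative name for the grouping column.
res <- aggregate(P[, -1], list(Interval = idx), median)
res
```

Intervals that contain no rows (index 4 in this sample) simply do not appear in the result.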
Answer 1 (score: 2)
Another option is to use the cut function:
P$new <- cut(P$Time, breaks = c(-Inf, TimeStamp, Inf))
aggregate(. ~ new, P, median)
# new Time SkinTemp HeartRate RespirationRate
#1 (-Inf,18] 2.5 27.7805 70 10
#2 (18,138] 100.0 27.7790 70 10
#3 (138,438] 200.0 27.7790 70 10
#4 (438,678] 500.0 27.7780 70 10
#5 (798,1.28e+03] 1200.0 27.7770 70 10
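Note that cut closes intervals on the right by default, (a, b], while findInterval closes them on the left, [a, b), so a Time value exactly equal to a break lands in different groups under the two approaches. The sketch below is one possible variation, not part of the original answer: it assumes you want the left-closed behaviour, full-precision labels, and no median of Time in the result. The name res2 is illustrative.

```r
# Sample data copied from the question's dput output
TimeStamp <- c(18, 138, 438, 678, 798, 1278, 1578, 1878, 2178)
P <- data.frame(
  Time            = c(0, 5, 100, 200, 500, 1200),
  SkinTemp        = c(27.781, 27.78, 27.779, 27.779, 27.778, 27.777),
  HeartRate       = rep(70, 6),
  RespirationRate = rep(10, 6)
)

# right = FALSE gives left-closed intervals [a, b), matching findInterval.
# dig.lab = 4 lets labels show four-digit breaks such as 1278 in full.
P$new <- cut(P$Time, breaks = c(-Inf, TimeStamp, Inf),
             right = FALSE, dig.lab = 4)

# List the measurement columns explicitly so that Time is not aggregated.
res2 <- aggregate(cbind(SkinTemp, HeartRate, RespirationRate) ~ new,
                  data = P, FUN = median)
res2
```

With these settings the groups agree with the findInterval approach, and each row is labelled with its actual time range rather than an index.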