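I have to calculate a coefficient from a dataset stored in a data frame (A) with 4936 rows and 1025 variables.
The first row [1] holds the time in seconds, and each following row is a set of samples collected at a different location. A sample of data frame A: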
# V1 V2 V3 V4
# [1,] 26.4 26.5 26.6 26.7
# [2,] -15 -5 2 3
# [3,] 6 -7 5 8
# [4,] 9 4 4 -2
In another data frame (B), I store the time from which the calculation should start for each row of A. An example of data frame B:
# time
# [1,] 26.4
# [2,] 26.6
# [3,] 26.5
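To keep things simple, let the coefficient be the sum of the data collected at one location (data frame A), restricted by the time at which they were collected (data frame B). For the example above, the calculation should be as follows: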
sum1=-15+(-5)+2+3
sum2=5+8
sum3=4+4+(-2)
I would like to store the results of the calculation in a new data frame, like this:
# Sum
# [1,] -15
# [2,] 13
# [3,] 6
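How can I link the calculation between the two data frames based on the values stored in the second data frame?
Answer 0 (score: 4)
Using sapply
A solution that iterates over the rows and selects columns by collection time: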
# df1 is the data frame A from the question
# Time from original table (its first row)
foo <- df1[1, ]
# Time from table B
time <- c(26.4, 26.6, 26.5)
# Remove time row from original table
df1 <- df1[-1, ]
# Iterate over and select columns with foo >= time
sapply(1:length(time), function(x)
sum(df1[x, which(foo >= time[x])])
)
# [1] -15 13 6
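The snippet above does not define df1. As a self-contained sketch, assuming df1 is the question's data frame A rebuilt from the sample values, the same idea can be written with vapply. The names start_time, obs_time, and samples are illustrative and not part of the original answer.

```r
# Assumption: df1 mirrors data frame A (times in row 1, one site per later row)
df1 <- data.frame(
  V1 = c(26.4, -15, 6, 9),
  V2 = c(26.5, -5, -7, 4),
  V3 = c(26.6, 2, 5, 4),
  V4 = c(26.7, 3, 8, -2)
)
start_time <- c(26.4, 26.6, 26.5)  # start times from data frame B

obs_time <- unlist(df1[1, ])       # collection time of each column
samples  <- as.matrix(df1[-1, ])   # numeric matrix, one row per site

# For site i, sum the samples collected at or after start_time[i]
sums <- vapply(
  seq_along(start_time),
  function(i) sum(samples[i, obs_time >= start_time[i]]),
  numeric(1)
)
sums
```

vapply is used here instead of sapply only because it guarantees a numeric vector as the result; the selection logic is unchanged.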
Answer 1 (score: 2)
I came across this already answered question and felt urged to propose an alternative solution.
None of the other answers questioned the oddities of the data layout (the times are stored in the first row and each site occupies a row), although these make the proposed solutions more complex.
As a wild guess, the data seem to have been collected in an Excel sheet. However, for efficient processing we need the data to be stored column-wise and preferably in long format:
library(data.table)
long <- as.data.table(t(A))[
, setnames(.SD, "V1", "time")][
, melt(.SD, id.vars = "time", variable.name = "site_id")][
, site_id := as.integer(site_id)][]
long
#     time site_id value
#  1: 26.4       1   -15
#  2: 26.5       1    -5
#  3: 26.6       1     2
#  4: 26.7       1     3
#  5: 26.4       2     6
#  6: 26.5       2    -7
#  7: 26.6       2     5
#  8: 26.7       2     8
#  9: 26.4       3     9
# 10: 26.5       3     4
# 11: 26.6       3     4
# 12: 26.7       3    -2
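For readers without data.table, the same wide-to-long reshape can be sketched in base R. This is an illustration, not the answer's method; it assumes A holds the question's sample values, and the names obs_time, samples, and long_base are hypothetical.

```r
# Assumption: A mirrors the question's sample (times in row 1, one site per later row)
A <- data.frame(
  V1 = c(26.4, -15, 6, 9),
  V2 = c(26.5, -5, -7, 4),
  V3 = c(26.6, 2, 5, 4),
  V4 = c(26.7, 3, 8, -2)
)

obs_time <- unlist(A[1, ], use.names = FALSE)  # collection times
samples  <- as.matrix(A[-1, ])                 # one row per site

# Stack the sites on top of each other: one row per (site, time) observation
long_base <- data.frame(
  time    = rep(obs_time, times = nrow(samples)),
  site_id = rep(seq_len(nrow(samples)), each = ncol(samples)),
  value   = as.vector(t(samples))              # t() so values run site by site
)
long_base
```

The transpose matters: as.vector reads a matrix column by column, so transposing first yields all values of site 1, then site 2, and so on.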
Now, the OP has requested to aggregate the observations for each site, but only observations at or after a site-specific starting time need to be included. A data frame B with the starting times for each site is supplied.
The observations in long can be combined with the starting times in B as follows:
B <- data.table(
site_id = 1:3,
time = c(26.4, 26.6, 26.5))
B
#    site_id time
# 1:       1 26.4
# 2:       2 26.6
# 3:       3 26.5
# aggregating in a non-equi join grouped by the join conditions
long[B, on = .(site_id, time >= time), by = .EACHI, sum(value)]
#    site_id time  V1
# 1:       1 26.4 -15
# 2:       2 26.6  13
# 3:       3 26.5   6
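To make explicit what the non-equi join does, here is a rough base R cross-check: an ordinary join on site_id, followed by the time >= start condition applied as a filter, followed by a grouped sum. It is a sketch for comparison only; long_df, start_df, and the column name start are assumptions introduced for this example.

```r
# Long-format sample data (same values as `long` above), built in base R
long_df <- data.frame(
  time    = rep(c(26.4, 26.5, 26.6, 26.7), times = 3),
  site_id = rep(1:3, each = 4),
  value   = c(-15, -5, 2, 3, 6, -7, 5, 8, 9, 4, 4, -2)
)
start_df <- data.frame(site_id = 1:3, start = c(26.4, 26.6, 26.5))

# Equi-join on site_id, then apply the non-equi condition as a row filter
joined <- merge(long_df, start_df, by = "site_id")
kept   <- joined[joined$time >= joined$start, ]

# One sum per site
result <- aggregate(value ~ site_id, data = kept, FUN = sum)
result
```

Unlike the data.table version, this materialises the full join before filtering, so it is likely to be slower on a table the size the question describes.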
The OP has asked in a comment and in another question how to limit the number of observations to be aggregated after the starting time. This can be achieved by a slight modification:
max_values <- 2L
long[B, on = .(site_id, time >= time), by = .EACHI, sum(value[1:max_values])]
#    site_id time  V1
# 1:       1 26.4 -20
# 2:       2 26.6  13
# 3:       3 26.5   8
Note that max_values is set to 2L here for illustration.
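One edge case may be worth checking: indexing with 1:max_values pads the result with NA when a site has fewer eligible observations than max_values, so the sum becomes NA. A minimal base R sketch of the same limit using head(), which simply returns fewer values, could look like this. The helper sum_first_n and the variable names are hypothetical, and the values are assumed to be ordered by time.

```r
# Sum at most n values collected at or after `start`
sum_first_n <- function(values, times, start, n) {
  eligible <- values[times >= start]  # assumes values are already ordered by time
  sum(head(eligible, n))              # head() returns fewer than n values if fewer exist
}

times   <- c(26.4, 26.5, 26.6, 26.7)
samples <- rbind(c(-15, -5, 2, 3), c(6, -7, 5, 8), c(9, 4, 4, -2))
start   <- c(26.4, 26.6, 26.5)

limited <- vapply(
  seq_along(start),
  function(i) sum_first_n(samples[i, ], times, start[i], n = 2L),
  numeric(1)
)
limited

# Only one eligible observation: the result is that value, not NA
single <- sum_first_n(c(1, 2, 3), c(1, 2, 3), start = 3, n = 2L)
single
```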
Answer 2 (score: 0)
A solution using a simple for loop:
# recreate your data
V1 <- c(26.4, -15, 6, 9)
V2 <- c(26.5, -5, -7, 4)
V3 <- c(26.6, 2, 5, 4)
V4 <- c(26.7, 3, 8, -2)
A <- data.frame(V1, V2, V3, V4)
B <- data.frame(time = c(26.4, 26.6, 26.5))
# initialize a numeric vector (one element per row of B) to store the sums in
sum_frame <- numeric(NROW(B))
# calculating sums
for (i in 1:NROW(B)) {
sum_frame[i] <- sum(A[(i + 1), (which(A[1, ] == B$time[i])):NCOL(A)])
}
# turning sum-vector into a dataframe
sum_frame <- data.frame(sums = sum_frame)
Output:
> sum_frame
  sums
1  -15
2   13
3    6
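The loop above finds the starting column with an exact == comparison, which works for the sample but may miss a match if the times in A and B come from different computations and differ by floating-point rounding. A possible vectorised variant, sketched under the assumption that A contains no missing values, compares with a small tolerance instead. The tolerance tol and the names obs_time, samples, and keep are assumptions made for this illustration.

```r
A <- data.frame(
  V1 = c(26.4, -15, 6, 9),
  V2 = c(26.5, -5, -7, 4),
  V3 = c(26.6, 2, 5, 4),
  V4 = c(26.7, 3, 8, -2)
)
B <- data.frame(time = c(26.4, 26.6, 26.5))

obs_time <- unlist(A[1, ])      # collection time of each column
samples  <- as.matrix(A[-1, ])  # one row per site
tol      <- 1e-8                # hypothetical tolerance for comparing times

# keep[i, j] is TRUE when column j was collected at or after site i's start time
keep <- outer(B$time, obs_time, function(start, t) t >= start - tol)

# Excluded samples are multiplied by FALSE (0) and drop out of the row sums
sum_frame <- data.frame(sums = unname(rowSums(samples * keep)))
sum_frame
```

This builds a logical matrix with the same shape as the sample matrix, which should be acceptable for 4936 x 1025 values but trades memory for the absence of an explicit loop.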