
Date: 2017-11-14 11:29:58

Tags: r dataframe filtering

I have to calculate coefficients based on a data set stored in a data frame (A) of size 4936 obs. x 1025 var.


#        V1   V2   V3   V4
# [1,] 26.4 26.5 26.6 26.7
# [2,]  -15   -5    2    3
# [3,]    6   -7    5    8
# [4,]    9    4    4   -2


#      time
# [1,] 26.4
# [2,] 26.6
# [3,] 26.5




#       Sum
# [1,]  -15
# [2,]   13
# [3,]    6


3 answers:

Answer 0 (score: 4)


# Time from original table
foo <- df1[1, ]
# Time from table B
time <- c(26.4, 26.6, 26.5)

# Remove time row from original table
df1 <- df1[-1, ]

# Iterate over the rows and sum the columns where foo >= time
sapply(seq_along(time), function(x)
    sum(df1[x, which(foo >= time[x])]))

# [1] -15  13   6
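
For anyone who wants to run the snippet above without the original data, here is a minimal self-contained sketch. It assumes `df1` is the question's table A held as a data frame with the time values in its first row. The names `times` and `obs` are my own choices for illustration (`times` avoids masking the base R function `time`).

```r
# Assumed input: table A from the question, time values in row 1
df1 <- data.frame(
  V1 = c(26.4, -15, 6, 9),
  V2 = c(26.5, -5, -7, 4),
  V3 = c(26.6, 2, 5, 4),
  V4 = c(26.7, 3, 8, -2)
)

# Time axis (row 1) as a plain numeric vector
foo <- unlist(df1[1, ])
# Start times from table B
times <- c(26.4, 26.6, 26.5)

# Keep only the measurement rows
obs <- df1[-1, ]

# For row i, sum the columns whose time is at or after times[i]
result <- sapply(seq_along(times), function(i) {
  sum(obs[i, foo >= times[i]])
})

result
```

Converting the time row with `unlist()` keeps the comparison on a plain numeric vector rather than a one-row data frame, which makes the column selection easier to inspect.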

Answer 1 (score: 2)

I came across this already answered question and felt urged to propose an alternative solution.

  • Reading the title immediately made me think of join or merge.
  • The OP claims to use data frames but the printed output seems to originate from matrices.
  • The data is stored transposed: The time series are stored row-wise horizontally where the first row contains no observations but the time in seconds. This is considered untidy.

None of the other answers bothered to question these oddities although they made the proposed solutions more complex.

Reshaping the data

As a wild guess, the data seem to have been collected in an Excel sheet. However, for efficient processing we need the data to be stored column-wise and preferably in long format:

library(data.table)

long <- as.data.table(t(A))[
  , setnames(.SD, "V1", "time")][
    , melt(.SD, id.vars = "time", variable.name = "site_id")][
      , site_id := as.integer(site_id)][]

    time site_id value
 1: 26.4       1   -15
 2: 26.5       1    -5
 3: 26.6       1     2
 4: 26.7       1     3
 5: 26.4       2     6
 6: 26.5       2    -7
 7: 26.6       2     5
 8: 26.7       2     8
 9: 26.4       3     9
10: 26.5       3     4
11: 26.6       3     4
12: 26.7       3    -2

Aggregating in a non-equi join

Now, the OP has requested to aggregate the observations for each site, but only observations at or after a site-specific starting time are to be included. A data frame B with the starting times for each site is supplied.

The observations in long can be combined with the starting times in B as follows:

B <- data.table(
  site_id = 1:3,
  time = c(26.4, 26.6, 26.5))

   site_id time
1:       1 26.4
2:       2 26.6
3:       3 26.5
# aggregating in a non-equi join grouped by the join conditions
long[B, on = .(site_id, time >= time), by = .EACHI, sum(value)] 
   site_id time  V1
1:       1 26.4 -15
2:       2 26.6  13
3:       3 26.5   6
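
To make it clearer what the non-equi join computes, the same result can be sketched step by step in base R. This is only an illustration of the logic, not how data.table does it internally. The names `long_df`, `B_df`, `start`, `joined`, `kept` and `sums` are made up for this sketch, and the data are retyped from the tables shown above.

```r
# Assumed inputs, mirroring `long` and `B` from above as plain data frames
long_df <- data.frame(
  time    = rep(c(26.4, 26.5, 26.6, 26.7), times = 3),
  site_id = rep(1:3, each = 4),
  value   = c(-15, -5, 2, 3, 6, -7, 5, 8, 9, 4, 4, -2)
)
B_df <- data.frame(site_id = 1:3, start = c(26.4, 26.6, 26.5))

# Step 1: equi join on site_id attaches each site's start time to its rows
joined <- merge(long_df, B_df, by = "site_id")

# Step 2: the non-equi condition keeps observations at or after the start
kept <- joined[joined$time >= joined$start, ]

# Step 3: aggregate per site
sums <- aggregate(value ~ site_id, data = kept, FUN = sum)
sums
```

The data.table version folds these three steps into one call, so the intermediate joined table never has to be materialised, which matters for a data set of the size mentioned in the question.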

Edit: Limit the number of observations in the aggregation

The OP has asked in a comment and in another question how to limit the number of observations to be aggregated after the starting time. This can be achieved by a slight modification:

max_values <- 2L
long[B, on = .(site_id, time >= time), by = .EACHI, sum(value[1:max_values])]  
   site_id time  V1
1:       1 26.4 -20
2:       2 26.6  13
3:       3 26.5   8

Note that max_values is set to 2L here for illustration.
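
One caveat: indexing with `1:max_values` pads the result with `NA` when fewer than `max_values` observations remain after the starting time, so the sum becomes `NA`. A small base R sketch shows the effect and one possible guard using `head()`. The vector `remaining` is a made-up example, not part of the OP's data.

```r
max_values <- 2L

# Hypothetical site with only one observation left after its start time
remaining <- c(8)

# Index 2 is out of range, so the subset is c(8, NA) and the sum is NA
padded <- sum(remaining[1:max_values])

# head() stops at the available length, so only the existing value is summed
guarded <- sum(head(remaining, max_values))

padded
guarded
```

If that behaviour is preferred, `sum(head(value, max_values))` could be swapped in for `sum(value[1:max_values])` in the join above.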

Answer 2 (score: 0)


# recreate your data
V1 <- c(26.4, -15, 6, 9)
V2 <- c(26.5, -5, -7, 4)
V3 <- c(26.6, 2, 5, 4)
V4 <- c(26.7, 3, 8, -2)

A <- data.frame(V1, V2, V3, V4)
B <- data.frame(time = c(26.4, 26.6, 26.5))

#initialize empty variable to store sums in
sum_frame <- numeric()

# calculating sums
for (i in seq_len(NROW(B))) {
  # sum row i + 1 of A from the column whose time matches B$time[i] to the last column
  sum_frame[i] <- sum(A[(i + 1), (which(A[1, ] == B$time[i])):NCOL(A)])
}

# turning sum-vector into a dataframe
sum_frame <- data.frame(sums = sum_frame)


> sum_frame
  sums
1  -15
2   13
3    6