Question

I have the following two data frames:

df1 <- data.frame(month=c("1","1","1","1","2","2","2","3","3","3","3","3"),
             temp=c("10","15","16","25","13","17","20","5","16","25","30","37"))


df2 <-  data.frame(period=c("1","1","1","1","1","1","1","1","2","2","2","2","2","2","3","3","3","3","3","3","3","3","3","3","3","3"),
              max_temp=c("9","13","16","18","30","37","38","39","10","15","16","25","30","32","8","10","12","14","16","18","19","25","28","30","35","40"),
              group=c("1","1","1","2","2","2","3","3","3","3","4","4","5","5","5","5","5","6","6","6","7","7","7","7","8","8"))

I want to:

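1. For each row, check consecutively whether the value in column month of df1 matches the value in column period of df2, i.e. df1$month == df2$period.
2. If step 1 is not true, i.e. df1$month != df2$period, repeat step 1 and compare the value in df1 with the value in the next row of df2, and so on, until df1$month == df2$period.
3. If df1$month == df2$period, check whether the value in column temp of df1 is smaller than or equal to the value in column max_temp of df2, i.e. df1$temp <= df2$max_temp.
4. If df1$temp <= df2$max_temp, return the value in column group of that row of df2 and add it to df1 in a new column called new_group.
5. If step 3 is not true, i.e. df1$temp > df2$max_temp, go back to step 1 and compare the same row of df1 with the next row of df2.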
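The five steps above translate almost literally into a loop over df1 with an inner scan over df2. This is a minimal sketch, assuming that every scan restarts at row 1 of df2 (the question does not say where each scan should begin) and that temp and max_temp are meant to be compared as numbers; the helper name find_group is hypothetical.

```r
# Question data; stringsAsFactors = FALSE keeps every column as character
df1 <- data.frame(month = c("1","1","1","1","2","2","2","3","3","3","3","3"),
                  temp  = c("10","15","16","25","13","17","20","5","16","25","30","37"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(period   = c("1","1","1","1","1","1","1","1","2","2","2","2","2","2","3","3","3","3","3","3","3","3","3","3","3","3"),
                  max_temp = c("9","13","16","18","30","37","38","39","10","15","16","25","30","32","8","10","12","14","16","18","19","25","28","30","35","40"),
                  group    = c("1","1","1","2","2","2","3","3","3","3","4","4","5","5","5","5","5","6","6","6","7","7","7","7","8","8"),
                  stringsAsFactors = FALSE)

# Hypothetical helper: walk df2 row by row (steps 1, 2 and 5) until the
# period matches and temp <= max_temp (steps 1 and 3), then return that
# row's group (step 4).
# ASSUMPTION: every scan starts at row 1 of df2.
find_group <- function(month, temp, df2) {
  j <- 1
  while (j <= nrow(df2)) {
    if (df2$period[j] == month &&
        as.numeric(temp) <= as.numeric(df2$max_temp[j])) {
      return(df2$group[j])
    }
    j <- j + 1  # no match yet: move on to the next row of df2
  }
  NA_character_  # no row of df2 satisfies both conditions
}

new_group <- character(nrow(df1))
for (i in seq_len(nrow(df1))) {
  new_group[i] <- find_group(df1$month[i], df1$temp[i], df2)
}
df1$new_group <- new_group
print(df1)
```

Unlike Answer 1, this sketch does not restrict the scan to rows of df2 at index i or higher; for the sample data both choices give the same new_group values.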

The desired output is df1 with an additional column new_group that holds the matched group value for each row.

I have been experimenting with this but need some help or redirection. Thanks!

Answer 1

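I found your procedure for computing new_group a bit hard to follow from the instructions. As I understand it, you are trying to create a variable called new_group in df1. For row i of df1, the new_group value is the group value of the first row in df2 that:

is indexed i or higher
has a period value matching df1$month[i]
has a max_temp value no less than df1$temp[i]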

I approached this by calling sapply on the row indices of df1:

fxn = function(idx) {
  # Potentially matching indices in df2
  pm = idx:nrow(df2)

  # Matching indices in df2
  m = pm[df2$period[pm] == df1$month[idx] &
         as.numeric(as.character(df1$temp[idx])) <=
         as.numeric(as.character(df2$max_temp[pm]))]

  # Return the group associated with the first matching index
  return(df2$group[m[1]])
}
df1$new_group = sapply(seq(nrow(df1)), fxn)
df1
#    month temp new_group
# 1      1   10         1
# 2      1   15         1
# 3      1   16         1
# 4      1   25         2
# 5      2   13         3
# 6      2   17         4
# 7      2   20         4
# 8      3    5         5
# 9      3   16         6
# 10     3   25         7
# 11     3   30         7
# 12     3   37         8
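A variant of fxn can stop at the first matching row instead of building the full vector of matches m. This is a sketch that keeps the same "index i or higher" rule but uses base R's Find(), which returns the first element of a vector for which a predicate is TRUE (or NULL when there is none); the name fxn_find and the temp_num/max_num vectors are hypothetical.

```r
# Question data; stringsAsFactors = FALSE keeps every column as character
df1 <- data.frame(month = c("1","1","1","1","2","2","2","3","3","3","3","3"),
                  temp  = c("10","15","16","25","13","17","20","5","16","25","30","37"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(period   = c("1","1","1","1","1","1","1","1","2","2","2","2","2","2","3","3","3","3","3","3","3","3","3","3","3","3"),
                  max_temp = c("9","13","16","18","30","37","38","39","10","15","16","25","30","32","8","10","12","14","16","18","19","25","28","30","35","40"),
                  group    = c("1","1","1","2","2","2","3","3","3","3","4","4","5","5","5","5","5","6","6","6","7","7","7","7","8","8"),
                  stringsAsFactors = FALSE)

# Convert the temperatures to numbers once, outside the per-row function
temp_num <- as.numeric(df1$temp)
max_num  <- as.numeric(df2$max_temp)

# Hypothetical variant of fxn: same rule (df2 index >= idx, same
# month/period, temp <= max_temp), but Find() returns as soon as the
# predicate is TRUE instead of testing every remaining row
fxn_find <- function(idx) {
  if (idx > nrow(df2)) return(NA_character_)  # no df2 rows at index idx or higher
  j <- Find(function(k) df2$period[k] == df1$month[idx] && temp_num[idx] <= max_num[k],
            idx:nrow(df2))
  if (is.null(j)) NA_character_ else df2$group[j]
}

# vapply() checks that every call returns exactly one character value
df1$new_group <- vapply(seq_len(nrow(df1)), fxn_find, character(1))
print(df1)
```

For twelve rows the early exit makes no practical difference; it only matters when df2 is long and matches tend to appear early in each scan.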

Answer 2

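library(data.table)
dt1 <- data.table(df1, key = "month")
dt2 <- data.table(df2, key = "period")

## temp and max_temp were created as strings (factors in R < 4.0), so
## convert them to numbers before comparing them
dt1[, temp := as.numeric(as.character(temp))]
dt2[, max_temp := as.numeric(as.character(max_temp))]

## add a row index
dt1[, rn1 := seq_len(nrow(dt1))]

## Join each row of dt1 to every dt2 row with the same month/period, then,
## per original row, take the group of the first dt2 row with temp <= max_temp
dt3 <- unique(
  dt1[dt2, allow.cartesian = TRUE][
    , new_group := group[min(which(temp <= max_temp))], by = "rn1"],
  by = "rn1")

## Keep only the columns you want (max_temp shown here comes from the first
## dt2 row of each period, not from the row that set new_group)
dt3[, c("month", "temp", "max_temp", "new_group"), with = FALSE]

    month temp max_temp new_group
 1:     1   10        9         1
 2:     1   15        9         1
 3:     1   16        9         1
 4:     1   25        9         2
 5:     2   13       10         3
 6:     2   17       10         4
 7:     2   20       10         4
 8:     3    5        8         5
 9:     3   16        8         6
10:     3   25        8         7
11:     3   30        8         7
12:     3   37        8         8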
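Answer 2's join-then-take-the-first-match idea can also be sketched in base R without data.table. This is a minimal sketch, assuming a merge() on month/period followed by a numeric filter; the helper columns id and row and the objects joined and first_hit are hypothetical names used only to restore the original row order. Like the data.table code, it considers every df2 row of the matching period, not only rows at index i or higher.

```r
# Question data; stringsAsFactors = FALSE keeps every column as character
df1 <- data.frame(month = c("1","1","1","1","2","2","2","3","3","3","3","3"),
                  temp  = c("10","15","16","25","13","17","20","5","16","25","30","37"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(period   = c("1","1","1","1","1","1","1","1","2","2","2","2","2","2","3","3","3","3","3","3","3","3","3","3","3","3"),
                  max_temp = c("9","13","16","18","30","37","38","39","10","15","16","25","30","32","8","10","12","14","16","18","19","25","28","30","35","40"),
                  group    = c("1","1","1","2","2","2","3","3","3","3","4","4","5","5","5","5","5","6","6","6","7","7","7","7","8","8"),
                  stringsAsFactors = FALSE)

df1$id  <- seq_len(nrow(df1))  # hypothetical helper: original row of df1
df2$row <- seq_len(nrow(df2))  # hypothetical helper: original row of df2

# All (df1 row, df2 row) pairs with month == period
joined <- merge(df1, df2, by.x = "month", by.y = "period")
# Keep only pairs where temp <= max_temp, compared as numbers
joined <- joined[as.numeric(joined$temp) <= as.numeric(joined$max_temp), ]
# For each df1 row, the surviving pair with the smallest df2 row wins
joined <- joined[order(joined$id, joined$row), ]
first_hit <- joined[!duplicated(joined$id), ]

# Rows of df1 with no surviving pair get NA from match()
df1$new_group <- first_hit$group[match(df1$id, first_hit$id)]
df1$id <- NULL
print(df1)
```

Sorting explicitly by id and row avoids relying on the row order that merge() happens to produce.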

R - Compare rows consecutively in two data frames and return a value
