R - Comparing rows consecutively in two data frames and returning a value

Time: 2014-01-30 02:57:22

Tags: r if-statement compare dataframe

I have the following two data frames:

    df1 <- data.frame(month=c("1","1","1","1","2","2","2","3","3","3","3","3"),
                      temp=c("10","15","16","25","13","17","20","5","16","25","30","37"))

    df2 <- data.frame(period=c("1","1","1","1","1","1","1","1","2","2","2","2","2","2","3","3","3","3","3","3","3","3","3","3","3","3"),
                      max_temp=c("9","13","16","18","30","37","38","39","10","15","16","25","30","32","8","10","12","14","16","18","19","25","28","30","35","40"),
                      group=c("1","1","1","2","2","2","3","3","3","3","4","4","5","5","5","5","5","6","6","6","7","7","7","7","8","8"))

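I would like to:

  1. For each row consecutively, check whether the value in the month column of df1 matches the value in the period column of df2, i.e. df1$month == df2$period.

  2. If step 1 is not true, i.e. df1$month != df2$period, repeat step 1 and compare the value in df1 with the value in the next row of df2, and so on, until df1$month == df2$period.

  3. If df1$month == df2$period, check whether the value in the temp column of df1 is less than or equal to the value in the max_temp column of df2, i.e. df1$temp <= df2$max_temp.

  4. If df1$temp <= df2$max_temp, return the value in the group column of that row of df2 and add it to df1 in a new column called "new_group".

  5. If step 3 is not true, i.e. df1$temp > df2$max_temp, go back to step 1 and compare the same row of df1 with the next row of df2.

  6. The output I would like is df1 with the new_group column filled in this way.

I have been experimenting with this and need some help or redirection. Thanks!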
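The steps above can be sketched as an explicit loop in base R. This is only a minimal sketch under assumptions the question does not confirm: the columns are stored as numbers rather than strings, the scan of df2 restarts from its first row for every row of df1, and `first_group` is a hypothetical helper name.

```r
# Sample data from the question, rewritten with numeric columns (an assumption)
df1 <- data.frame(
  month = c(1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 3),
  temp  = c(10, 15, 16, 25, 13, 17, 20, 5, 16, 25, 30, 37)
)
df2 <- data.frame(
  period   = c(rep(1, 8), rep(2, 6), rep(3, 12)),
  max_temp = c(9, 13, 16, 18, 30, 37, 38, 39,
               10, 15, 16, 25, 30, 32,
               8, 10, 12, 14, 16, 18, 19, 25, 28, 30, 35, 40),
  group    = c(1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 4, 4, 5,
               5, 5, 5, 5, 6, 6, 6, 7, 7, 7, 7, 8, 8)
)

# Hypothetical helper: walk down df2 and return the group of the first row
# whose period equals `month` and whose max_temp is at least `temp`
first_group <- function(month, temp, df2) {
  for (j in seq_len(nrow(df2))) {
    if (df2$period[j] == month && temp <= df2$max_temp[j]) {
      return(df2$group[j])
    }
  }
  NA  # no row of df2 satisfied both conditions
}

# Apply the helper to every (month, temp) pair of df1
df1$new_group <- mapply(first_group, df1$month, df1$temp,
                        MoreArgs = list(df2 = df2))
df1$new_group
# [1] 1 1 1 2 3 4 4 5 6 7 7 8
```

A loop like this mirrors the wording of the steps closely, which makes it easy to check, but it is slow on large data frames compared with vectorised or join-based approaches.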

2 Answers:

Answer 0 (score: 1)

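I found the procedure for computing new_group hard to follow as stated. As I understand it, you are trying to create a variable called new_group in df1. For row i of df1, the new_group value is the group value of the first row in df2 that:

  1. is indexed i or higher
  2. has a period value matching df1$month[i]
  3. has a max_temp value no less than df1$temp[i]

I solved this by calling sapply with the row indices of df1: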

    fxn = function(idx) {
      # Potentially matching indices in df2
      pm = idx:nrow(df2)
    
      # Matching indices in df2
      m = pm[df2$period[pm] == df1$month[idx] &
             as.numeric(as.character(df1$temp[idx])) <=
             as.numeric(as.character(df2$max_temp[pm]))]
    
      # Return the group associated with the first matching index
      return(df2$group[m[1]])
    }
    df1$new_group = sapply(seq(nrow(df1)), fxn)
    df1
    #    month temp new_group
    # 1      1   10         1
    # 2      1   15         1
    # 3      1   16         1
    # 4      1   25         2
    # 5      2   13         3
    # 6      2   17         4
    # 7      2   20         4
    # 8      3    5         5
    # 9      3   16         6
    # 10     3   25         7
    # 11     3   30         7
    # 12     3   37         8
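The `as.numeric(as.character(...))` wrapping in the function above matters because the sample columns are built from strings. Before R 4.0.0, `data.frame()` turned strings into factors by default, and calling `as.numeric()` directly on a factor returns its internal level codes rather than the printed values. A small illustration, using a made-up three-element vector:

```r
# A factor built from strings, as data.frame() did by default before R 4.0.0
temp <- factor(c("10", "15", "5"))

# Direct conversion returns the level codes (levels sort as "10", "15", "5")
codes <- as.numeric(temp)
codes
# [1] 1 2 3

# Going through character first recovers the actual temperatures
values <- as.numeric(as.character(temp))
values
# [1] 10 15  5
```

The same conversion is also harmless when the columns are already character or numeric, so the function works whichever way the data frames were created.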
    

Answer 1 (score: 1)

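    library(data.table)
    dt1 <- data.table(df1, key="month")
    dt2 <- data.table(df2, key="period")

    ## the columns are stored as strings/factors; convert the temperatures
    ## so that <= compares the values, not the factor codes
    dt1[, temp := as.numeric(as.character(temp))]
    dt2[, max_temp := as.numeric(as.character(max_temp))]

    ## add a row index
    dt1[, rn1 := seq(nrow(dt1))]

    ## join on month == period, then for each row of dt1 take the group of
    ## the first joined row where temp <= max_temp
    dt3 <-
      unique(dt1[dt2, allow.cartesian=TRUE][, new_group := group[min(which(temp <= max_temp))], by="rn1"], by="rn1")

    ## Keep only the columns you want
    ## (max_temp is the value from the first joined row of each period)
    dt3[, c("month", "temp", "max_temp", "new_group"), with=FALSE]

        month temp max_temp new_group
     1:     1   10        9         1
     2:     1   15        9         1
     3:     1   16        9         1
     4:     1   25        9         2
     5:     2   13       10         3
     6:     2   17       10         4
     7:     2   20       10         4
     8:     3    5        8         5
     9:     3   16        8         6
    10:     3   25        8         7
    11:     3   30        8         7
    12:     3   37        8         8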
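The same join-then-pick-first idea can also be sketched in base R with `merge()`, for readers who do not use data.table. This is a sketch under stated assumptions: the columns are numeric, and `rn1` and `rn2` are hypothetical row-index columns added only to restore the original row order, which `merge()` does not preserve.

```r
# Sample data from the question, rewritten with numeric columns (an assumption)
df1 <- data.frame(
  month = c(1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 3),
  temp  = c(10, 15, 16, 25, 13, 17, 20, 5, 16, 25, 30, 37)
)
df2 <- data.frame(
  period   = c(rep(1, 8), rep(2, 6), rep(3, 12)),
  max_temp = c(9, 13, 16, 18, 30, 37, 38, 39,
               10, 15, 16, 25, 30, 32,
               8, 10, 12, 14, 16, 18, 19, 25, 28, 30, 35, 40),
  group    = c(1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 4, 4, 5,
               5, 5, 5, 5, 6, 6, 6, 7, 7, 7, 7, 8, 8)
)

# Row indices so the original order of both tables can be restored
df1$rn1 <- seq_len(nrow(df1))
df2$rn2 <- seq_len(nrow(df2))

# Pair every row of df1 with every row of df2 that has the same month/period
joined <- merge(df1, df2, by.x = "month", by.y = "period")

# Keep only pairs that satisfy the temperature condition
joined <- joined[joined$temp <= joined$max_temp, ]

# Order by df1 row, then df2 row, and keep the first surviving pair per df1 row
joined <- joined[order(joined$rn1, joined$rn2), ]
first  <- joined[!duplicated(joined$rn1), ]

# Rows of df1 without any qualifying pair would get NA here
df1$new_group <- first$group[match(df1$rn1, first$rn1)]
df1$new_group
# [1] 1 1 1 2 3 4 4 5 6 7 7 8
```

Like the data.table version, this builds the full month/period cross product before filtering, so memory use grows with the product of the matching row counts.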