Create a new variable in a data frame based on a condition over another data frame

Time: 2017-10-24 08:18:55

Tags: r dataframe data.table sqldf

I have 2 data frames like this:

DF1

       date item 
 02/01/2017    A 
 09/01/2017    B
 14/01/2017    C

DF2

      date1       date2  item    prm
 01/01/2017  03/01/2017     A    YES
 08/01/2017  10/01/2017     B    YES
 15/01/2017  17/01/2017     C    YES

Goal

The prm variable is a constant: it has only one value. I want to add the variable prm to my df1 using this condition:

df1$date is between df2$date1 and df2$date2 and df1$item=df2$item

However, if the condition does not match, prm should take the value "NO".

4 answers:

Answer 0 (score: 2)

You can use ifelse here:

df1 <- read.table(text = "      date item
 02/01/2017    A
 09/01/2017    B
 16/01/2017    C", header = TRUE)

df2 <- read.table(text = "      date1       date2  item
 01/01/2017  03/01/2017     A
 08/01/2017  10/01/2017     B
 15/01/2017  17/01/2017     C", header = TRUE)

df1$date <- as.Date(df1$date, format = "%d/%m/%Y")
df2$date1 <- as.Date(df2$date1, format = "%d/%m/%Y")
df2$date2 <- as.Date(df2$date2, format = "%d/%m/%Y")


# ifelse() compares element by element: row i of df1 is checked only against row i of df2
df1$prm <- ifelse(df1$date >= df2$date1 & df1$date <= df2$date2 & df1$item == df2$item, "YES", "NO")

        date item prm
1 2017-01-02    A YES
2 2017-01-09    B YES
3 2017-01-16    C YES
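Because the ifelse() version pairs rows by position, it gives wrong answers if df2 is sorted differently from df1 or has a different number of rows. Below is a minimal base-R sketch that checks each df1 row against every df2 row instead. It uses the question's data with df2 deliberately shuffled. The helper vector in_range is an illustrative name, not part of the original answer.

```r
# Question's data; df2 rows deliberately in a different order than df1
df1 <- data.frame(date = as.Date(c("02/01/2017", "09/01/2017", "14/01/2017"), format = "%d/%m/%Y"),
                  item = c("A", "B", "C"))
df2 <- data.frame(date1 = as.Date(c("15/01/2017", "01/01/2017", "08/01/2017"), format = "%d/%m/%Y"),
                  date2 = as.Date(c("17/01/2017", "03/01/2017", "10/01/2017"), format = "%d/%m/%Y"),
                  item  = c("C", "A", "B"))

# For each row of df1: TRUE if ANY row of df2 has the same item
# and an inclusive date1..date2 interval containing df1$date
in_range <- vapply(seq_len(nrow(df1)), function(i) {
  any(df1$item[i] == df2$item &
      df1$date[i] >= df2$date1 &
      df1$date[i] <= df2$date2)
}, logical(1))

df1$prm <- ifelse(in_range, "YES", "NO")
df1
```

This runs in O(nrow(df1) × nrow(df2)) time, which is fine for small tables. For large ones, the join-based answers below scale better.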

Answer 1 (score: 2)

Here is a solution using dplyr:

library(tidyverse)

df1 = tribble(~date, ~item,
             "02/01/2017",    "A",
             "09/01/2017",    "B",
             "16/01/2017",    "C")

df2 = tribble(~date1, ~date2, ~item,
"01/01/2017",  "03/01/2017",     "A",
"08/01/2017",  "10/01/2017",     "B",
"15/01/2017",  "15/01/2017",     "C")

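# Join on the shared column `item`
df3 = merge(x = df1, y = df2)

# Convert date, date1 and date2 from "dd/mm/yyyy" strings to Date
df4 = as.data.frame(cbind(df3[1], lapply(df3[2:4], as.Date, format = "%d/%m/%Y")))

# "Between" includes both endpoints
df5 <- df4 %>%
  mutate(prm = if_else(date >= date1 & date <= date2, "YES", "NO"))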

df5
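By default, merge() performs an inner join on `item`. Any df1 row whose item is missing from df2 is therefore dropped instead of receiving "NO". Below is a base-R sketch that keeps every df1 row by passing all.x = TRUE (a left join). It assumes each item appears at most once in df2, as in the question's data. Item D is added here purely for illustration, and the interval for C follows the question (15/01 to 17/01).

```r
df1 <- data.frame(date = as.Date(c("02/01/2017", "09/01/2017", "16/01/2017", "20/01/2017"),
                                 format = "%d/%m/%Y"),
                  item = c("A", "B", "C", "D"))   # "D" is a hypothetical unmatched item
df2 <- data.frame(date1 = as.Date(c("01/01/2017", "08/01/2017", "15/01/2017"), format = "%d/%m/%Y"),
                  date2 = as.Date(c("03/01/2017", "10/01/2017", "17/01/2017"), format = "%d/%m/%Y"),
                  item  = c("A", "B", "C"))

# Left join: items absent from df2 get NA in date1/date2
joined <- merge(df1, df2, by = "item", all.x = TRUE)

# Comparisons with NA give NA, so treat a missing interval explicitly as "no match"
hit <- !is.na(joined$date1) & joined$date >= joined$date1 & joined$date <= joined$date2
joined$prm <- ifelse(hit, "YES", "NO")
joined[, c("date", "item", "prm")]
```

Note that merge() sorts the result by `item` by default, so the original df1 row order is not preserved.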

Answer 2 (score: 1)

[Edit]

If df1 and df2 have different numbers of rows, you can use sqldf: create a LEFT JOIN of df1 and df2 on the conditions df1.date between df2.date1 and df2.date2 and df1.item = df2.item, and create the column prm from the result.
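Below is a minimal sqldf sketch of that idea. It assumes the date columns are already Date objects and that df2 carries the constant prm column, as in the question. An unmatched LEFT JOIN row has NULL in df2.prm, and COALESCE turns that into 'NO'.

```r
library(sqldf)

df1 <- data.frame(date = as.Date(c("02/01/2017", "09/01/2017", "14/01/2017"), format = "%d/%m/%Y"),
                  item = c("A", "B", "C"))
df2 <- data.frame(date1 = as.Date(c("01/01/2017", "08/01/2017", "15/01/2017"), format = "%d/%m/%Y"),
                  date2 = as.Date(c("03/01/2017", "10/01/2017", "17/01/2017"), format = "%d/%m/%Y"),
                  item  = c("A", "B", "C"),
                  prm   = "YES")

# LEFT JOIN keeps every df1 row; BETWEEN is inclusive on both ends
res <- sqldf("
  SELECT df1.date, df1.item, COALESCE(df2.prm, 'NO') AS prm
  FROM df1
  LEFT JOIN df2
    ON df1.item = df2.item
   AND df1.date BETWEEN df2.date1 AND df2.date2
  ORDER BY df1.date")
res
```

The date comparison works because all three date columns are stored in SQLite using the same representation. If one of them were still a "dd/mm/yyyy" string, the comparison would not be chronological.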

Answer 3 (score: 1)

With data.table, you can do an update join using a non-equi join:

library(data.table)
setDT(df1)[setDT(df2), on = .(item, date >= date1, date <= date2), prm := i.prm][
  is.na(prm), prm := "NO"]
df1
         date item prm
1: 2017-01-02    A YES
2: 2017-01-09    B YES
3: 2017-01-14    C  NO
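The answer above assumes df1$date, df2$date1 and df2$date2 are already Date objects and that df2 contains the prm column. Below is a self-contained sketch that includes that preparation step, using the question's data.

```r
library(data.table)

df1 <- data.table(date = c("02/01/2017", "09/01/2017", "14/01/2017"),
                  item = c("A", "B", "C"))
df2 <- data.table(date1 = c("01/01/2017", "08/01/2017", "15/01/2017"),
                  date2 = c("03/01/2017", "10/01/2017", "17/01/2017"),
                  item  = c("A", "B", "C"),
                  prm   = "YES")

# Parse "dd/mm/yyyy" strings into Date so range comparisons are chronological
df1[, date := as.Date(date, format = "%d/%m/%Y")]
df2[, c("date1", "date2") := lapply(.SD, as.Date, format = "%d/%m/%Y"),
    .SDcols = c("date1", "date2")]

# Non-equi update join: df1 rows with the same item and
# date1 <= date <= date2 get prm copied from df2 (i.prm)
df1[df2, on = .(item, date >= date1, date <= date2), prm := i.prm]

# Rows that matched no interval were left NA
df1[is.na(prm), prm := "NO"]
df1
```

The update join modifies df1 by reference, so no copy of df1 is made. The row order of df1 is also unchanged.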