Question

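So I have two data frames, df1 and df2, and both have the two columns Curr_Time and Curr_Date. I need to compare the Curr_Time values in the two data frames: if the values are the same, do nothing; if they differ, append the new values to df1.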

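I am working with streaming data in which df2 has only one row, containing the latest values. My goal is to append the new values from df2 to df1 if and only if df2$Curr_Time differs from the values in df1$Curr_Time. Currently, I append every value to df1 regardless of this logic.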
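
A minimal base-R sketch of the condition described above, assuming both data frames store Curr_Time and Curr_Date as character strings. The helper name `append_if_new` and the sample values are hypothetical, chosen only for illustration:

```r
# Hypothetical helper: append the rows of `new` to `current` only when their
# Curr_Time is not already present in `current` (the condition stated above).
append_if_new <- function(current, new) {
  is_new <- !(new$Curr_Time %in% current$Curr_Time)
  rbind(current, new[is_new, , drop = FALSE])
}

df1 <- data.frame(Curr_Time = "11:43:34", Curr_Date = "10-04-2018",
                  stringsAsFactors = FALSE)
df2 <- data.frame(Curr_Time = "11:45:34", Curr_Date = "10-04-2018",
                  stringsAsFactors = FALSE)

df1 <- append_if_new(df1, df2)  # new time: the row is appended
df1 <- append_if_new(df1, df2)  # same time again: nothing is appended
print(df1)
```

Because `%in%` against a `NULL` column returns `FALSE`, the same helper also appends df2 when df1 is still the column-less `data.frame()` from the question.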

df2: a single row that is updated every 5 seconds

     Curr_Time        Curr_Date
     11:45:34         10-04-2018

df1: currently a new row is appended every 5 seconds without checking the values, which leads to redundant (duplicate) rows.

    Curr_Time         Curr_Date
    11:43:34         10-04-2018
    11:43:34         10-04-2018
    11:45:34         10-04-2018
    11:45:34         10-04-2018

Expected output for df1

    Curr_Time       Curr_Date  
    11:43:34        10-04-2018
    11:45:34        10-04-2018

Here is my R code.

    library(tcltk2)

    df1 <- data.frame(stringsAsFactors = FALSE)
    df2 <- data.frame(stringsAsFactors = FALSE)

    # Appends df2 to df1 on every call, without checking for duplicates
    frameupdate <- function() {
      if (nrow(df1) == 0)
        df1 <<- df2
      else
        df1 <<- rbind(df1, df2)
    }

    # Run frameupdate() every 5000 ms
    tclTaskSchedule(5000, frameupdate(), id = "frameupdate", redo = TRUE)

Answer 1

After the if/else statement, you can remove the duplicates with a simple extra step:

    library(dplyr)
    df1 %>%
      distinct()

This gives you:

    # A tibble: 2 x 2
      Curr_Time Curr_Date 
       <time>    <chr>     
    1 11:43     10-04-2018
    2 11:45     10-04-2018
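
For comparison, a base-R sketch of the same deduplication. It rebuilds the duplicated df1 from the question with plain character columns (the tibble above shows a `time` column, which this sketch does not reproduce):

```r
# df1 as it looks after four unchecked appends (values from the question)
df1 <- data.frame(
  Curr_Time = c("11:43:34", "11:43:34", "11:45:34", "11:45:34"),
  Curr_Date = rep("10-04-2018", 4),
  stringsAsFactors = FALSE
)

# unique() keeps the first occurrence of each complete row, like distinct()
df1 <- unique(df1)
rownames(df1) <- NULL  # renumber rows 1..n after dropping duplicates
print(df1)
```

Note that cleaning up afterwards still lets each duplicate be written first; it only removes it on the next pass.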

Answer 2

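As @cephalopod said, anti_join is a good approach.

You want to check whether the record in df2 is already contained in df1.

As @Stephan mentioned, you can append everything without checking for duplicates and then run distinct() to get the distinct records,

or you can check every time inside the function, for example with dplyr's anti_join function.

Here is an example with dplyr:

First, I assume df1 should not contain duplicate records (which is what it would look like had the logic been correct from the start):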

    df1 <- df1 %>% unique()
    head(df1)
      Curr_Time  Curr_Date
    1  11:43:34 10-04-2018
    3  11:45:34 10-04-2018

I created another record, df2.new, as an example of a new record that should be added to df1:

    df2.new
      Curr_Time  Curr_Date
    1  11:45:57 10-04-2018

For example, anti_join keeps the new record but drops the one that df1 already contains:

    df2.new %>% anti_join(df1)
    Joining, by = c("Curr_Time", "Curr_Date")
      Curr_Time  Curr_Date
    1  11:45:57 10-04-2018

    df2 %>% anti_join(df1)
    Joining, by = c("Curr_Time", "Curr_Date")
    [1] Curr_Time Curr_Date
    <0 rows> (or 0-length row.names)

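This also works while df1 still has zero rows, provided it already has the Curr_Time and Curr_Date columns (with the column-less `data.frame()` from the question, anti_join has no common columns to join by and stops with an error). So you can update your function like this: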

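    frameupdate <- function() {
      # Append only the rows of df2 that have no exact match in df1
      df1 <<- rbind(df1, anti_join(df2, df1, by = c("Curr_Time", "Curr_Date")))
    }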
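
If you would rather not depend on dplyr, the same anti-join can be sketched in base R by building one key per row from both columns. The helper `row_key` and the sample values are hypothetical; the `"|"` separator assumes neither column ever contains that character:

```r
# Hypothetical helper: one string key per row, built from both columns
row_key <- function(d) paste(d$Curr_Time, d$Curr_Date, sep = "|")

frameupdate <- function() {
  # Keep only the rows of df2 whose (Curr_Time, Curr_Date) pair is not in df1
  new_rows <- df2[!(row_key(df2) %in% row_key(df1)), , drop = FALSE]
  df1 <<- rbind(df1, new_rows)
}

# df1 starts with the right columns but no rows
df1 <- data.frame(Curr_Time = character(0), Curr_Date = character(0),
                  stringsAsFactors = FALSE)
df2 <- data.frame(Curr_Time = "11:43:34", Curr_Date = "10-04-2018",
                  stringsAsFactors = FALSE)
frameupdate()
frameupdate()                  # same record again: nothing is added
df2$Curr_Time <- "11:45:34"
frameupdate()                  # new time: one row is added
print(df1)
```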

Or you could use something like this:

    frameupdate <- function() {
      # Append df2 only if no row of df1 matches both Curr_Time and Curr_Date
      if (nrow(df1[df1$Curr_Time == df2$Curr_Time & df1$Curr_Date == df2$Curr_Date, ]) == 0)
        df1 <<- rbind(df1, df2)
    }

    frameupdate()

Running this function gives the expected output, even when df1 starts out empty:
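
To try this second function without waiting on the Tcl scheduler, you can feed it a short sequence of df2 snapshots in a loop. The `snapshots` vector is made-up test data that mimics the duplicated stream from the question:

```r
frameupdate <- function() {
  # Append df2 only if no row of df1 matches both Curr_Time and Curr_Date
  if (nrow(df1[df1$Curr_Time == df2$Curr_Time &
               df1$Curr_Date == df2$Curr_Date, ]) == 0)
    df1 <<- rbind(df1, df2)
}

df1 <- data.frame(stringsAsFactors = FALSE)  # empty, as in the question

# Made-up stream: each value would arrive as the single row of df2 every 5 s
snapshots <- c("11:43:34", "11:43:34", "11:45:34", "11:45:34")
for (time_value in snapshots) {
  df2 <- data.frame(Curr_Time = time_value, Curr_Date = "10-04-2018",
                    stringsAsFactors = FALSE)
  frameupdate()
}
print(df1)
```

The check scans every stored row of df1 on each call, so its cost grows with the length of the stream.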

    df1
      Curr_Time  Curr_Date
    1  11:43:34 10-04-2018
    2  11:45:34 10-04-2018
    3  11:45:57 10-04-2018

Validate column values in two different data frames and append the non-matching values to the existing data frame
