R sqldf: conditionally inserting a binary value

Time: 2017-12-12 14:37:10

Tags: r sqlite insert sqldf

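I am using the sqldf package in R. I have two datasets:

  • the full list of students
  • the list of students who submitted the assignment (some students are missing)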

    Full list:

    Student1
    Student2
    Student3
    Student4
    Student5
    

    Submitted list:

    Student1
    Student2
    Student5
    

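    I want to add a column to the full list containing 1 or 0, depending on whether the student has submitted the assignment. The final full list would then look like this: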

    Student1   1
    Student2   1
    Student3   0
    Student4   0
    Student5   1
    

    What R code and SQL (SQLite?) code would do this? (Asking for both, for clarity.)

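  • 2 answers:

    Answer 0 (score: 0)

    You can add a column with the value 1 to the submitted list, then join the two tables using either dplyr or sqldf. Finally, replace the NA values that the join produces for students with no submission with 0.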

    library(data.table)
    library(dplyr)
    library(sqldf)

    full_list <- data.frame(x = c("Student1", "Student2", "Student3", "Student4", "Student5"))
    submitted_list <- data.frame(x = c("Student1", "Student2", "Student5"))
    setDT(submitted_list)

    # flag every student on the submitted list with 1
    submitted_list[, assin_completed := 1L]

    # using dplyr
    dt <- left_join(full_list, submitted_list, by = "x")

    # or using sqldf
    dt <- sqldf("select full_list.x, submitted_list.assin_completed from full_list left outer join submitted_list on full_list.x = submitted_list.x")

    # students without a match come back from the join as NA; set those to 0
    setDT(dt)
    dt[is.na(assin_completed), assin_completed := 0L]
    

    The final table dt gives the output you want.

              x assin_completed
    1: Student1               1
    2: Student2               1
    3: Student3               0
    4: Student4               0
    5: Student5               1
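
    The NA-to-0 step can also be folded into the dplyr pipeline. The following is a minimal sketch, not part of the original answer: it assumes plain data frames instead of data.table, keeps the column names x and assin_completed used above, and relies on dplyr's coalesce() for the replacement.

    ```r
    library(dplyr)

    full_list <- data.frame(x = paste0("Student", 1:5), stringsAsFactors = FALSE)

    # the submitted list already carries the flag column set to 1
    submitted_list <- data.frame(x = c("Student1", "Student2", "Student5"),
                                 assin_completed = 1L,
                                 stringsAsFactors = FALSE)

    # left join keeps every student; coalesce() turns the unmatched NAs into 0
    dt <- full_list %>%
      left_join(submitted_list, by = "x") %>%
      mutate(assin_completed = coalesce(assin_completed, 0L))

    print(dt)
    ```

    This keeps the whole transformation in one expression, at the cost of depending on dplyr for both the join and the replacement.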
    

    Answer 1 (score: 0)

    sqldf

    1) in  Using the input defined in the Note at the end:

    library(sqldf)
    
    sqldf("select Student, (Student in SubmittedDF) Submitted from FullDF")
    

    giving:

       Student Submitted
    1 Student1         1
    2 Student2         1
    3 Student3         0
    4 Student4         0
    5 Student5         1
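
    The query above names the table directly after in, a shorthand that SQLite accepts for a single-column table. As a hedged alternative for cases where the SQL might later run on a different database, the sketch below spells out the subquery and uses a case expression. It is an illustration rather than part of the original answer, and it rebuilds the inputs inline so that it runs on its own.

    ```r
    library(sqldf)

    FullDF <- data.frame(Student = paste0("Student", 1:5))
    SubmittedDF <- data.frame(Student = c("Student1", "Student2", "Student5"))

    # explicit subquery plus case expression instead of the "in SubmittedDF" shorthand
    res <- sqldf("select Student,
                    case when Student in (select Student from SubmittedDF)
                         then 1 else 0 end as Submitted
                  from FullDF")

    print(res)
    ```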
    

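    2) left join  Another approach is to define Submitted1 as SubmittedDF with a second column of 1s (named Submitted), then left join FullDF to it, replacing the NULL values generated by the join with 0.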

    library(sqldf)
    
    sqldf("with Submitted1 as (select *, 1 Submitted from SubmittedDF)
      select Student, coalesce(Submitted, 0) Submitted 
      from FullDF left join Submitted1 using(Student)")
    

    giving:

       Student Submitted
    1 Student1         1
    2 Student2         1
    3 Student3         0
    4 Student4         0
    5 Student5         1
    

    Plain R code

    3) %in%  In plain R code (no packages) we can use %in% like this:

    transform(FullDF, Submitted = (Student %in% SubmittedDF$Student) + 0)
    

    giving:

       Student Submitted
    1 Student1         1
    2 Student2         1
    3 Student3         0
    4 Student4         0
    5 Student5         1
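
    In the line above, %in% yields a logical vector and adding 0 coerces it to numeric 1/0. A small sketch of the same idea with the coercion made explicit is shown here; it is an illustrative variant, not the answer's own code, and it uses as.integer() so the new column is stored as integer rather than double.

    ```r
    FullDF <- data.frame(Student = paste0("Student", 1:5))
    SubmittedDF <- data.frame(Student = c("Student1", "Student2", "Student5"))

    # %in% returns TRUE/FALSE for each row of FullDF
    is_submitted <- FullDF$Student %in% SubmittedDF$Student

    # as.integer() maps TRUE to 1L and FALSE to 0L
    FullDF$Submitted <- as.integer(is_submitted)

    print(FullDF)
    ```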
    

    4) merge/replace  Another approach is to use merge to perform a left join and then use replace to change the NAs generated by the join to 0.

    Submitted1 <- cbind(SubmittedDF, Submitted = 1)
    transform(merge(FullDF, Submitted1, all.x = TRUE), 
                 Submitted = replace(Submitted, is.na(Submitted), 0))
    

    Note: The input in reproducible form:

    Lines1 <- "Student
    Student1
    Student2
    Student3
    Student4
    Student5"
    
    Lines2 <- "Student
    Student1
    Student2
    Student5"
    
    FullDF <- read.table(text = Lines1, header = TRUE, strip.white = TRUE)
    SubmittedDF <- read.table(text = Lines2, header = TRUE, strip.white = TRUE)