Is any string from a list present in a column?

Posted: 2019-09-10 20:58:58

Tags: r

The first column of my data frame (FOOTNOTES) contains footnotes.

I created a list of strings to identify the type of each footnote, for example:

law <- c("Directive", "Commission Decision", "TFEU", 
         "TEU", "OJ L", "OJ C", "Case C-", "CJEU", "Council Decision", 
         "Official Journal", "(EU)", "(EEC)", "legal basis", 
         "Commission Regulation", "Article", "Regulation", "(EC)", 
         "Legislative framework", "Treaty", "Resolution", "Convention", 
         "Judgement of", "Ordinance", "Decision", "Paris Agreement", 
         "Law", "Art.", "legislation", "Charter of", "AGRILEG", "REACH")

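Searching for each word separately with str_detect works. However, I want to check whether any element of the list is present, so that TRUE is written to a new column (called LAW).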

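Using a regular expression ("|" between the strings) doesn't work. I don't get an error message, but when I checked manually, the result is TRUE almost everywhere, even in footnotes that contain none of the strings in the list.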

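I also tried creating a separate new column for each word in the list, but then I couldn't export the data frame to Excel. My idea was to filter the columns LAW to LAW12 and merge the results into a single column, but I couldn't find a way to do that either.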

I think the first approach would be faster, but I have no idea how to implement it.

DATABASE_V6$LAW <- str_detect(DATABASE_V6$FOOTNOTES, "[Directive|Decision|TFEU|OJ L]")

DATABASE_V6$LAW <- str_detect(DATABASE_V6$FOOTNOTES, "OJ L")
DATABASE_V6$LAW1 <- str_detect(DATABASE_V6$FOOTNOTES, "Regulation")
DATABASE_V6$LAW2 <- str_detect(DATABASE_V6$FOOTNOTES, "Directive")
DATABASE_V6$LAW3 <- str_detect(DATABASE_V6$FOOTNOTES, "TFEU")
DATABASE_V6$LAW4 <- str_detect(DATABASE_V6$FOOTNOTES, "TEU")
DATABASE_V6$LAW5 <- str_detect(DATABASE_V6$FOOTNOTES, "Legal basis")                              
DATABASE_V6$LAW6 <- str_detect(DATABASE_V6$FOOTNOTES, "Official Journal")
DATABASE_V6$LAW7 <- str_detect(DATABASE_V6$FOOTNOTES, "Case C-")
DATABASE_V6$LAW8 <- str_detect(DATABASE_V6$FOOTNOTES, "Decision")
DATABASE_V6$LAW9 <- str_detect(DATABASE_V6$FOOTNOTES, "Resolution")
DATABASE_V6$LAW10 <- str_detect(DATABASE_V6$FOOTNOTES, "Article")
DATABASE_V6$LAW11 <- str_detect(DATABASE_V6$FOOTNOTES, "Treaty")
DATABASE_V6$LAW12 <- str_detect(DATABASE_V6$FOOTNOTES, "Convention")
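If the per-word columns (LAW, LAW1, ..., LAW12) already exist, they can be collapsed into one LAW column with rowSums(), which counts the TRUE values in each row. The sketch below builds a small hypothetical stand-in for DATABASE_V6 with only three such columns, and it uses base R grepl(fixed = TRUE) instead of str_detect so that it runs without stringr; both return one logical value per footnote.

```r
# Hypothetical stand-in for DATABASE_V6 (the real one has LAW, LAW1, ..., LAW12).
DATABASE_V6 <- data.frame(
  FOOTNOTES = c("See OJ L 123", "Regulation (EU) 2019/1", "nothing here"),
  stringsAsFactors = FALSE
)
DATABASE_V6$LAW  <- grepl("OJ L", DATABASE_V6$FOOTNOTES, fixed = TRUE)
DATABASE_V6$LAW1 <- grepl("Regulation", DATABASE_V6$FOOTNOTES, fixed = TRUE)
DATABASE_V6$LAW2 <- grepl("Directive", DATABASE_V6$FOOTNOTES, fixed = TRUE)

# Pick every column named "LAW" or "LAW" followed by digits.
law_cols <- grep("^LAW[0-9]*$", names(DATABASE_V6), value = TRUE)

# A row is TRUE when at least one per-word column is TRUE
# (add na.rm = TRUE to rowSums() if the columns contain NA).
any_law <- rowSums(DATABASE_V6[law_cols]) > 0

# Drop the helper columns and keep a single LAW column for export.
DATABASE_V6[law_cols] <- NULL
DATABASE_V6$LAW <- any_law
```

The result has only FOOTNOTES and LAW, which keeps the exported sheet compact.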

When checking whether any word from the list appears in the FOOTNOTES column, I expect roughly 2,000 TRUE values out of 14,000 rows.

1 Answer:

Answer 0 (score: 0)

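You already have most of the solution. As Joran and r2evans pointed out above, you don't need the square brackets: [...] defines a character class that matches any single character listed inside it, so "[Directive|Decision|TFEU|OJ L]" is TRUE for any footnote that contains, for example, an "e" or a space. That is why almost every row came back TRUE. You can also use paste(law, collapse = "|") to turn the list of strings into a single regular expression in one step.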
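A small illustration of the difference between a bracketed pattern and a plain alternation. It uses base R grepl(), which applies the same regex semantics here as stringr::str_detect(); the two footnotes are made-up examples.

```r
# Square brackets: a character class, i.e. "any ONE of these characters"
# (D, i, r, e, c, t, v, |, s, o, n, T, F, E, U, O, J, space, L).
bracketed <- "[Directive|Decision|TFEU|OJ L]"

# No brackets: "|" separates whole alternatives.
alternation <- "Directive|Decision|TFEU|OJ L"

footnotes <- c("nothing to see here", "See OJ L 123, p. 1")

grepl(bracketed, footnotes)    # both TRUE: "n", "o", "t" and " " are in the class
grepl(alternation, footnotes)  # FALSE, TRUE: only the second contains "OJ L"
```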

law <- c("Directive", "Commission Decision", "TFEU", 
         "TEU", "OJ L", "OJ C", "Case C-", "CJEU", "Council Decision", 
         "Official Journal", "(EU)", "(EEC)", "legal basis", 
         "Commission Regulation", "Article", "Regulation", "(EC)", 
         "Legislative framework", "Treaty", "Resolution", "Convention", 
         "Judgement of", "Ordinance", "Decision", "Paris Agreement", 
         "Law", "Art.", "legislation", "Charter of", "AGRILEG", "REACH")

law_formatted <- paste0(law, collapse = "|")

tst <- data.frame(footnote = c("footnote footnote OJ L footnote footnote", 
                               "blah blah (EU) blah", 
                               "nothing to see here",
                               "words words. words words words Art."))
tst$law <- stringr::str_detect(tst$footnote, law_formatted)
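One caveat with paste(law, collapse = "|"): every entry is interpreted as a regular expression. "(EU)" is a regex group, so it matches a bare "EU" anywhere, and "Art." matches "Art" followed by any character, such as "Artificial". If the entries are meant as literal text, one option is to test each term with grepl(fixed = TRUE) and combine the results with Reduce(`|`, ...). This is a sketch assuming case-sensitive, literal matching is what you want; the shortened list and the footnotes are illustrative only.

```r
law <- c("OJ L", "(EU)", "Art.", "Regulation")  # shortened list for illustration

footnotes <- c("footnote footnote OJ L footnote footnote",
               "blah blah (EU) blah",
               "EU policy in general",
               "Artificial intelligence",
               "words words. words words words Art.")

# Regex alternation: "(EU)" and "Art." are read as regex syntax.
as_regex <- grepl(paste(law, collapse = "|"), footnotes)

# Literal matching: test each term as fixed text, then OR the results row-wise.
as_literal <- Reduce(`|`, lapply(law, function(term) grepl(term, footnotes, fixed = TRUE)))

as_regex    # TRUE TRUE TRUE TRUE TRUE  (rows 3 and 4 are false positives)
as_literal  # TRUE TRUE FALSE FALSE TRUE
```

With stringr, the per-term test can be written as str_detect(footnotes, fixed(term)). Alternatively, keep a single regex and escape the metacharacters by hand, e.g. "\\(EU\\)" and "Art\\.".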