I am trying to remove retweets (strings that start with RT) from a dataset, but my grepl command does not seem to work properly.
This works fine:
grepl("[^rt|RT][:alnum]",c("RT hi","rt boo","rtlolo","im goodRT"),ignore.case=T)
This fails. Why?
data<-structure(list(data = c("RT @4MySquad: This makes me sick!\\n#whiteprivilege\\n#BlackLivesMatter \\n#Policestate https:\\/\\/t.co\\/nDL0AHwWTd",
"RT @weaselzippers: D.C. Police Want Help Identifying #BlackLivesMatter Supporters Who Beat And Left Hero Marine For Dead\\u2026 https:\\/\\/t.co\\/tbmO\\u2026",
"RT @vicegandako: #PrayForMannyPacquiao #LoveWins", "\\Dig out of the binaries of right and wrong\\ - #BlackLivesMatter at Mizzou",
"Even Democrats think #Bernie 's ideas are unrealistic #insane #UNLV #BigBangTheory #Hillary2016 #blacklivesmatter https:\\/\\/t.co\\/ITDyXoAvtK",
"RT @eelawl1966: Former NAACP President Ben Jealous endorses Bernie Sanders\\n#BlackLivesMatter #BLM #Bernie2016 \\n https:\\/\\/t.co\\/Qom1KMwLHs",
"#SayNoToHillary #NoMoreClintons #FeelTheBern #BernieSanders #BlackLivesMatter #Disabled4Bernie #Women4Bernie... https:\\/\\/t.co\\/I8F21ilJgv",
"RT @JoshuaMannery: #BlackLivesMatter \\ud83d\\udc4a\\ud83c\\udffd https:\\/\\/t.co\\/tcEITKKGhd",
"lang:und", "@FoxNews Did he not say, \\Yes\\? Hopefully this story won't gain traction bc it's not reflective of the #blacklivesmatter movement",
"President Barack Obama Is Doing Big Things With Cuba + #BlackLivesMatter https:\\/\\/t.co\\/6gEJreOiUc",
"RT @Uberarabic: \\u0644\\u0644\\u0639\\u0644\\u0645 \\u0639\\u0642\\u0648\\u0628\\u0629 \\u0627\\u0644\\u0645\\u062b\\u0644\\u064a\\u064a\\u0646 \\u0641\\u064a \\u062c\\u0645\\u064a\\u0639 \\u0627\\u0644\\u062f\\u0627\\u064a\\u0627\\u0646\\u0627\\u062a \\u0627\\u0644\\u0633\\u0645\\u0627\\u0648\\u064a\\u0629 \\u0647\\u064a \\u0627\\u0644\\u0642\\u062a\\u0644\\n\\n#LoveWins",
"RT @AishaYesufu: Let's not forget 219#ChibokGirls still in captivity today 676 days \\n#NeverToBeForgotten #CryingToBeRescued #BringBackOurGi\\u2026",
"RT @arctic_matters: Chukchi Sea. #LoveWins https:\\/\\/t.co\\/gH8KZgVZk3",
". @DoubleFine r u joking, tim u know the servers aren't working you dumb asshole #gamergate",
"RT @realkingcalii: #BlackLivesMatter Kendrick Lamar \\Alright\\ - https:\\/\\/t.co\\/amlRn0fKsA",
"RT @DreamersMOMS: Community representing #CCA & @geogroups making dirty $$$$ w\\/immigrants. #WeAreFlorida #not1more #immigration https:\\/\\/t.c\\u2026",
"id_str:700012325831581696", "RT @DreamersMOMS: Con compa\\u00f1eras de Carolina del Norte apoy\\u00e1ndonos en #Tallahassee. #ProteccionNoDeportation #Not1More @grisalonso https:\\/\\/\\u2026",
"RT @IkeIsaacson2: Hey #blacklivesmatter this is a hate crime done by racists in your name. https:\\/\\/t.co\\/6uGSXAJcrM"
)), .Names = "data", row.names = c(NA, 20L), class = "data.frame")
data[grepl("[^rt|RT][:alnum]",data,ignore.case=T)]
This question also uses Twitter data, but takes a different approach.
Answer 0 (score: 1)
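We specify the pattern as a string that starts (^) with RT, followed by one or more whitespace characters (\\s+). With ignore.case = TRUE, it also matches elements that start with rt followed by whitespace.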
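The pattern described above can be sketched as follows. This is a minimal illustration rather than the answer's original code: the small tweets data frame and the names is_retweet and kept are assumptions made up for the example, standing in for the data object from the question.

```r
# Hypothetical stand-in for the question's data frame (one character column named "data").
tweets <- data.frame(
  data = c("RT @a: hi", "rt boo", "rtlolo", "im goodRT"),
  stringsAsFactors = FALSE
)

# TRUE for rows whose text starts with RT (any case) followed by whitespace.
# Note that grepl is applied to the column, not to the data frame itself.
is_retweet <- grepl("^RT\\s+", tweets$data, ignore.case = TRUE)

# Keep only the rows that are not retweets; drop = FALSE keeps a data frame.
kept <- tweets[!is_retweet, , drop = FALSE]
print(kept)
```

Two details probably explain the failure in the question. First, passing the whole data frame to grepl coerces it with as.character, which yields one string per column, so the result is a single logical value instead of one per row. Second, [^rt|RT] is a negated character class (any single character other than r, t, | and their uppercase forms) and not "does not start with RT", while [:alnum] without the outer brackets is treated as a set of literal characters, not as the POSIX class [[:alnum:]].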