R模式匹配列中的多个行组合以进行替换?

时间:2017-04-07 21:10:12

标签: r string gsub

我正在尝试弄清楚如何从同一数据框中另一列的数据框的一列中识别任何字符串的实例,以便替换。在这种情况下,我有论坛帖子,我已经拉入其中人们通过名称引用其他用户,我想摆脱这些名称进行分析,否则他们将算作高数量的单词。以下是此数据框的输入:

structure(list(uber_name = structure(c(9L, 2L, 1L, 2L, 3L, 10L, 
3L, 9L, 11L), .Label = c("aluber1968", "bigdreamslittlemoney", 
"FuberNYC", "JamesM", "jonnyplastic", "JustDre", "KING D", "klimarov", 
"NycGirl705", "shumacker", "spike69", "theitalian", "Uberman8263", 
"Ez2dj", "Manhmptn", "NYCDriver", "staytune", "UBS", "Ubured", 
"Jme10", "Lennyyellowcab", "Mir", "eagle88", "Ibuys4730", "NoUsername", 
"BathoTrask", "Douglas", "LGC", "Jakeinny098", "Rustyshackelford", 
"shabbyroch", "ubershiza", "drbrkln", "elys123", "bossdriver", 
"HerbyHerb", "Jim1985", "Malik38", "STIDRIVER", "vxlon7", "Waqar", 
"tohunt4me", "DogPound", "SuliB", "AlBrklyn", "John Cunningham", 
"MReeves", "PinkFoot", "alextheboss", "luisannalui", "censoredbytheFCC", 
"KONY", "cieru", "Jorlev", "Smooth954", "marcusguber", "nyc321", 
"Tony from New Jersey", "Vanstaal", "Bkrah", "brunoamat2", "gebbels6", 
"Kevin7889", "uanic", "Uber OG", "UberKilledMyMarriage", "ya mon its me", 
"HunkAWestchester", "Mr Affinito", "ninja warrior", "NoNonsense", 
"notacabdriver", "Notauberhater", "TwoFiddyMile", "bilyvh", "cybertec69", 
"JohnnyBlanco", "SOBE", "ubernyc"), class = "factor"), uber_write = c("I see people post about getting a w", 
"you have 2 choices either you drive", "More than a year ago I didnt drive ", 
"yeah i stopped driving for them for", "Ive been getting some promotions la", 
"FuberNYC saidIve been getting some ", "shumacker saidAnd You feel importan", 
"FuberNYC saidIve been getting some ", "They start coming after few months "
), uber_date = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L
), .Label = c("Jan 19, 2017", "Mar 30, 2017", "Jan 23, 2017", 
"Jan 12, 2017", "Jan 9, 2017", "Jan 1, 2017", "Dec 31, 2016", 
"Nov 26, 2016", "Nov 3, 2016", "Dec 22, 2016", "Dec 13, 2016", 
"Dec 2, 2016", "Nov 15, 2016", "Oct 31, 2016", "Oct 20, 2016", 
"Mar 14, 2017", "Sep 1, 2016", "Jul 26, 2016", "Mar 1, 2017", 
"Feb 25, 2017", "Sep 8, 2016", "Sep 9, 2016", "Apr 21, 2015"), class = "factor")), .Names = c("uber_name", 
"uber_write", "uber_date"), class = c("data.table", "data.frame"
), row.names = c(NA, -9L), .internal.selfref = <pointer: 0x0000000000220788>)

之前我使用过gsub,但我无法弄清楚如何将它应用于此实例。我想在“uber_names”列中取任何名称,并从发布的任何“uber_writes”中删除这些用户。

2 个答案:

答案 0 :(得分:0)

您可以在data.table(uber_names)中创建所有用户名的向量dt,然后生成正则表达式(name1|name2|name3),将所有匹配的用户名替换为{{1喜欢:

""

答案 1 :(得分:0)

我无法重新创建您的数据框,但这是一个关闭的数据框:

data <- 
structure(list(uber_name = c("aluber1968", "bigdreamslittlemoney", 
"FuberNYC", "JamesM", "jonnyplastic", "JustDre", "KING D", "klimarov", 
"NycGirl705", "shumacker", "spike69", "theitalian", "Uberman8263", 
"Ez2dj", "Manhmptn", "NYCDriver", "staytune", "UBS", "Ubured", 
"Jme10", "Lennyyellowcab", "Mir", "eagle88", "Ibuys4730", "NoUsername", 
"BathoTrask", "Douglas", "LGC", "Jakeinny098", "Rustyshackelford", 
"shabbyroch", "ubershiza", "drbrkln", "elys123", "bossdriver", 
"HerbyHerb", "Jim1985", "Malik38", "STIDRIVER", "vxlon7", "Waqar", 
"tohunt4me", "DogPound", "SuliB", "AlBrklyn", "John Cunningham", 
"MReeves", "PinkFoot", "alextheboss", "luisannalui", "censoredbytheFCC", 
"KONY", "cieru", "Jorlev", "Smooth954", "marcusguber", "nyc321", 
"Tony from New Jersey", "Vanstaal", "Bkrah", "brunoamat2", "gebbels6", 
"Kevin7889", "uanic", "Uber OG", "UberKilledMyMarriage", "ya mon its me", 
"HunkAWestchester", "Mr Affinito", "ninja warrior", "NoNonsense", 
"notacabdriver", "Notauberhater", "TwoFiddyMile", "bilyvh", "cybertec69", 
"JohnnyBlanco", "SOBE", "ubernyc"), uber_write = c("I see people post about getting a w", 
"you have 2 choices either you drive", "More than a year ago I didnt drive ", 
"yeah i stopped driving for them for", "Ive been getting some promotions la", 
"FuberNYC saidIve been getting some ", "shumacker saidAnd You feel importan", 
"FuberNYC saidIve been getting some ", "They start coming after few months ", 
"I see people post about getting a w", "you have 2 choices either you drive", 
"More than a year ago I didnt drive ", "yeah i stopped driving for them for", 
"Ive been getting some promotions la", "FuberNYC saidIve been getting some ", 
"shumacker saidAnd You feel importan", "FuberNYC saidIve been getting some ", 
"They start coming after few months ", "I see people post about getting a w", 
"you have 2 choices either you drive", "More than a year ago I didnt drive ", 
"yeah i stopped driving for them for", "Ive been getting some promotions la", 
"FuberNYC saidIve been getting some ", "shumacker saidAnd You feel importan", 
"FuberNYC saidIve been getting some ", "They start coming after few months ", 
"I see people post about getting a w", "you have 2 choices either you drive", 
"More than a year ago I didnt drive ", "yeah i stopped driving for them for", 
"Ive been getting some promotions la", "FuberNYC saidIve been getting some ", 
"shumacker saidAnd You feel importan", "FuberNYC saidIve been getting some ", 
"They start coming after few months ", "I see people post about getting a w", 
"you have 2 choices either you drive", "More than a year ago I didnt drive ", 
"yeah i stopped driving for them for", "Ive been getting some promotions la", 
"FuberNYC saidIve been getting some ", "shumacker saidAnd You feel importan", 
"FuberNYC saidIve been getting some ", "They start coming after few months ", 
"I see people post about getting a w", "you have 2 choices either you drive", 
"More than a year ago I didnt drive ", "yeah i stopped driving for them for", 
"Ive been getting some promotions la", "FuberNYC saidIve been getting some ", 
"shumacker saidAnd You feel importan", "FuberNYC saidIve been getting some ", 
"They start coming after few months ", "I see people post about getting a w", 
"you have 2 choices either you drive", "More than a year ago I didnt drive ", 
"yeah i stopped driving for them for", "Ive been getting some promotions la", 
"FuberNYC saidIve been getting some ", "shumacker saidAnd You feel importan", 
"FuberNYC saidIve been getting some ", "They start coming after few months ", 
"I see people post about getting a w", "you have 2 choices either you drive", 
"More than a year ago I didnt drive ", "yeah i stopped driving for them for", 
"Ive been getting some promotions la", "FuberNYC saidIve been getting some ", 
"shumacker saidAnd You feel importan", "FuberNYC saidIve been getting some ", 
"They start coming after few months ", "I see people post about getting a w", 
"you have 2 choices either you drive", "More than a year ago I didnt drive ", 
"yeah i stopped driving for them for", "Ive been getting some promotions la", 
"FuberNYC saidIve been getting some ", "shumacker saidAnd You feel importan"
)), .Names = c("uber_name", "uber_write"), row.names = c(NA, 
-79L), class = "data.frame")

这是一个答案:

paste0(data$uber_name, collapse = "|") -> dont_want
data$uber_write2 <- gsub(pattern = dont_want, "", data$uber_write)