I have the following data in a data frame:
structure(list(`head(ker$text)` = structure(1:6, .Label = c("@_rpg_17 little league travel tourney. These parents about to be wild.",
"@auscricketfan @davidwarner31 yes WI tour is coming soon", "@keralatourism #favourite #destination #munnar #topstation https://t.co/sm9qz7Z9aR",
"@NWAWhatsup tour of duty in NWA considered a dismal assignment? Companies send in their best ppl and then those ppl don't want to leave",
"Are you Looking for a trip to Kerala? #Kerala-prime tourist attractions of India.Visit:http://t.co/zFCoaoqCMP http://t.co/zaGNd0aOBy",
"Are you Looking for a trip to Kerala? #Kerala, God's own country, is one of the prime tourist attractions of... http://t.co/FLZrEo7NpO"
), class = "factor")), .Names = "head(ker$text)", row.names = c(NA,
-6L), class = "data.frame")
I have another data frame containing the hashtags extracted from the data frame above. It looks like this:
structure(list(destination = c("#topstation", "#destination", "#munnar",
"#Kerala", "#Delhi", "#beach")), .Names = "destination", row.names = c(NA,
6L), class = "data.frame")
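I want to create a new column in my first data frame that contains only the hashtags that match the second data frame. For example, the first row of df1 has no hashtags, so its cell in the new column would be blank. The third row, however, contains 4 hashtags, three of which match the second data frame. I have tried the str_match and str_extract functions. I got very close to this with code given in one of the posts: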
new_col <- ker[unlist(lapply(destn$destination, agrep, ker$text)), ]
I do get a list as output, but I also get an error saying:
replacement has 1472 rows, data has 644
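I tried setting max.distance to different values, and each one gave a different error. Can someone help me solve this? Another option I considered is putting each hashtag in a separate column, but I am not sure whether that would help me analyse the data further together with other variables. The output I am looking for is: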
text new_col new_col2 new_col3
statement1
statement2
statement3 #destination #munnar #topstation
statement4
statement5 #Kerala
statement6 #Kerala
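The error in the question comes from a shape mismatch: agrep() returns the row numbers of the tweets that match one pattern, so unlisting the results over all hashtags gives one index per (hashtag, matching tweet) pair. Tweets that match several tags are repeated and tweets that match none are dropped, so the subset almost never has as many rows as the data frame it is assigned into. The sketch below reproduces this on the six sample rows. It uses exact matching (grep() with fixed = TRUE) instead of agrep() so the indices are predictable, and it assumes the text column is simply named text.

```r
# Sample data rebuilt from the question (text column renamed to `text` for brevity)
ker <- data.frame(text = c(
  "@_rpg_17 little league travel tourney. These parents about to be wild.",
  "@auscricketfan @davidwarner31 yes WI tour is coming soon",
  "@keralatourism #favourite #destination #munnar #topstation https://t.co/sm9qz7Z9aR",
  "@NWAWhatsup tour of duty in NWA considered a dismal assignment? Companies send in their best ppl and then those ppl don't want to leave",
  "Are you Looking for a trip to Kerala? #Kerala-prime tourist attractions of India.Visit:http://t.co/zFCoaoqCMP http://t.co/zaGNd0aOBy",
  "Are you Looking for a trip to Kerala? #Kerala, God's own country, is one of the prime tourist attractions of... http://t.co/FLZrEo7NpO"
), stringsAsFactors = FALSE)

destn <- data.frame(destination = c("#topstation", "#destination", "#munnar",
                                    "#Kerala", "#Delhi", "#beach"),
                    stringsAsFactors = FALSE)

# For each hashtag: the row numbers of the tweets that contain it (exact match)
hits <- lapply(destn$destination, grep, ker$text, fixed = TRUE)
idx <- unlist(hits)
idx  # 3 3 3 5 6: row 3 repeated, rows 1, 2 and 4 missing

# ker[idx, ] has 5 values but ker has 6 rows, so the assignment fails
msg <- tryCatch({
  ker$new_col <- ker[idx, ]
  "no error"
}, error = function(e) conditionMessage(e))
msg  # "replacement has 5 rows, data has 6"
```

With the full data set the same mechanism produces the 1472 vs. 644 mismatch, and changing max.distance only changes how many indices come back, not their shape.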
Answer 0 (score: 0)
You can do it like this (df is the tweet data frame, df2 the hashtag data frame):
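library(stringr)
# One alternation regex from the hashtag list: "#topstation|#destination|..."
pattern <- paste(df2$destination, collapse = "|")
# str_match_all() is vectorised: one match matrix per tweet (factor converted first)
results <- str_match_all(as.character(df$`head(ker$text)`), pattern)
df$matches <- results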
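The alternation pattern built from df2$destination matches a tag anywhere, including as the prefix of a longer hashtag (for example #Kerala inside a hypothetical #Keralatourism). Wrapping the alternation in a group and adding a word boundary after it avoids that while still matching #Kerala-prime, because - is not a word character. This sketch uses base R (regmatches() with gregexpr(perl = TRUE)) so it runs without extra packages; the input strings are made up for illustration, and the same pattern should also be usable as the pattern argument of str_match_all().

```r
# Hashtags to keep (the second data frame in the question)
tags <- c("#topstation", "#destination", "#munnar", "#Kerala", "#Delhi", "#beach")

# Plain alternation, as built in the answer above: "#topstation|#destination|..."
loose <- paste(tags, collapse = "|")
# Same alternation in a non-capturing group, followed by a word boundary
strict <- paste0("(?:", loose, ")\\b")

# Made-up inputs: a longer tag sharing the "#Kerala" prefix, and the "#Kerala-prime" case
x <- c("#Keralatourism is trending", "#Kerala-prime tourist attractions")

loose_hits  <- regmatches(x, gregexpr(loose,  x, perl = TRUE))
strict_hits <- regmatches(x, gregexpr(strict, x, perl = TRUE))
loose_hits   # "#Kerala" found in both strings
strict_hits  # nothing in the first string, "#Kerala" in the second
```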
If you want the matches in separate columns, you can use:
# Pad each tweet's matches with NA up to the largest match count, then stack them as rows
df <- cbind(df, do.call(rbind, lapply(results, `[`,
                                      seq_len(max(lengths(results))))))
Answer 1 (score: 0)
library(stringi);
m <- sapply(stri_extract_all(df1[[1]],regex='#\\w+'),function(x) x[x%in%df2[[1]]]);
df1 <- cbind(df1,do.call(rbind,lapply(m,`[`,1:max(sapply(m,length)))));
df1;
## head(ker$text) 1 2 3
## 1 @_rpg_17 little league travel tourney. These parents about to be wild. <NA> <NA> <NA>
## 2 @auscricketfan @davidwarner31 yes WI tour is coming soon <NA> <NA> <NA>
## 3 @keralatourism #favourite #destination #munnar #topstation https://t.co/sm9qz7Z9aR #destination #munnar #topstation
## 4 @NWAWhatsup tour of duty in NWA considered a dismal assignment? Companies send in their best ppl and then those ppl don't want to leave <NA> <NA> <NA>
## 5 Are you Looking for a trip to Kerala? #Kerala-prime tourist attractions of India.Visit:http://t.co/zFCoaoqCMP http://t.co/zaGNd0aOBy #Kerala <NA> <NA>
## 6 Are you Looking for a trip to Kerala? #Kerala, God's own country, is one of the prime tourist attractions of... http://t.co/FLZrEo7NpO #Kerala <NA> <NA>
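The question asks for a single new column, so the filtered tag lists can also be collapsed into one character column, with an empty string where a tweet has no matching tag. This is a minimal base-R sketch of the same extract-then-filter idea, assuming the six sample tweets and the hashtag list from the question; the names tweets, tags and new_col are illustrative.

```r
tweets <- c(
  "@_rpg_17 little league travel tourney. These parents about to be wild.",
  "@auscricketfan @davidwarner31 yes WI tour is coming soon",
  "@keralatourism #favourite #destination #munnar #topstation https://t.co/sm9qz7Z9aR",
  "@NWAWhatsup tour of duty in NWA considered a dismal assignment? Companies send in their best ppl and then those ppl don't want to leave",
  "Are you Looking for a trip to Kerala? #Kerala-prime tourist attractions of India.Visit:http://t.co/zFCoaoqCMP http://t.co/zaGNd0aOBy",
  "Are you Looking for a trip to Kerala? #Kerala, God's own country, is one of the prime tourist attractions of... http://t.co/FLZrEo7NpO"
)
tags <- c("#topstation", "#destination", "#munnar", "#Kerala", "#Delhi", "#beach")
df1 <- data.frame(text = tweets, stringsAsFactors = FALSE)

# Every hashtag-like token in each tweet: "#" followed by word characters
found <- regmatches(df1$text, gregexpr("#\\w+", df1$text, perl = TRUE))

# Keep only the tokens that appear in the tag list
kept <- lapply(found, function(h) h[h %in% tags])

# Collapse to one string per tweet; tweets without a matching tag get ""
df1$new_col <- vapply(kept, paste, character(1), collapse = " ")
df1$new_col
```

A single string column is convenient for display and export; for counting or modelling, the list or one-column-per-match layouts above are easier to work with.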
Edit: if you want a separate column for each tag:
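A layout that may also address the concern in the question about analysing the tags together with other variables is one logical indicator column per hashtag in the tag list. This is a minimal base-R sketch, assuming exact, case-sensitive tag matches are wanted; the has_ column prefix is a hypothetical naming choice.

```r
tweets <- c(
  "@_rpg_17 little league travel tourney. These parents about to be wild.",
  "@auscricketfan @davidwarner31 yes WI tour is coming soon",
  "@keralatourism #favourite #destination #munnar #topstation https://t.co/sm9qz7Z9aR",
  "@NWAWhatsup tour of duty in NWA considered a dismal assignment? Companies send in their best ppl and then those ppl don't want to leave",
  "Are you Looking for a trip to Kerala? #Kerala-prime tourist attractions of India.Visit:http://t.co/zFCoaoqCMP http://t.co/zaGNd0aOBy",
  "Are you Looking for a trip to Kerala? #Kerala, God's own country, is one of the prime tourist attractions of... http://t.co/FLZrEo7NpO"
)
tags <- c("#topstation", "#destination", "#munnar", "#Kerala", "#Delhi", "#beach")
df1 <- data.frame(text = tweets, stringsAsFactors = FALSE)

# Hashtag-like tokens per tweet ("#" followed by word characters)
found <- regmatches(df1$text, gregexpr("#\\w+", df1$text, perl = TRUE))

# One logical column per tag, e.g. has_Kerala is TRUE for tweets containing #Kerala
for (tag in tags) {
  col <- sub("^#", "has_", tag)
  df1[[col]] <- vapply(found, function(h) tag %in% h, logical(1))
}

# Number of tweets mentioning each tag
colSums(df1[, -1])
```

Because every tweet keeps exactly one row, these columns can sit next to any other per-tweet variables and be summed, cross-tabulated or used directly as model inputs.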