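I have a dataset in R that lists a bunch of company names, and I want to remove words such as "Inc", "Company", and "LLC" as part of a clean-up effort. I have the following sample data: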
sampleData
  Location     Company
1 New York, NY XYZ Company
2 Chicago, IL  Consulting Firm LLC
3 Miami, FL    Smith & Co.
Words I do not want in the output:
stopwords = c("Inc","inc","co","Co","Inc.","Co.","LLC","Corporation","Corp","&")
I built the following function to split each string into words, remove the stopwords, and paste the remaining words back together, but it does not iterate over each row of the dataset.
removeWords <- function(str, stopwords) {
  x <- unlist(strsplit(str, " "))
  paste(x[!x %in% stopwords], collapse = " ")
}
removeWords(sampleData$Company, stopwords)
The output of the above function looks like this:
[1] "XYZ Company Consulting Firm Smith"
The output should instead be one cleaned name per row: "XYZ Company", "Consulting Firm", "Smith".
Any help would be greatly appreciated.
Answer 0 (score: 5)
We can use the 'tm' package.
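A minimal sketch of how this might look, assuming the tm package is installed and using its removeWords() function. The data frame is rebuilt from the question's sample data, and the exact call and the whitespace clean-up step are assumptions rather than the answerer's own code.

```r
# Sample data rebuilt from the question (plain character columns assumed).
sampleData <- data.frame(
  Location = c("New York, NY", "Chicago, IL", "Miami, FL"),
  Company  = c("XYZ Company", "Consulting Firm LLC", "Smith & Co."),
  stringsAsFactors = FALSE
)
stopwords <- c("Inc", "inc", "co", "Co", "Inc.", "Co.", "LLC",
               "Corporation", "Corp", "&")

# tm is assumed to be installed; the block is skipped otherwise.
if (requireNamespace("tm", quietly = TRUE)) {
  # tm::removeWords() is vectorised: it returns one string per input element,
  # so the rows are not collapsed into a single string.
  cleaned <- tm::removeWords(sampleData$Company, stopwords)
  # Collapse leftover runs of whitespace and trim both ends.
  cleaned <- trimws(gsub("\\s+", " ", cleaned))
  print(cleaned)
}
```

Because the words are matched as patterns rather than as exact tokens, entries containing punctuation, such as "&" or "Co.", may not be removed cleanly, so it is worth inspecting the result before writing it back to the data frame.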
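Answer 1 (score: 3)
With a little care over the stopwords (a "\\" escape inserted in "Co." so that the dot is not read as a regex wildcard, and spaces added to some entries). If you do not want to keep an eye on the stopwords, the previous answer is preferable.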
stopwords = c("Inc","inc","co ","Co ","Inc."," Co\\.","LLC","Corporation","Corp","&")
gsub(paste0(stopwords,collapse = "|"),"", df$Company)
[1] "XYZ Company" "Consulting Firm " "Smith "
df$Company <- gsub(paste0(stopwords,collapse = "|"),"", df$Company)
# df
#   Location     Company
# 1 New York, NY XYZ Company
# 2 Chicago, IL  Consulting Firm
# 3 Miami, FL    Smith
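If hand-tuning the stopwords for regex metacharacters and stray spaces is unwanted, one alternative is to keep the exact token matching from the question but apply it to each row separately. A minimal base R sketch, where remove_stopwords is a hypothetical helper name and the data frame is rebuilt from the question's sample data:

```r
# Sample data rebuilt from the question (plain character columns assumed).
sampleData <- data.frame(
  Location = c("New York, NY", "Chicago, IL", "Miami, FL"),
  Company  = c("XYZ Company", "Consulting Firm LLC", "Smith & Co."),
  stringsAsFactors = FALSE
)
stopwords <- c("Inc", "inc", "co", "Co", "Inc.", "Co.", "LLC",
               "Corporation", "Corp", "&")

# Hypothetical helper: for each string, split on single spaces, drop tokens
# that exactly match a stopword, and paste the remaining tokens back together.
remove_stopwords <- function(strings, stopwords) {
  vapply(
    strsplit(strings, " ", fixed = TRUE),   # one token vector per row
    function(words) paste(words[!words %in% stopwords], collapse = " "),
    character(1)
  )
}

sampleData$Company <- remove_stopwords(sampleData$Company, stopwords)
# Company is now: XYZ Company, Consulting Firm, Smith
```

Since strsplit() returns a list with one element per input string, the rows stay separate, whereas unlist() in the question's function merges all tokens into one vector. Matching is exact and case-sensitive, so no regex escaping is needed, but a token such as "LLC," with attached punctuation would not be removed.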