I'm working on a program that follows this basic logic:
Step 2 is where I'm stuck. I've tried many solutions, but none of them work. Here is my code:
# Read in individual data sets
set1=read.csv("set1.csv",header=FALSE,sep=",")
set2=read.csv("set2.csv",header=FALSE,sep=",")
exclude_list=read.csv("exclude.csv",header=FALSE,sep=",")
# Create a new set with the aggregate of all keyword sets,
# capitalizing all keywords and excluding keywords that are
# less than 2 characters in length
set_agg=rbind(set1,set2)
keywords=set_agg[c("V1")]
keywords = as.data.frame(sapply(keywords, toupper))
# ??? WHAT GOES HERE ???
# Sort and remove duplicate keywords from the keyword list
keywords = keywords[order(keywords$V1), , drop = FALSE]  # assign the result; drop = FALSE keeps the V1 column
keywords=unique(keywords)
# Modify and capitalize the exclusion list
exclude_list=as.data.frame(exclude_list[c("V1")])
exclude_list=as.data.frame(sapply(exclude_list, toupper))
# Remove keywords matching the exclude list
`%ni%` <- Negate(`%in%`)
keywords=subset(keywords, V1 %ni% exclude_list$V1)
print(keywords)  # return() only works inside a function
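The exclusion step above builds a "not in" operator with `Negate()`. A minimal self-contained check, using made-up keyword values, that it behaves the same as writing `!(x %in% y)` directly:

```r
# Build a "not in" operator from base R's %in%
`%ni%` <- Negate(`%in%`)

# Made-up keywords and exclusion list, for illustration only
kw   <- c("APPLE", "BANANA", "CHERRY")
excl <- c("BANANA")

# Both forms keep only the keywords that are not excluded
kept_ni  <- kw[kw %ni% excl]
kept_not <- kw[!(kw %in% excl)]
print(kept_ni)
```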
For reference, the CSV files are formatted like this:
word1,
word2,
word3,
etc...
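Because every line ends in a comma, `read.csv()` sees two fields per line, so each data set gets an extra, empty `V2` column; that is why the code keeps only `V1`. A small sketch with inline text standing in for the real file (the `text =` argument replaces the file name here, purely for illustration):

```r
# Inline stand-in for set1.csv; each line ends with a trailing comma
csv_text <- "word1,\nword2,\nword3,\n"

# stringsAsFactors = FALSE keeps V1 as character on R versions before 4.0
df <- read.csv(text = csv_text, header = FALSE, stringsAsFactors = FALSE)

str(df)  # V1 holds the words; V2 is an all-NA column from the trailing comma
```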
Answer (score: 2)
You can filter by indexing on the length of each keyword:
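# keep keywords with at least 2 characters; drop = FALSE keeps the data frame
keywords = keywords[nchar(as.character(keywords$V1)) >= 2, , drop = FALSE]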
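Row-indexing a one-column data frame like this only keeps it a data frame if `drop = FALSE` is passed; otherwise R simplifies the result to a plain vector and `keywords$V1` stops working in the later steps. A self-contained sketch with made-up values:

```r
# Made-up, already-uppercased keywords in a one-column data frame
keywords <- data.frame(V1 = c("A", "AB", "ABC"), stringsAsFactors = FALSE)

# nchar() is vectorised, so no sapply() is needed
long_enough <- nchar(keywords$V1) >= 2

as_vector <- keywords[long_enough, ]                # simplified to a character vector
as_frame  <- keywords[long_enough, , drop = FALSE]  # still a data frame with column V1
```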
Update: here is a simpler, complete version that works on vectors:
## Assumes set1, set2 and exclude_list are character vectors,
## e.g. set1 <- as.character(read.csv("set1.csv", header = FALSE)$V1)
keywords <- sort(unique(toupper(c(set1, set2))))  # uppercase before de-duplicating
keywords <- keywords[nchar(keywords) >= 2]         # drop keywords shorter than 2 characters
keywords <- setdiff(keywords, toupper(exclude_list))
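For reuse, the vector version can be wrapped in a function. The name `clean_keywords` and its `min_len` argument are hypothetical, chosen here for illustration; the sketch assumes the inputs are character vectors such as the `V1` columns read from the CSV files:

```r
# Hypothetical helper: merge keyword vectors, uppercase, sort, de-duplicate,
# drop short keywords and remove anything on the exclusion list
clean_keywords <- function(..., exclude = character(0), min_len = 2) {
  kw <- toupper(c(...))             # combine all sets and uppercase them
  kw <- sort(unique(kw))            # de-duplicate after uppercasing, then sort
  kw <- kw[nchar(kw) >= min_len]    # drop keywords shorter than min_len
  setdiff(kw, toupper(exclude))     # remove excluded keywords
}

# Made-up example input
result <- clean_keywords(c("apple", "b", "Cherry"), c("APPLE", "date"),
                         exclude = c("date"))
print(result)
```

Uppercasing before `unique()` matters: otherwise "apple" and "APPLE" would survive as two copies of the same keyword.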