How do I replace

Time: 2015-06-29 22:17:03

Tags: r csv

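I am currently developing a program that compares keyword lists. The basic idea is to build a list of keywords and their associated frequencies from CSV files, and then compare those keywords against a master list.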

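The problem I am running into is replacing the frequencies with the correct numbers from the CSV files. I think this is easiest to see by looking at the structure of the CSV files. For example,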

File 1
------
car,5
house,6
bee,30
gator,10

File 2
------
jump,4
bee,20
go,9
bike,24
fence,31
might,20

Master
------
students,8
statistical,8
excel,6
mathematics,6
sas,6
student,5
report,5
washington,5
analysis,5
course,5
performance,4
pages,4
university,4
improve,4
using,4

Note that in my code, because the files are not in alphabetical order, the frequencies are not replaced correctly.
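
The ordering problem can be reproduced in isolation. The following is a minimal sketch, assuming only the File 1 sample data above; the column names positional and matched are introduced here for illustration and are not part of the original program. It contrasts positional assignment through which(... %in% ...) with a lookup by name through match().

```r
# Sample data from File 1 above, upper-cased as in the question's code
key1 <- data.frame(V1 = c("CAR", "HOUSE", "BEE", "GATOR"),
                   V2 = c(5, 6, 30, 10),
                   stringsAsFactors = FALSE)

# Sorted keyword list (File 1 words only, for illustration)
keywords <- data.frame(keywords = sort(key1$V1), stringsAsFactors = FALSE)

# Positional assignment: the selected rows are in sorted order,
# but the values arrive in file order
keywords$positional <- NA
keywords[which(keywords$keywords %in% key1$V1), "positional"] <- key1[, 2]

# Lookup by name: match() gives, for each keyword, its row in key1
keywords$matched <- key1$V2[match(keywords$keywords, key1$V1)]

print(keywords)
```

In the printed result, the positional column pairs BEE with 5 (the frequency of car), while the matched column pairs BEE with 30, as in the file.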

When I run the code shown below, this is the output:

 Error in `[<-.data.frame`(`*tmp*`, which(as.character(keywords$keywords) %in% : replacement has 6 rows, data has 5
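
One plausible reading of the row counts in this message, sketched with the File 2 sample data: the two-character keyword GO is dropped from the keyword list by the length filter, so only 5 keywords are selected while File 2 still supplies 6 frequencies. This assumes exclude.csv (not shown) removes none of these words; the variable rows_selected is illustrative.

```r
# Sample data from File 2 above, upper-cased
key2 <- data.frame(V1 = c("JUMP", "BEE", "GO", "BIKE", "FENCE", "MIGHT"),
                   V2 = c(4, 20, 9, 24, 31, 20),
                   stringsAsFactors = FALSE)

# Combined keyword list after the nchar(keywords) > 2 filter; "GO" is gone
# (assumes the exclude list removes nothing else)
keywords <- c("BEE", "BIKE", "CAR", "FENCE", "GATOR", "HOUSE", "JUMP", "MIGHT")

rows_selected <- which(keywords %in% key2$V1)
length(rows_selected)   # rows selected on the left-hand side: 5
nrow(key2)              # values supplied on the right-hand side: 6
```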

Here is my code. The error starts on the line after keywordmax=as.data.frame(c("0")):

agg=function()
{
# Read in individual data sets
key1=read.csv("set1.csv",header=FALSE,sep=",")
key2=read.csv("set2.csv",header=FALSE,sep=",")
master=read.csv("master.csv",header=FALSE,sep=",")
exclude_list=read.csv("exclude.csv",header=FALSE,sep=",")

# Sort, capitalize, and keep unique values from the two keyword sets
keywords <- sapply(unique(sort(c(as.character(key1$V1), as.character(key2$V1)))), toupper)

# Keep keywords greater than 2 characters in length (basically exclude in at etc...)
keywords <- keywords[nchar(keywords) > 2]

# Keep keywords that are not in the exclude list
keywords <- setdiff(keywords, sapply(exclude_list, toupper))

# Compare the read keyword list to the master keyword list
# and keep the frequency column

key1$V1=sapply(key1[[1]], toupper)
key2$V1=sapply(key2[[1]], toupper)
master$V1=sapply(master[[1]], toupper)

keywords=as.data.frame(keywords)
keywordmax=as.data.frame(c("0"))
keywords[which(as.character(keywords$keywords) %in% as.character(key1$V1)),2]=key1[,2]
keywords[which(as.character(keywords$keywords) %in% as.character(key2$V1)),3]=key2[,2]
keywords[which(as.character(keywords$keywords) %in% as.character(master$V1)),4]=master[,2]

keywords[is.na(keywords)] = 0
keywordmax=keywords[,2:3]
keywordmax=apply(keywordmax, 1, max)
masterset=keywords[,4]

keywords=keywords[,1:2]
keywords$V2=as.data.frame(keywordmax)
keywords$V3=as.data.frame(masterset)

return(keywords)
}
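
A possible way to avoid both the misalignment and the row-count mismatch is to look frequencies up by keyword with match() instead of assigning them by position. The sketch below is an assumption-laden illustration, not the original program: the function name agg_by_match, its arguments, and the output column names are hypothetical, and it takes data frames rather than reading the CSV files so that it can run on the sample data.

```r
# Hypothetical helper: each input has keywords in column 1 and
# frequencies in column 2; exclude is a character vector of words to drop.
agg_by_match <- function(key1, key2, master, exclude = character(0)) {
  # Upper-case the keywords so comparisons ignore case
  k1 <- toupper(as.character(key1[[1]]))
  k2 <- toupper(as.character(key2[[1]]))
  km <- toupper(as.character(master[[1]]))

  # Unique, sorted keywords longer than 2 characters and not excluded
  keywords <- sort(unique(c(k1, k2)))
  keywords <- keywords[nchar(keywords) > 2]
  keywords <- setdiff(keywords, toupper(as.character(exclude)))

  # Look up each keyword's frequency by name; unmatched keywords get 0
  lookup <- function(words, freq) {
    out <- freq[match(keywords, words)]
    out[is.na(out)] <- 0
    out
  }
  f1 <- lookup(k1, key1[[2]])
  f2 <- lookup(k2, key2[[2]])

  data.frame(keywords = keywords,
             keywordmax = pmax(f1, f2),          # larger of the two set frequencies
             masterset = lookup(km, master[[2]]), # frequency in the master list
             stringsAsFactors = FALSE)
}

# Usage with the sample data shown earlier (first three master rows only)
key1 <- data.frame(V1 = c("car", "house", "bee", "gator"),
                   V2 = c(5, 6, 30, 10))
key2 <- data.frame(V1 = c("jump", "bee", "go", "bike", "fence", "might"),
                   V2 = c(4, 20, 9, 24, 31, 20))
master <- data.frame(V1 = c("students", "statistical", "excel"),
                     V2 = c(8, 8, 6))
result <- agg_by_match(key1, key2, master)
print(result)
```

Because match() returns one index per keyword, the result always has exactly one frequency per row, regardless of file order or of keywords removed by the filters.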

0 answers:

No answers yet.