Using a function argument as the file name [R]

Time: 2015-12-16 15:30:22

Tags: r function csv

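I have a number of data frames that I want to tokenize and save to CSV. I am trying to put the code I have been using into a function that writes a CSV named after the data frame it was given.

For this example I have provided a data frame called subzibo2. When I run the function, though, I get an error at the write.csv stage. I have tried building the file name with both paste and sprintf, but neither works.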

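With the paste() option I get:

Error in file(file, ifelse(append, "a", "w")) : invalid 'description' argument
In addition: Warning message:
In if (file == "") file <- stdout() else if (is.character(file)) { :
  the condition has length > 1 and only the first element will be used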

With the sprintf() option I get:

Error in sprintf("subs/.%d.csv", prodsplit) : unsupported type
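Both messages look reproducible outside the function, which suggests the problem is that the data frame itself, rather than its name, is being handed to paste() and sprintf(). The snippet below is only a minimal sketch of that idea; `df` is a hypothetical two-column stand-in for subzibo2 and is not part of my real code.

```r
# Hypothetical two-column data frame standing in for subzibo2
df <- data.frame(a = c("x", "y"), b = c("p", "q"), stringsAsFactors = FALSE)

# paste() treats the data frame as a list of columns, so the result has
# one element per column (2 here) instead of a single file name
paths <- paste("subs/", df, ".csv", sep = "")
length(paths)

# sprintf() cannot format a list-like object with %d, so it signals an
# error; tryCatch() captures the condition instead of stopping the script
res <- tryCatch(
    sprintf("subs/.%d.csv", df),
    error = function(e) e
)
inherits(res, "error")
```

A `file` argument longer than one element would explain the "condition has length > 1" warning from write.csv, and newer R versions may report that condition as an error rather than a warning.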

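Can anyone help? What am I doing wrong? For convenience I have left the alternative write.csv call in the code as a comment. The steps in the function all work when I run them outside of a function. For information, they take a spreadsheet, tokenize the ProdNameReduced column, and return a data frame containing all the possible token phrases (whole phrases or parts), together with the number of words in each phrase and the number of times it occurs in the subzibo2 data frame.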

library(tm) 
library(RWeka)
library(plyr)
library(dplyr)

subzibo2 = data.frame(
    ProdNameReduced = c("zibo muffin fold over x 100", "zibo muffin fold over x 1",
                        "zibo sandwich 250s x 1", "zibo sandwich x 1s",
                        "zibo 500g clamshell punnet x 1", "zibo burger fold over x 300",
                        "zibo burger fold over x 1", "zibo 500g clamshell punnet x 500s",
                        "zibo 1kg clamshell punnet x 500s", "zibo 1kg clamshell punnet x 1",
                        "zibo 4 cavity fruit tray x 1", "zibo 4 cavity fruit tray x 500",
                        "zibo 2 cavity fruit tray x 1", "zibo 2 cavity fruit tray x 1000"),
    Code = c("ZIBOZFO6BOX", "ZIBOZFO6", "ZIBOSANDWICH", "ZIBOS/WICHSINGL",
             "ZIBOCS85", "ZIBOBURGERBOX", "ZIBOBURGER", "ZIBOBOX500G",
             "ZIBOBOX1KG", "ZIBO781KG", "ZIBO4LOOSE", "ZIBO4",
             "ZIBO2LOOSE", "ZIBO2"))

ProdType = function(prodsplit)
{
    prodsplit$ProdNameReduced = as.character(prodsplit$ProdNameReduced)

    max_ngram = max(sapply(strsplit(prodsplit$ProdNameReduced, " "), length))

    BigramTokenizer <- function(x) {RWeka::NGramTokenizer(x, RWeka::Weka_control(min = 1, max = max_ngram))}

    prodsplit_corpus = Corpus(VectorSource(prodsplit$ProdNameReduced))
    tdm <- TermDocumentMatrix(prodsplit_corpus, control = list(tokenize = BigramTokenizer))
    rm(prodsplit_corpus)
    tdm_matrix = as.matrix(tdm)
    rm(tdm)
    tdm_matrix_rowsums = sort(rowSums(tdm_matrix), decreasing = T)
    rm(tdm_matrix)
    tdm_matrix_rowsums_df = as.data.frame(tdm_matrix_rowsums)
    rm(tdm_matrix_rowsums)
    tdm_matrix_rowsums_df$phrases = row.names(tdm_matrix_rowsums_df)
    rownames(tdm_matrix_rowsums_df) = NULL
    tdm_matrix_rowsums_df$phrasecount = vapply(strsplit(tdm_matrix_rowsums_df$phrases, "\\S+"), length, integer(1))

    colnames(tdm_matrix_rowsums_df) = c("occurence","phrases", "phrasecount")
    tdm_matrix_rowsums_df = ddply(tdm_matrix_rowsums_df, .(phrases), colwise(sum))
    tdm_matrix_rowsums_df = arrange(tdm_matrix_rowsums_df, phrases, occurence)
    tdm_matrix_rowsums_df = select(tdm_matrix_rowsums_df, phrasecount, occurence, phrases)
    tdm_matrix_rowsums_df$selector = character(nrow(tdm_matrix_rowsums_df))

    #write.csv(tdm_matrix_rowsums_df, file = paste("subs/", prodsplit, ".csv", sep = ""))
    write.csv(tdm_matrix_rowsums_df, file = sprintf("subs/.%d.csv" , prodsplit))

}

ProdType(subzibo2)
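One approach that may do what is wanted here is to capture the name of the argument with deparse(substitute(...)) and build the path from that string. The following is a sketch under stated assumptions, not a tested replacement for ProdType: `write_named_csv`, its `dir` parameter, and `example_df` are hypothetical names introduced only for illustration, and the output directory is created on the assumption that it may not exist yet.

```r
# Hypothetical helper: writes `df` to <dir>/<name of the argument>.csv
write_named_csv <- function(df, dir = "subs") {
    # Capture the expression passed as `df` and turn it into a string,
    # so that write_named_csv(subzibo2) would give "subzibo2"
    name <- deparse(substitute(df))

    # Assumption: the target directory may be missing, so create it quietly
    dir.create(dir, showWarnings = FALSE, recursive = TRUE)

    # Build a single, length-one file path from the captured name
    path <- file.path(dir, paste0(name, ".csv"))
    write.csv(df, file = path, row.names = FALSE)

    # Return the path invisibly so callers can check where the file went
    invisible(path)
}

# Usage with a small stand-in data frame, written to a temporary directory
example_df <- data.frame(phrases = c("zibo", "zibo muffin"),
                         occurence = c(14, 2))
out <- write_named_csv(example_df, dir = tempdir())
basename(out)
```

If this idea were moved into ProdType, the deparse(substitute(prodsplit)) call would probably need to come first, before the line that reassigns prodsplit$ProdNameReduced, because substitute() returns the value rather than the original expression once the argument has been modified inside the function.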

0 answers:

No answers