Question

I have a function that labels strings in a data set as spam. I use this function successfully by calling:

dtm_english.label <- getSpamLabel(comment$rawMessage, dictionary_english, 2) # 2 is the threshold level

But when I call

dtm_english.label <- ddply(comment, .(rawMessage), getSpamLabel, dictionary_english, 2, .progress = "text")

I get no output. After ddply finishes the task, I get:

Error in do.call("c", res) : variable names are limited to 10000 bytes

I can post the relevant functions if needed.

Answer 1

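I'm not sure what you are trying to do; next time, please describe exactly what you want to achieve. To me, it looks like you are trying to apply a function to one column of a data.frame. ddply is meant for applying a function to subsets of the data. Its documentation describes it as "Split data frame, apply function, and return results in a data frame."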
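If the goal really is just to apply the function to one column, no splitting is needed. The sketch below illustrates that case. The getSpamLabel defined here is a hypothetical stand-in (the asker's real function was not posted), and the dictionary and messages are made-up toy data.

```r
# Hypothetical stand-in for the asker's getSpamLabel(): flags a message as spam
# when it contains at least `threshold` words from `dictionary`.
getSpamLabel <- function(messages, dictionary, threshold) {
  vapply(messages, function(msg) {
    words <- strsplit(tolower(msg), "[^a-z]+")[[1]]
    sum(words %in% dictionary) >= threshold
  }, logical(1), USE.NAMES = FALSE)
}

# Toy data, assumed for illustration only.
dictionary_english <- c("free", "win", "prize")
comment <- data.frame(
  rawMessage = c("Win a free prize now", "See you at lunch", "free free free"),
  stringsAsFactors = FALSE
)

# Apply the function to the whole column at once; ddply is not involved.
comment$label <- getSpamLabel(comment$rawMessage, dictionary_english, 2)
print(comment)
```

This mirrors the call in the question that already works: the column goes in as a vector and a vector of labels comes back, one per row.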

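If what you want is to split the column into several parts before applying the function, you need a factor in the data frame that labels the groups.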

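You can then pass that "group" factor as ddply's .variables argument, instead of the variable you want to apply the function to, set .fun = summarize, and follow it with your function call.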

dtm_english <- ddply(comment, .(group), summarize, 
                     label=getSpamLabel(rawMessage, dictionary_english, 2), 
                     .progress = "text")

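This gives a new data frame as output, with one row per level of group (assuming getSpamLabel returns a single value for each group).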
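A minimal, self-contained sketch of that grouped call follows. It assumes the plyr package is installed, and it uses a hypothetical getSpamLabel that returns one value per group, plus a made-up group factor and toy messages. None of these come from the question.

```r
library(plyr)

# Hypothetical stand-in: returns ONE label per group, TRUE if any message
# in the group contains at least `threshold` dictionary words.
getSpamLabel <- function(messages, dictionary, threshold) {
  hits <- vapply(strsplit(tolower(messages), "[^a-z]+"),
                 function(words) sum(words %in% dictionary), integer(1))
  any(hits >= threshold)
}

# Toy data, assumed for illustration only.
dictionary_english <- c("free", "win", "prize")
comment <- data.frame(
  rawMessage = c("Win a free prize now", "See you at lunch",
                 "free prize inside", "Meeting moved to 3pm"),
  group = factor(c("a", "b", "a", "b")),
  stringsAsFactors = FALSE
)

# Split by the grouping factor, not by the message text itself.
dtm_english <- ddply(comment, .(group), summarize,
                     label = getSpamLabel(rawMessage, dictionary_english, 2))
print(dtm_english)
```

Splitting on group keeps the number of pieces small, whereas splitting on rawMessage creates one piece per distinct message. The sketch is written to run at the top level of a script, where summarize can find getSpamLabel and dictionary_english in the global environment.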
