Question

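I have a question that has me really confused, and I apologize if it is a silly one. So far I have implemented sentiment analysis in Java using only numeric data, which I obtained entirely with Python libraries. Now I realize that I have to preprocess the data as text from scratch in Java.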

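I want to use Weka's StringToWordVector to tokenize my data, then apply preprocessing and TF-IDF. My question is: how do I handle symbols inside a string in an ARFF file? When I simply define the attributes as below, I get "nominal value not declared in header, read Token[@Microsoft] .." for the first row of my data.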

@relation corpus 
@attribute id numeric
@attribute text string
@attribute label {positive,neutral,negative}
@attribute label2 {neutral,non-neutral}
@data
628949369883000000  dear @Microsoft the...  negative    non-neutral

I also tried separating the values with commas, and I got the same error.

628949369883000000,dear @Microsoft the...,negative,non-neutral

So how should I declare a string that contains symbols like this?

Thank you very much.

Answer 1

OK, so I just needed to put my string in quotes.

@relation file 
@attribute id numeric
@attribute tweet string
@attribute label {positive,neutral,negative}
@attribute label2 {neutral,non-neutral}

@data
628949369883000000,"dear @Microsoft .... C'mon.",negative,non-neutral
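
Since the rows are going to be generated from Java anyway, here is a minimal sketch of a helper that quotes the text field before writing each @data line. As far as I can tell, the ARFF reader splits unquoted values on whitespace as well as commas, which is consistent with the error naming @Microsoft as the token it tried to read for the label. The class name `ArffRowWriter`, its methods, and the sample tweet are all made up for illustration, and the backslash escaping of embedded quotes is my assumption about what the reader accepts, so check it against your Weka version (I believe Weka also ships a `weka.core.Utils.quote` helper that could be used instead).

```java
// Hypothetical helper, not part of Weka: builds one ARFF @data row
// for the header used above (id, tweet, label, label2).
class ArffRowWriter {

    /** Wraps raw text in double quotes, backslash-escaping characters that would end or break the string. */
    public static String quote(String raw) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : raw.toCharArray()) {
            switch (c) {
                case '\\': sb.append("\\\\"); break; // literal backslash
                case '"':  sb.append("\\\""); break; // embedded double quote
                case '\n': sb.append("\\n");  break; // keep the row on one line
                case '\r': sb.append("\\r");  break;
                case '\t': sb.append("\\t");  break;
                default:   sb.append(c);
            }
        }
        return sb.append('"').toString();
    }

    /** Joins the four attribute values into a comma-separated row; only the string attribute is quoted. */
    public static String dataRow(long id, String tweet, String label, String label2) {
        return id + "," + quote(tweet) + "," + label + "," + label2;
    }

    public static void main(String[] args) {
        // Made-up sample tweet containing @, a comma, and an apostrophe.
        System.out.println(dataRow(628949369883000000L,
                "dear @Microsoft, c'mon", "negative", "non-neutral"));
        // prints: 628949369883000000,"dear @Microsoft, c'mon",negative,non-neutral
    }
}
```

The nominal labels are left unquoted here because they contain no spaces or special characters; a label that did would need the same treatment in both the header and the data.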
