I am trying to understand the text2vec package from http://dsnotes.com/articles/text2vec.
public void input(String path, PrintWriter out) throws FileNotFoundException, IOException
{
String finalstring;
FileInputStream in = new FileInputStream(path);
BufferedReader br = new BufferedReader(new InputStreamReader(in));
Path FILE_PATH = Paths.get("C:/10", "tweets_6.txt");
BufferedWriter writer = Files.newBufferedWriter(FILE_PATH, StandardCharsets.UTF_8, StandardOpenOption.APPEND);
String line;
while((line = br.readLine()) != null)
{
finalstring = line;
finalstring = finalstring.replaceAll("https?://\\S+\\s?", "");
finalstring=finalstring.replace("#engineeringproblems", " ");
finalstring=finalstring.replace("#", " ");
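// Stemming: rebuild the line token by token
StringTokenizer st = new StringTokenizer(finalstring);
String finalstring1;
finalstring = "";
// Create the stemmer once per line instead of once per token
KrovetzStemmer ks = new KrovetzStemmer();
while (st.hasMoreTokens())
{
finalstring1 = ks.stem(st.nextToken());
// Collapse runs of three or more identical characters into one.
// The pattern has a single capture group, so the backreference must be \1 / $1.
finalstring1 = finalstring1.replaceAll("(.)\\1{2,}", "$1");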
// Both word lists are re-read for every token; try-with-resources closes the readers each time
try (BufferedReader br1 = new BufferedReader(new InputStreamReader(new FileInputStream("C:\\10\\NonWords.txt")));
BufferedReader br2 = new BufferedReader(new InputStreamReader(new FileInputStream("C:\\10\\StopWords.txt"))))
{
String line1;
String line2;
// Append the token for each matching entry in NonWords.txt
while((line1 = br1.readLine()) != null)
{
if(finalstring1.equals(line1))
{
finalstring += finalstring1 + " ";
}
}
// Append the token for each matching entry in StopWords.txt
while((line2 = br2.readLine()) != null)
{
if(finalstring1.equals(line2))
{
finalstring += finalstring1 + " ";
}
}
}
}
writer.write(finalstring);
writer.newLine();
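}
// Close the writer and reader; an unclosed BufferedWriter may never flush its buffer to disk
writer.close();
br.close();
}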
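The token-cleaning steps in the method above can be sketched in a self-contained form. This is only an illustration under stated assumptions, not the original author's code: the class name `TokenCleaner` and the helpers `collapseRepeats` and `clean` are hypothetical, the Krovetz stemming step is left out because it needs an external library, and the sketch assumes the intent is to drop tokens found in the word lists (the method above appends matching tokens instead). The excluded words are held in an in-memory `Set` so the lists would only need to be loaded once rather than re-read for every token.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.StringJoiner;

class TokenCleaner {

    // Collapse any run of three or more identical characters into a single one.
    // The pattern has one capture group, so the backreference is \1 and the replacement is $1.
    static String collapseRepeats(String token) {
        return token.replaceAll("(.)\\1{2,}", "$1");
    }

    // Assumption: tokens present in the excluded set are dropped, everything else is kept.
    static String clean(String line, Set<String> excluded) {
        StringJoiner kept = new StringJoiner(" ");
        for (String token : line.trim().split("\\s+")) {
            String collapsed = collapseRepeats(token);
            if (!collapsed.isEmpty() && !excluded.contains(collapsed)) {
                kept.add(collapsed);
            }
        }
        return kept.toString();
    }

    public static void main(String[] args) {
        // Hypothetical stop-word list, standing in for the contents of StopWords.txt
        Set<String> excluded = new HashSet<>(Arrays.asList("this", "is"));
        System.out.println(clean("this is sooooo cool", excluded));
    }
}
```

Looking tokens up in a `HashSet` replaces the two per-token file scans with constant-time membership checks, which is usually the larger win when processing many tweets.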
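But in the next step:
Now we can construct the DTM. Again, since all functions related to corpus construction have a streaming API, we have to create an iterator and provide it to the create_vocab_corpus function: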
text2vec
This code throws an error:
Error: could not find function "create_vocab_corpus"
Answer 0 (score: 1)
Please see the tutorial for the latest version (0.3): https://cran.r-project.org/web/packages/text2vec/vignettes/text-vectorization.html. There are some breaking API changes in v0.3.