I am trying to attach a sentiment value to each message, and I have downloaded all of the Stanford CoreNLP jar files as dependencies:
// dq points to the head pointer of the list
void dequeue_push_back(dequeue** dq, int data)
{
    // Allocate the node on the heap; a stack-allocated node would be
    // destroyed when this function returns, so traversing it later
    // (hp = hp->next) reads freed memory and crashes.
    dequeue* node = malloc(sizeof *node);
    node->data = data;
    node->next = NULL;
    node->prev = NULL;

    if (*dq == NULL)
    {
        // Empty list: the new node becomes the head.
        *dq = node;
    }
    else
    {
        // Walk to the last node, then append.
        dequeue* hp = *dq;
        while (hp->next != NULL)
        {
            hp = hp->next;
        }
        hp->next = node;
        node->prev = hp;
    }
}
So far so good, as I can run the computation and save the Dataset:
import sqlContext.implicits._
import com.databricks.spark.corenlp.functions._
import org.apache.spark.sql.functions._
val version = "3.6.0"
val model = s"stanford-corenlp-$version-models-english"
val jars = sc.listJars
if (!jars.exists(jar => jar.contains(model))) {
import scala.sys.process._
  s"wget http://repo1.maven.org/maven2/edu/stanford/nlp/stanford-corenlp/$version/$model.jar -O /tmp/$model.jar".!!
  sc.addJar(s"/tmp/$model.jar")
}
val all_messages = spark.read.parquet("/home/ubuntu/messDS.parquet")
case class AllMessSent(user_id: Int, sent_at: java.sql.Timestamp, message: String)
val messDS = all_messages.as[AllMessSent]
I can output the result as:
case class AllMessSentiment(user_id: Int, sent_at: java.sql.Timestamp, message: String, sentiment: Int)
val output = messDS
  .select('user_id, 'message, 'sent_at,
    sentiment('message).as('sentiment))
  .as[AllMessSentiment]
output.write.parquet("/home/ubuntu/AllMessSent.parquet")
I can see the sentiment scores, but when writing to CSV or Parquet I get the error shown below. Does anyone know how to fix it?
output.show(truncate = false)
Answer 0 (score: 0)
I was able to run the algorithm once all messages were split into sentences and special characters and extra whitespace were cleaned out.
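A minimal sketch of that preprocessing step in plain Java (the class, the regexes, and the naive sentence split are assumptions for illustration; CoreNLP's own tokenizer handles sentence splitting more robustly):

```java
import java.util.ArrayList;
import java.util.List;

public class MessageCleaner {
    // Keep letters, digits, basic sentence punctuation and spaces;
    // replace everything else with a space, then collapse whitespace.
    static String clean(String message) {
        return message
                .replaceAll("[^\\p{L}\\p{N}.!? ]", " ")
                .replaceAll("\\s+", " ")
                .trim();
    }

    // Naive split: break after ., ! or ? followed by whitespace.
    static List<String> sentences(String message) {
        List<String> out = new ArrayList<>();
        for (String s : clean(message).split("(?<=[.!?])\\s+")) {
            if (!s.isEmpty()) out.add(s);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(sentences("Great day!! Loved it :) Really."));
    }
}
```

Running each sentence through the sentiment UDF individually, instead of a whole message with emoticons and stray characters, avoids feeding the annotator input it cannot tokenize cleanly.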