Spark 2.0.1 write error: Caused by: java.util.NoSuchElementException

Date: 2016-10-11 17:48:43

Tags: java scala apache-spark stanford-nlp

I am trying to attach a sentiment value to each message, and I have downloaded all of the Stanford CoreNLP jar files as dependencies:


So far so good, since I can run the computation and save the Dataset:

import sqlContext.implicits._
import com.databricks.spark.corenlp.functions._
import org.apache.spark.sql.functions._

val version = "3.6.0"
val model = s"stanford-corenlp-$version-models-english"
val jars = sc.listJars
if (!jars.exists(jar => jar.contains(model))) {
  import scala.sys.process._
  s"wget http://repo1.maven.org/maven2/edu/stanford/nlp/stanford-corenlp/$version/$model.jar -O /tmp/$model.jar".!!
  sc.addJar(s"/tmp/$model.jar")
}

val all_messages = spark.read.parquet("/home/ubuntu/messDS.parquet")

case class AllMessSent(user_id: Int, sent_at: java.sql.Timestamp, message: String)

val messDS = all_messages.as[AllMessSent]

I can produce the result as:

    case class AllMessSentiment(user_id: Int, sent_at: java.sql.Timestamp, message: String, sentiment: Int)

    val output = messDS
      .select('user_id, 'message, 'sent_at, sentiment('message).as('sentiment))
      .as[AllMessSentiment]

    output.write.parquet("/home/ubuntu/AllMessSent.parquet")

I can see the sentiment scores, but when writing to CSV or Parquet the error above is thrown. Does anyone know how to fix it?

output.show(truncate = false)

1 Answer:

Answer 0 (score: 0):

I was able to run the algorithm once all messages were split into sentences and cleaned of special characters and extra whitespace.
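The preprocessing described above can be sketched in plain Scala before it is wrapped in a Spark UDF. This is a minimal, hypothetical helper (the names `cleanMessage` and `splitSentences` are illustrative, not from the post); the key point is that empty or symbol-only strings are filtered out before they reach the sentiment function, since blank input is a common trigger for `NoSuchElementException` in downstream annotators:

```scala
object MessageCleaner {
  // Keep letters, digits, basic punctuation, and whitespace;
  // replace everything else (emoji, control chars, symbols) with a space,
  // then collapse repeated whitespace.
  def cleanMessage(raw: String): String =
    raw.replaceAll("[^\\p{L}\\p{Nd}.!?,'\\s]", " ")
       .replaceAll("\\s+", " ")
       .trim

  // Naive split on sentence-final punctuation. CoreNLP's own sentence
  // splitter is more robust; the important part is dropping empty results.
  def splitSentences(text: String): Seq[String] =
    text.split("(?<=[.!?])\\s+").toSeq.map(_.trim).filter(_.nonEmpty)

  def main(args: Array[String]): Unit = {
    val raw = "Great product!! @#   Totally   worth it. Would buy again?"
    val cleaned = cleanMessage(raw)
    println(cleaned)
    splitSentences(cleaned).foreach(println)
  }
}
```

In Spark this logic would be registered as a UDF and applied to the `message` column before calling `sentiment('message)`, so every row passed to the annotator is a non-empty, single sentence.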