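Please convert the following Scala code to PySpark. No matter what I try, I keep getting a Py4JJavaError. Alternatively, please share a link to an implementation of Stanford CoreNLP NER in PySpark.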
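import org.apache.spark.sql.functions._
import com.databricks.spark.corenlp.functions._
import spark.implicits._ // needed for toDF and the 'column symbol syntax (assumes a SparkSession named spark)
val input = Seq((1, "<xml>Stanford University is located in California. It is a great university.</xml>")).toDF("id", "text")
// cleanxml strips the XML tags, ssplit splits the document into sentences (exploded to one row each),
// then each sentence is tokenized and tagged with NER labels and a sentiment value
val output = input.select(cleanxml('text).as('doc)).select(explode(ssplit('doc)).as('sen)).select('sen, tokenize('sen).as('words), ner('sen).as('nerTags), sentiment('sen).as('sentiment))
output.show(truncate = false)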
Sentiment analysis isn't my priority; NER is.
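Since NER is the goal, a common next step once the per-sentence `words` and `nerTags` arrays are available in Python (for example by calling `collect()` on a DataFrame shaped like `output`) is to merge consecutive tokens with the same tag into entity mentions. The sketch below is plain Python and assumes that `words[i]` is labeled by `nerTags[i]` and that `"O"` marks tokens outside any entity, as CoreNLP does; the function name `group_entities` and the tag values in the usage line are hypothetical.

```python
from typing import List, Tuple


def group_entities(words: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Merge runs of consecutive tokens sharing the same non-"O" NER tag.

    Assumes words[i] is labeled by tags[i] and that "O" marks tokens
    outside any entity (CoreNLP's convention).
    """
    if len(words) != len(tags):
        raise ValueError("words and tags must have the same length")
    entities: List[Tuple[str, str]] = []
    current_words: List[str] = []
    current_tag = "O"
    for word, tag in zip(words, tags):
        # Same entity tag as the previous token: extend the current mention.
        if tag == current_tag and tag != "O":
            current_words.append(word)
            continue
        # Tag changed: emit the mention being built, if any.
        if current_words:
            entities.append((" ".join(current_words), current_tag))
        # Start a new mention only for entity tokens.
        current_words = [word] if tag != "O" else []
        current_tag = tag
    # Emit a mention that runs to the end of the sentence.
    if current_words:
        entities.append((" ".join(current_words), current_tag))
    return entities


if __name__ == "__main__":
    # Hypothetical tags for the first sentence of the example input.
    print(group_entities(
        ["Stanford", "University", "is", "located", "in", "California", "."],
        ["ORGANIZATION", "ORGANIZATION", "O", "O", "O", "LOCATION", "O"],
    ))
```

Because CoreNLP's tags carry no begin/inside marker, two distinct adjacent entities of the same type would be merged into one mention; the helper could also be wrapped with `pyspark.sql.functions.udf` to run inside Spark instead of on collected rows.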