I am working on Hortonworks. I have stored tweets from Twitter in a Kafka topic. I am using Kafka as the producer and Spark as the consumer, doing sentiment analysis on the tweets from the Spark shell in Scala. But I want to extract specific parts of each tweet — the text, the hashtags, whether the tweet is positive or negative, and the words from the tweet that I have classified as positive or negative. My training data is Data.txt.

Data.txt contains words and their polarity (positive/negative), separated by a tab, for example:

like	positive
doom	negative
doomed	negative
doubt	positive

I added the dependencies: org.apache.spark:spark-streaming-kafka_2.10:1.6.2 and org.apache.spark:spark-streaming_2.10:1.6.2.
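For reference, a line of Data.txt in the format above can be parsed with a plain tab split — a minimal sketch in plain Scala, independent of Spark (the sample lines are placeholders matching the format described):

```scala
// Parse one tab-separated line of Data.txt into a (word, polarity) pair.
def parseLine(line: String): (String, String) = {
  val Array(word, polarity) = line.split("\t")
  (word, polarity)
}

val sample = Seq("like\tpositive", "doom\tnegative", "doomed\tnegative", "doubt\tpositive")
val wordSentiments = sample.map(parseLine).toMap
println(wordSentiments("doom"))  // prints: negative
```

The same `split("\t")` pattern match is what the streaming job below uses when it loads the file from HDFS with `textFile`.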
Here is my code:
import org.apache.spark._
import org.apache.spark.streaming.StreamingContext
import org.apache.spark.streaming.Seconds
import org.apache.spark.streaming.kafka._
val conf = new SparkConf().setMaster("local[4]").setAppName("KafkaReceiver")
val ssc = new StreamingContext(conf, Seconds(5))
val zkQuorum="sandbox.hortonworks.com:2181"
val group="test-consumer-group"
val topics="test"
val numThreads=5
val args=Array(zkQuorum, group, topics, numThreads)
val topicMap = topics.split(",").map((_, numThreads.toInt)).toMap
val lines = KafkaUtils.createStream(ssc, zkQuorum, group, topicMap).map(_._2)
val hashTags = lines.flatMap(_.split(" ")).filter(_.startsWith("#"))
val wordSentimentFilePath = "hdfs://sandbox.hortonworks.com:8020/TwitterData/Data.txt"
val wordSentiments = ssc.sparkContext.textFile(wordSentimentFilePath).map { line =>
  val Array(word, happiness) = line.split("\t")
  (word, happiness)
}.cache()
// Note: happiness is a String here, so `tuple._1 * tuple._2` below repeats the
// polarity string count times rather than computing a numeric score.
val happiest60 = hashTags.map(hashTag => (hashTag.tail, 1))
  .reduceByKeyAndWindow(_ + _, Seconds(60))
  .transform { topicCount => wordSentiments.join(topicCount) }
  .map { case (topic, tuple) => (topic, tuple._1 * tuple._2) }
  .map { case (topic, happinessValue) => (happinessValue, topic) }
  .transform(_.sortByKey(false))
happiest60.print()
ssc.start()
I get output like this:

(negative, fear) (positive, adapt)

But I want output like this:

(#sports, text from the tweet, fitness, positive)

That is, I also want the tweet text and the hashtag stored along with the matched word and its polarity, but I have not found a way to do that.
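One way to keep the tweet text together with its hashtags is to emit (hashtag, text) pairs instead of discarding the text in the first flatMap, then look each word up in the sentiment map. A minimal sketch of that pairing logic in plain Scala (no Spark; the sample tweet and the `wordSentiments` entries are placeholders — in the streaming job, `lines.flatMap(hashTagTextPairs)` would produce the same pairs as a DStream):

```scala
// Hypothetical sentiment lookup, as would be loaded from Data.txt.
val wordSentiments = Map("fitness" -> "positive", "doom" -> "negative")

// Pair every hashtag in a tweet with the full tweet text.
def hashTagTextPairs(tweet: String): Seq[(String, String)] =
  tweet.split(" ").filter(_.startsWith("#")).map(tag => (tag, tweet)).toSeq

// For each hashtag, find the words of the tweet that appear in the
// sentiment map, yielding (hashtag, word, polarity) triples.
def tweetSentiments(tweet: String): Seq[(String, String, String)] =
  for {
    (tag, text) <- hashTagTextPairs(tweet)
    word        <- text.split(" ").toSeq
    polarity    <- wordSentiments.get(word.toLowerCase)
  } yield (tag, word, polarity)

println(tweetSentiments("#sports fitness is great"))
```

This keeps the text available at every step, so the final tuple can carry (hashtag, text, word, polarity) rather than only (polarity, word); the same join-by-word idea carries over to `transform { ... wordSentiments.join(...) }` on the DStream.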