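IllegalStateException when trying to run Spark Streaming with Twitter

Time: 2017-05-25 06:32:05

Tags: spark-streaming

I am new to Spark and Scala. I am trying to run an example I found through Google, and I get an exception when I run the program.

The exception is: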

17/05/25 11:13:42 ERROR ReceiverTracker: Deregistered receiver for stream 0: Restarting receiver with delay 2000ms: Error starting Twitter stream - java.lang.IllegalStateException: Authentication credentials are missing. 

The code I am running is as follows:

PrintTweets.scala

package example

import org.apache.spark._
import org.apache.spark.SparkContext._
import org.apache.spark.streaming._
import org.apache.spark.streaming.twitter._
import org.apache.spark.streaming.StreamingContext._
import org.apache.log4j.Level
import Utilities._

object PrintTweets {

  def main(args: Array[String]): Unit = {

    // Configure Twitter credentials using twitter.txt
    setupTwitter()
    val appName = "TwitterData"
    val conf = new SparkConf()
    conf.setAppName(appName).setMaster("local[3]")

    val ssc = new StreamingContext(conf, Seconds(5))
    //val ssc = new StreamingContext("local[*]", "PrintTweets", Seconds(10))
    setupLogging()
    // Create a DStream from Twitter using our streaming context
    val tweets = TwitterUtils.createStream(ssc, None)
    // Now extract the text of each status update into RDDs using map()
    val statuses = tweets.map(status => status.getText())
    statuses.print()

    ssc.start()
    ssc.awaitTermination()
  }
}

Utilities.scala

package example

import org.apache.log4j.Level
import java.util.regex.Pattern
import java.util.regex.Matcher

object Utilities {
  /** Makes sure only ERROR messages get logged to avoid log spam. */
  def setupLogging() = {
    import org.apache.log4j.{Level, Logger}
    val rootLogger = Logger.getRootLogger()
    rootLogger.setLevel(Level.ERROR)
  }

  /** Configures Twitter service credentials using twitter.txt in the main workspace directory */
  def setupTwitter() = {
    import scala.io.Source

    for (line <- Source.fromFile("../twitter.txt").getLines) {
      val fields = line.split(" ")
      // Lines that do not split into exactly two fields are skipped silently
      if (fields.length == 2) {
        System.setProperty("twitter4j.oauth." + fields(0), fields(1))
      }
    }
  }

  /** Retrieves a regex Pattern for parsing Apache access logs. */
  def apacheLogPattern():Pattern = {
    val ddd = "\\d{1,3}"                      
    val ip = s"($ddd\\.$ddd\\.$ddd\\.$ddd)?"  
    val client = "(\\S+)"                     
    val user = "(\\S+)"
    val dateTime = "(\\[.+?\\])"              
    val request = "\"(.*?)\""                 
    val status = "(\\d{3})"
    val bytes = "(\\S+)"                     
    val referer = "\"(.*?)\""
    val agent = "\"(.*?)\""
    val regex = s"$ip $client $user $dateTime $request $status $bytes $referer $agent"
    Pattern.compile(regex)    
  }
}

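Using print statements, I found that the exception occurs at the line val tweets = TwitterUtils.createStream(ssc, None).

I provide the credentials in the twitter.txt file, and the program reads that file correctly. When twitter.txt is not in the expected directory, I get an explicit error, and when I supply blank values for the consumer key, consumer secret, and so on in twitter.txt, I get an explicit "unauthorized" error.

Please let me know if you need more details about the error or the software versions.

Thanks, Madhu.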

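2 Answers:

Answer 0: (score: 0)

I was able to reproduce the problem with your code. I believe the cause is on your side: you have probably not configured twitter.txt correctly. Your twitter.txt file should look like this: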

consumerKey your_consumerKey
consumerSecret your_consumerSecret
accessToken your_accessToken
accessTokenSecret your_accessTokenSecret 
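
One way to confirm that the file was actually applied is to check, right after setupTwitter() returns, whether the four twitter4j.oauth.* system properties are set. The following is a minimal sketch, not part of the question's code: the object name CredentialCheck and its method are hypothetical helpers, and it only assumes the property names that setupTwitter() builds from the keys above.

```scala
object CredentialCheck {
  // The four keys that twitter.txt is expected to provide (see the layout above).
  val requiredKeys: Seq[String] =
    Seq("consumerKey", "consumerSecret", "accessToken", "accessTokenSecret")

  /** Returns the required keys whose twitter4j.oauth.* system property is unset or blank. */
  def missingCredentials(): Seq[String] =
    requiredKeys.filter { key =>
      // Option(null) is None, and None.forall(...) is true, so unset properties count as missing.
      Option(System.getProperty("twitter4j.oauth." + key)).forall(_.trim.isEmpty)
    }

  def main(args: Array[String]): Unit = {
    val missing = missingCredentials()
    if (missing.isEmpty) println("All four twitter4j.oauth.* properties are set")
    else println("Missing or blank: " + missing.mkString(", "))
  }
}
```

Calling CredentialCheck.missingCredentials() between setupTwitter() and TwitterUtils.createStream(ssc, None) should return an empty sequence if every line of twitter.txt was accepted. Any key it lists was skipped by the fields.length == 2 check.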

I hope this helps.

Answer 1: (score: 0)

It worked after I changed the twitter.txt file syntax to the following, with a single space between each key and its value:

consumerKey your_consumerKey
consumerSecret your_consumerSecret
accessToken your_accessToken
accessTokenSecret your_accessTokenSecret
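
The reason the single space matters can be seen from how setupTwitter() splits each line. The sketch below is only an illustration: strictFields mirrors the line.split(" ") call from the question, while tolerantFields is a hypothetical alternative (not in the original code) that trims the line and splits on any run of whitespace.

```scala
object TwitterFileParsing {
  /** Splits the way the question's setupTwitter() does: on exactly one space character. */
  def strictFields(line: String): Array[String] = line.split(" ")

  /** Hypothetical tolerant variant: trims the line, then splits on any run of whitespace. */
  def tolerantFields(line: String): Array[String] = line.trim.split("\\s+")

  def main(args: Array[String]): Unit = {
    val samples = Seq(
      "consumerKey your_consumerKey",   // single space
      "consumerKey  your_consumerKey",  // two spaces
      "consumerKey\tyour_consumerKey"   // tab
    )
    for (line <- samples) {
      println(s"strict=${strictFields(line).length} tolerant=${tolerantFields(line).length}")
    }
  }
}
```

With the strict split, a line with two spaces yields three fields and a tab-separated line yields one, so the fields.length == 2 check skips both without any message and the corresponding credential is never set. The tolerant variant yields two fields in all three cases.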