Creating a tweet stream in Spark

Asked: 2015-03-13 04:59:09

Tags: scala stream apache-spark twitter4j tweets

I am trying to create a stream of tweets in Spark using Scala and Twitter4j. Here is my code snippet:

object auth {
  val config = new twitter4j.conf.ConfigurationBuilder()
    .setOAuthConsumerKey("")
    .setOAuthConsumerSecret("")
    .setOAuthAccessToken("")
    .setOAuthAccessTokenSecret("")
    .build
}

val conf = new SparkConf().setMaster("local[2]").setAppName("Tutorial")
val ssc = new StreamingContext(conf, Seconds(1))

val twitter_auth = new TwitterFactory(auth.config)
val a = new twitter4j.auth.OAuthAuthorization(auth.config)
val atwitter = twitter_auth.getInstance(a).getAuthorization()

When I try to call createStream:

val tweets = TwitterUtils.createStream(ssc, atwitter, filters, DISK_ONLY_2)

I get this error:

[error] /home/shaza90/Desktop/streaming/scala/Tutorial.scala:30: overloaded method value createStream with alternatives:
[error]   (jssc: org.apache.spark.streaming.api.java.JavaStreamingContext,twitterAuth: twitter4j.auth.Authorization,filters: Array[String],storageLevel: org.apache.spark.storage.StorageLevel)org.apache.spark.streaming.api.java.JavaReceiverInputDStream[twitter4j.Status] <and>
[error]   (ssc: org.apache.spark.streaming.StreamingContext,twitterAuth: Option[twitter4j.auth.Authorization],filters: Seq[String],storageLevel: org.apache.spark.storage.StorageLevel)org.apache.spark.streaming.dstream.ReceiverInputDStream[twitter4j.Status]
[error]  cannot be applied to (org.apache.spark.streaming.StreamingContext, twitter4j.auth.Authorization, Seq[String], org.apache.spark.storage.StorageLevel)
[error]     val tweets = TwitterUtils.createStream(ssc, atwitter, filters, DISK_ONLY_2)
[error]                               ^
[error] one error found
[error] (compile:compile) Compilation failed

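I don't understand why the types don't match, since my call looks like one of the listed overloads. Can you help? When I replace atwitter (the authorization object) with None, it compiles successfully!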

2 answers:

Answer 0 (score: 4)

I think atwitter needs to be an Option[T] to disambiguate the call. You could use:

val atwitter : Option[twitter4j.auth.Authorization] =  Some(twitter_auth.getInstance(a).getAuthorization())

With atwitter declared this way, the original call compiles unchanged:

val tweets = TwitterUtils.createStream(ssc, atwitter, filters, DISK_ONLY_2)

Alternatively, leave atwitter as it was and pass Some(atwitter) in the call itself.

There is a test class for this API here: https://github.com/apache/spark/blob/master/external/twitter/src/test/scala/org/apache/spark/streaming/twitter/TwitterStreamSuite.scala
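To see the overload resolution in isolation, here is a minimal sketch that needs neither Spark nor Twitter4j. ScalaContext, JavaContext, Auth and OverloadDemo are hypothetical stand-ins, not the real classes, and the storageLevel parameter is left out for brevity. It also suggests why passing None compiled in the question: None conforms to Option[Auth] because Option is covariant.

```scala
// Hypothetical stand-ins for the real Spark and Twitter4j types.
final class ScalaContext // plays the role of StreamingContext
final class JavaContext  // plays the role of JavaStreamingContext
final class Auth         // plays the role of twitter4j.auth.Authorization

object OverloadDemo {
  // Mirrors the Java-facing alternative: bare Auth, Array[String] filters.
  def createStream(jssc: JavaContext, auth: Auth, filters: Array[String]): String =
    "java overload"

  // Mirrors the Scala-facing alternative: Option[Auth], Seq[String] filters.
  def createStream(ssc: ScalaContext, auth: Option[Auth], filters: Seq[String]): String =
    "scala overload"

  val ssc = new ScalaContext
  val atwitter = new Auth
  val filters = Seq("spark")

  // createStream(ssc, atwitter, filters)
  // The line above does not compile: the first alternative rejects ScalaContext,
  // the second rejects a bare Auth where Option[Auth] is required.

  // Wrapping the value in Some makes the Scala-facing alternative applicable.
  val wrapped: String = createStream(ssc, Some(atwitter), filters)

  // None is also accepted, because it conforms to Option[Auth].
  val anonymous: String = createStream(ssc, None, filters)

  def main(args: Array[String]): Unit = {
    println(wrapped)
    println(anonymous)
  }
}
```

Both calls resolve to the second alternative, which corresponds to the StreamingContext overload in the error message.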

Answer 1 (score: 0)

As you can see in the error log, the function signature is:

createStream(
             ssc: org.apache.spark.streaming.StreamingContext,
             twitterAuth: Option[twitter4j.auth.Authorization],
             filters: Seq[String],
             storageLevel: org.apache.spark.storage.StorageLevel
            ): org.apache.spark.streaming.dstream.ReceiverInputDStream[twitter4j.Status]

You are trying to use it like this:

val tweets = TwitterUtils.createStream(ssc, atwitter, filters, DISK_ONLY_2)

which means you are calling it with the following argument types:

createStream(
             org.apache.spark.streaming.StreamingContext,
             twitter4j.auth.Authorization,
             Seq[String],
             org.apache.spark.storage.StorageLevel
)

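Notice the difference? It expects Option[twitter4j.auth.Authorization], and you are supplying a plain twitter4j.auth.Authorization. So you need to wrap your twitter4j.auth.Authorization in Some, the constructor for a present value of the Option monad: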

val tweets = TwitterUtils.createStream( ssc, Some( atwitter ), filters, DISK_ONLY_2 )
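Put together with the imports the question omits, the corrected program might look like the sketch below. Several things here are assumptions: the spark-streaming-twitter module and Twitter4j must be on the classpath, the object name Tutorial is taken from the file name in the error log, the filters value is a placeholder (the question never shows how filters is defined), and the empty credential strings must be replaced with real ones before anything will stream. The sketch also builds the OAuthAuthorization directly instead of going through TwitterFactory, which is a simplification and not what the question does.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.twitter.TwitterUtils
import twitter4j.auth.{Authorization, OAuthAuthorization}
import twitter4j.conf.ConfigurationBuilder

object Tutorial {
  def main(args: Array[String]): Unit = {
    // Credentials are left empty, as in the question; fill in real values.
    val config = new ConfigurationBuilder()
      .setOAuthConsumerKey("")
      .setOAuthConsumerSecret("")
      .setOAuthAccessToken("")
      .setOAuthAccessTokenSecret("")
      .build()

    val conf = new SparkConf().setMaster("local[2]").setAppName("Tutorial")
    val ssc = new StreamingContext(conf, Seconds(1))

    // Assumption: construct the authorization directly from the configuration.
    val atwitter: Authorization = new OAuthAuthorization(config)

    // Placeholder filter terms; not taken from the question.
    val filters: Seq[String] = Seq("spark")

    // Some(atwitter) has type Option[Authorization], matching the Scala overload.
    val tweets = TwitterUtils.createStream(
      ssc, Some(atwitter), filters, StorageLevel.DISK_ONLY_2)

    // Print the text of each received status.
    tweets.map(_.getText).print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```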

For more on the Option type, see The Neophyte's Guide to Scala by Daniel Westheide -> http://danielwestheide.com/blog/2012/12/19/the-neophytes-guide-to-scala-part-5-the-option-type.html, arguably one of the best Scala resources for people who already know the basics of the language.
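As a quick taste of what that guide covers, here is a small sketch of everyday Option usage with only the standard library. The names OptionDemo, findToken and tokens, and the sample values, are made up for illustration.

```scala
object OptionDemo {
  // Hypothetical lookup: Some(token) when the key is known, None otherwise.
  def findToken(tokens: Map[String, String], key: String): Option[String] =
    tokens.get(key)

  val tokens = Map("consumerKey" -> "abc123")

  val present: Option[String] = findToken(tokens, "consumerKey")
  val missing: Option[String] = findToken(tokens, "accessToken")

  // getOrElse supplies a fallback when the value is absent.
  val fallback: String = missing.getOrElse("<not set>")

  // map transforms the value only when one is present.
  val length: Option[Int] = present.map(_.length)

  // Pattern matching handles both cases explicitly.
  def describe(value: Option[String]): String = value match {
    case Some(v) => s"found $v"
    case None    => "nothing found"
  }

  // Option(...) turns a possibly-null reference into Some or None.
  val fromNull: Option[String] = Option(null: String)
}
```

Option(...) is the safer choice when a value coming from a Java API might be null, whereas Some(...) would wrap the null as if it were a real value.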