Extracting tweets using Ruby

Time: 2014-02-09 13:37:36

Tags: ruby mongodb twitter tweetstream

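I need to stream tweets and store them in MongoDB for processing. I have installed Ruby along with the mongo and tweetstream gems.

I run the following code to pull tweets and store them in a collection named "users" in the MongoDB database "tweet". This is the program, rawks.rb: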

require "tweetstream"
require "mongo"
require "time"

db = Mongo::Connection.new("localhost", 27017).db("tweet")
tweets = db.collection("users")

TweetStream::Daemon.new("username","password","scrapedaemon").on_error do |message|
  # Log your error message somewhere
end.filter({"locations" => "-12.72216796875, 49.76707407366789, 1.977539, 61.068917"}) do |status|
  # Do things when nothing's wrong
  data = {
    "created_at" => Time.parse(status.created_at),
    "text" => status.text,
    "geo" => status.geo,
    "coordinates" => status.coordinates,
    "id" => status.id,
    "id_str" => status.id_str
  }
  tweets.insert({"data" => data})
end

When I run this file, I get the following error: from rawks.rb:8:in 'new', rawks.rb:8:in ''

In the file daemon.rb, line 40: in 'initialize': wrong number of arguments (3 for 2) (ArgumentError)

This is the daemon.rb file:

require 'daemons'

# A daemonized TweetStream client that will allow you to
# create backgroundable scripts for application specific
# processes. For instance, if you create a script called
# <tt>tracker.rb</tt> and fill it with this:
#
#     require 'rubygems'
#     require 'tweetstream'
#
#     TweetStream.configure do |config|
#       config.consumer_key = 'abcdefghijklmnopqrstuvwxyz'
#       config.consumer_secret = '0123456789'
#       config.oauth_token = 'abcdefghijklmnopqrstuvwxyz'
#       config.oauth_token_secret = '0123456789'
#       config.auth_method = :oauth
#     end
#
#     TweetStream::Daemon.new('tracker').track('intridea') do |status|
#       # do something here
#     end
#
# And then you call this from the shell:
#
#     ruby tracker.rb start
#
# A daemon process will spawn that will automatically
# run the code in the passed block whenever a new tweet
# matching your search term ('intridea' in this case)
# is posted.
#
class TweetStream::Daemon < TweetStream::Client
  DEFAULT_NAME = 'tweetstream'.freeze
  DEFAULT_OPTIONS = {:multiple => true}

  attr_accessor :app_name, :daemon_options

  # The daemon has an optional process name for use when querying
  # running processes.  You can also pass daemon options.
  def initialize(name = DEFAULT_NAME, options = DEFAULT_OPTIONS)
    @app_name = name
    @daemon_options = options
    super({})
  end

  def start(path, query_parameters = {}, &block) #:nodoc:
    Daemons.run_proc(@app_name, @daemon_options) do
      super(path, query_parameters, &block)
    end
  end
end

1 answer:

Answer 0 (score: 0)

You are calling

TweetStream::Daemon.new("username","password","scrapedaemon")

with three arguments, but the constructor accepts at most two, the second being a hash of options:

initialize(name = DEFAULT_NAME, options = DEFAULT_OPTIONS)

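(There seem to be different versions of the documentation. The docs on ruby-doc.org show the usage you tried, but the source you are using looks more like what is described here: http://rdoc.info/github/intridea/tweetstream/TweetStream/Daemon)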
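The mismatch can be reproduced without the gem installed. The sketch below uses a hypothetical stand-in class, SketchDaemon, that copies only the initialize signature from the daemon.rb shown in the question; it is not the real TweetStream::Daemon and does not connect to anything. It shows that three positional arguments raise ArgumentError, while a single process name is accepted and picks up the default options.

```ruby
# Hypothetical stand-in that mirrors only the constructor signature
# from the daemon.rb in the question. Not the real TweetStream class.
class SketchDaemon
  DEFAULT_NAME = 'tweetstream'.freeze
  DEFAULT_OPTIONS = { multiple: true }.freeze

  attr_reader :app_name, :daemon_options

  def initialize(name = DEFAULT_NAME, options = DEFAULT_OPTIONS)
    @app_name = name
    @daemon_options = options
  end
end

# Old-style call with username and password: three arguments, so it fails.
error_message = nil
begin
  SketchDaemon.new("username", "password", "scrapedaemon")
rescue ArgumentError => e
  error_message = e.message
end
puts "three arguments: #{error_message}"

# Call matching the signature: process name only, default options.
daemon = SketchDaemon.new("scrapedaemon")
puts "one argument: name=#{daemon.app_name} options=#{daemon.daemon_options}"
```

With the real gem, going by the comment at the top of daemon.rb, the credentials would presumably move into a TweetStream.configure block (consumer key and secret, OAuth token and secret), and the daemon would be created with TweetStream::Daemon.new("scrapedaemon") before chaining on_error and filter as before.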