How to restrict tweepy to return only geotagged tweets

Time: 2015-09-22 00:20:03

Tags: python twitter tweepy

I am trying to fetch tweets from a specific country, using the tweepy API. Here is my code so far:

import tweepy

# `auth` is assumed to be an already-configured tweepy auth handler.
api = tweepy.API(auth)
places = api.geo_search(query="India", granularity="country")
place_id = places[0].id
public_tweets = api.search(q="place:%s" % place_id)
for one in public_tweets:
    print(one.place)

Here is the output I get from the snippet above:

None
None
Place(_api=<tweepy.api.API object at 0x1033f7690>, country_code=u'IN', url=u'https://api.twitter.com/1.1/geo/id/243cc16f6417a167.json', country=u'India', place_type=u'city', bounding_box=BoundingBox(_api=<tweepy.api.API object at 0x1033f7690>, type=u'Polygon', coordinates=[[[78.3897718, 17.3013989], [78.5404168, 17.3013989], [78.5404168, 17.4759], [78.3897718, 17.4759]]]), contained_within=[], full_name=u'Hyderabad, Andhra Pradesh', attributes={}, id=u'243cc16f6417a167', name=u'Hyderabad')
Place(_api=<tweepy.api.API object at 0x1033f7690>, country_code=u'IN', url=u'https://api.twitter.com/1.1/geo/id/1b8680cd52a711cb.json', country=u'India', place_type=u'city', bounding_box=BoundingBox(_api=<tweepy.api.API object at 0x1033f7690>, type=u'Polygon', coordinates=[[[77.3734736, 12.9190365], [77.7393706, 12.9190365], [77.7393706, 13.2313813], [77.3734736, 13.2313813]]]), contained_within=[], full_name=u'Bengaluru, Karnataka', attributes={}, id=u'1b8680cd52a711cb', name=u'Bengaluru')
None
None
None
None
None
None
Place(_api=<tweepy.api.API object at 0x1033f7690>, country_code=u'IN', url=u'https://api.twitter.com/1.1/geo/id/1dc2b546652c55dd.json', country=u'India', place_type=u'admin', bounding_box=BoundingBox(_api=<tweepy.api.API object at 0x1033f7690>, type=u'Polygon', coordinates=[[[73.8853747, 29.5438816], [76.9441213, 29.5438816], [76.9441213, 32.5763957], [73.8853747, 32.5763957]]]), contained_within=[], full_name=u'Punjab, India', attributes={}, id=u'1dc2b546652c55dd', name=u'Punjab')
Place(_api=<tweepy.api.API object at 0x1033f7690>, country_code=u'IN', url=u'https://api.twitter.com/1.1/geo/id/1dc2b546652c55dd.json', country=u'India', place_type=u'admin', bounding_box=BoundingBox(_api=<tweepy.api.API object at 0x1033f7690>, type=u'Polygon', coordinates=[[[73.8853747, 29.5438816], [76.9441213, 29.5438816], [76.9441213, 32.5763957], [73.8853747, 32.5763957]]]), contained_within=[], full_name=u'Punjab, India', attributes={}, id=u'1dc2b546652c55dd', name=u'Punjab')
None
None
Place(_api=<tweepy.api.API object at 0x1033f7690>, country_code=u'IN', url=u'https://api.twitter.com/1.1/geo/id/1b8680cd52a711cb.json', country=u'India', place_type=u'city', bounding_box=BoundingBox(_api=<tweepy.api.API object at 0x1033f7690>, type=u'Polygon', coordinates=[[[77.3734736, 12.9190365], [77.7393706, 12.9190365], [77.7393706, 13.2313813], [77.3734736, 13.2313813]]]), contained_within=[], full_name=u'Bengaluru, Karnataka', attributes={}, id=u'1b8680cd52a711cb', name=u'Bengaluru')

Most of the tweets are not geotagged. How can I make sure that only geotagged tweets appear in the results?

2 answers:

Answer 0: (score: 1)

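I ran into this problem too: the actual geocode of the tweets was always missing. However, you should not need the actual geocode of every tweet for what you want. Instead, you can search for tweets within a specific geographic area by specifying coordinates and a radius, like this: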

import csv
import tweepy

# Assumes `api` is an authenticated tweepy.API instance (tweepy 3.x, where the search method is api.search).
def wordsearch(word, max_tweets, lang, geocode, since, out):
    # Fetch up to max_tweets tweets containing `word` inside the "lat,long,radius" area given by geocode.
    # count is the page size (Twitter caps it at 100).
    cursor = tweepy.Cursor(api.search, q=word, lang=lang, geocode=geocode, since=since, count=100)
    searched_tweets = list(cursor.items(max_tweets))
    print("Number of Matches: %d\n" % len(searched_tweets))
    # Append one CSV row per tweet (Python 3: text mode with an explicit encoding).
    with open(out, 'a', newline='', encoding='utf-8') as csvfile:
        csv_writer = csv.writer(csvfile)
        for t in searched_tweets:
            is_original = not t.retweeted and 'RT @' not in t.text
            csv_writer.writerow([t.created_at, t.text, t.author.screen_name, t.place, t.retweeted, t.retweet_count, is_original])

wordsearch('dead', 100, "en", "37.9,91.8,1000mi", "2017-01-01", "result.csv")
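
The geocode argument above is a single string of the form "latitude,longitude,radius", where the radius ends in mi or km. As a minimal sketch, a small helper can build and sanity-check that string before it is passed to the search call. The name make_geocode is hypothetical and not part of tweepy.

```python
def make_geocode(latitude, longitude, radius, unit="km"):
    """Build a 'lat,long,radius' string for the search geocode parameter.

    Hypothetical helper, not part of tweepy.
    """
    if not -90.0 <= latitude <= 90.0:
        raise ValueError("latitude must be between -90 and 90")
    if not -180.0 <= longitude <= 180.0:
        raise ValueError("longitude must be between -180 and 180")
    if radius <= 0:
        raise ValueError("radius must be positive")
    if unit not in ("km", "mi"):
        raise ValueError("unit must be 'km' or 'mi'")
    return "%s,%s,%s%s" % (latitude, longitude, radius, unit)


# Reproduces the string used in the call above.
print(make_geocode(37.9, 91.8, 1000, "mi"))
```

The result can be passed as the geocode argument of wordsearch in place of the hand-written string.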

Answer 1: (score: 0)

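You are approaching this the wrong way. These two functions do not work together like that.

First, look at the Twitter documentation:

  1. GET geo/search: you are looking up the information correctly, but as the documentation states, it is not meant to be used with GET search/tweets:

     "This is the recommended method to use to find places that can be attached to statuses/update."

  2. GET search/tweets: it is only meant to find tweets that contain the specific word (or list of single words) you are looking for. You cannot include geo_ids as part of the query, unless you are looking for tweets that contain that text:

     "Returns a collection of relevant Tweets matching a specified query."

  3. The geo_ids are used here. Scrolling down to the example provided gives an idea, as does statuses/update, which is mentioned in the documentation for (1).

  4. If you want geocoded tweets, you can use the geocode feature of GET search/tweets to restrict the search to a location. That gives you all the tweets from that location; once you have them, you can filter out the ones that are geocoded.

The filtering has to be done on your end, not by Twitter.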
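
The client-side filtering described above can be sketched as follows. This is a minimal illustration, assuming each status exposes the place and coordinates attributes of a v1.1 tweet object, as the Place(...) output in the question suggests. The names is_geotagged and only_geotagged are hypothetical, and the sample objects are stand-ins for real search results rather than actual data.

```python
from types import SimpleNamespace


def is_geotagged(status):
    """Return True if the status carries a place or exact coordinates."""
    return (getattr(status, "place", None) is not None
            or getattr(status, "coordinates", None) is not None)


def only_geotagged(statuses, country_code=None):
    """Yield geotagged statuses, optionally restricted to one country code."""
    for status in statuses:
        if not is_geotagged(status):
            continue
        if country_code is not None:
            place = getattr(status, "place", None)
            if place is None or getattr(place, "country_code", None) != country_code:
                continue
        yield status


# Stand-in objects that mimic the attributes of tweepy statuses.
sample = [
    SimpleNamespace(place=None, coordinates=None),
    SimpleNamespace(place=SimpleNamespace(country_code="IN", name="Hyderabad"), coordinates=None),
    SimpleNamespace(place=SimpleNamespace(country_code="US", name="Austin"), coordinates=None),
]

for status in only_geotagged(sample, country_code="IN"):
    print(status.place.name)
```

With real results, the same generator could wrap the iterable returned by the search call, for example only_geotagged(public_tweets, country_code="IN").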