Question

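I am using the following code to collect tweets related to a particular topic, but the 'place' attribute is None in every tweet I extract. Am I doing something wrong? Also, the code is meant to fetch existing tweets, so I am not looking for a streaming API solution such as this one: https://www.quora.com/How-can-I-get-a-stream-of-tweets-from-a-particular-country-using-Twitter-API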

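import sys
from twython import Twython

# consumer_key, consumer_secret, access_key and access_secret hold the app credentials
api = Twython(consumer_key, consumer_secret, access_key, access_secret)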

tweets = []
MAX_ATTEMPTS = 200
COUNT_OF_TWEETS_TO_BE_FETCHED = 10000
in_max_id = sys.argv[1]
next_max_id = ''
for i in range(0, MAX_ATTEMPTS):

    if COUNT_OF_TWEETS_TO_BE_FETCHED < len(tweets):
        break  # enough tweets collected

    #----------------------------------------------------------------#
    # STEP 1: Query Twitter
    # STEP 2: Save the returned tweets
    # STEP 3: Get the next max_id
    #----------------------------------------------------------------#

    # STEP 1: Query Twitter
    if(0 == i):
        # Query twitter for data. 
        results    = api.search(q="#something",count='100',lang='en',max_id=in_max_id,include_entities='true',geo= True)
    else:
        # After the first call we should have max_id from result of previous call. Pass it in query.
        results    = api.search(q="#something",include_entities='true',max_id=next_max_id,lang='en',geo= True)

    # STEP 2: Save the returned tweets
    for result in results['statuses']:
        tweets.append(result)  # keep the tweet so the count check above can trigger
        temp = result['text'] + " "
        for hashtag in result['entities']['hashtags']:
            temp += hashtag['text'] + " "
        print(result)
        # temp += result["place"]["country"] + "\n"
        # output_file.write(temp)

    # STEP 3: Get the next max_id
    try:
        # Parse the data returned to get max_id to be passed in consequent call.
        next_results_url_params    = results['search_metadata']['next_results']
        next_max_id        = next_results_url_params.split('max_id=')[1].split('&')[0]
    except (KeyError, IndexError):
        # No more next pages
        break

Answer 1

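If every tweet your application processes must have the place field, you can restrict the search to a location, which makes it much more likely that the results carry it.

You can restrict the search to an area by setting the geocode parameter (latitude,longitude,radius[km/mi]).

An example of such a request with Twython: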

geocode = '25.032341,55.385557,100mi'
api.search(q="#something",count='100',lang='en',include_entities='true',geocode=geocode)
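
To go one step further, the results can be filtered so that only tweets with a non-empty place are kept. This is only a rough sketch: `geocode_param` and `search_tweets_with_place` are hypothetical helper names, and `api` is assumed to be an authenticated Twython client like the one in the question.

```python
def geocode_param(lat, lon, radius, unit="km"):
    """Build the 'latitude,longitude,radius' string used by the geocode parameter."""
    if unit not in ("km", "mi"):
        raise ValueError("unit must be 'km' or 'mi'")
    return "{},{},{}{}".format(lat, lon, radius, unit)


def search_tweets_with_place(api, query, geocode, count=100):
    """Run one search call and keep only the tweets whose 'place' is set.

    `api` is assumed to expose search(**params) the way Twython does.
    """
    results = api.search(q=query, count=count, geocode=geocode,
                         include_entities="true")
    return [status for status in results.get("statuses", [])
            if status.get("place")]
```

Usage would look like `search_tweets_with_place(api, "#something", geocode_param(25.032341, 55.385557, 100, "mi"))`. Filtering on the client side is still needed because a geocode search does not promise that every returned tweet has place filled in.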

Answer 2

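The short answer is no, you are not doing anything wrong. All the place tags are empty because, statistically, they are unlikely to contain data. Only about 1% of tweets carry data in the place tag. This is because users rarely share their location; location sharing is off by default.

Download a hundred or more tweets and you will probably find some with place data.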

Answer 3

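Not all tweets contain every field, such as tweet_text, place, country, language, etc.

So, to avoid a KeyError, use the following approach. Modify your code so that a default value is returned when the key you are looking for is not found.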

result.get('place', {}).get('country') if result.get('place') is not None else None

Here, the line above fetches the key place and, if it exists, looks up the key country in it; otherwise it returns None.
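
The same idea can be wrapped in a small function so the loop stays readable. This is just a sketch; `get_country` is a hypothetical helper name, and it assumes each tweet is a plain dict as returned in `results['statuses']`.

```python
def get_country(tweet):
    """Return the tweet's country name, or None when no place is attached."""
    # 'place' may be missing entirely or present with a null value,
    # so fall back to an empty dict in both cases.
    place = tweet.get("place") or {}
    return place.get("country")
```

In the question's loop this would be used as `country = get_country(result)`, with the tweet skipped whenever the result is None.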

Answer 4

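kmario is right. Most tweets do not have this information; only a small percentage do. Doing a location search increases the chance of getting it, for example https://api.twitter.com/1.1/search/tweets.json?q=place%3Acba60fe77bc80469&count=1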

  "place": {
    "id": "cba60fe77bc80469",
    "url": "https://api.twitter.com/1.1/geo/id/cba60fe77bc80469.json",
    "place_type": "city",
    "name": "Tallinn",
    "full_name": "Tallinn, Harjumaa",
    "country_code": "EE",
    "country": "Eesti",
    "contained_within": [],
    "bounding_box": {
      "type": "Polygon",
      "coordinates": [
        [
          [
            24.5501404,
            59.3518286
          ],
          [
            24.9262886,
            59.3518286
          ],
          [
            24.9262886,
            59.4981855
          ],
          [
            24.5501404,
            59.4981855
          ]
        ]
      ]
    },
    "attributes": {}
  },
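
Once a tweet does carry a place object like the one above, the country can be read directly from it. The sketch below also derives a rough centre point from the bounding box; `place_summary` is a hypothetical helper, and it assumes the coordinates are ordered [longitude, latitude], as in the sample.

```python
def place_summary(place):
    """Extract the country fields and a rough centre point from a place dict."""
    ring = place["bounding_box"]["coordinates"][0]
    lons = [point[0] for point in ring]  # assumed order: [longitude, latitude]
    lats = [point[1] for point in ring]
    return {
        "name": place.get("name"),
        "country": place.get("country"),
        "country_code": place.get("country_code"),
        # midpoint of the bounding box, not a true geographic centroid
        "center_lon_lat": ((min(lons) + max(lons)) / 2,
                           (min(lats) + max(lats)) / 2),
    }


# The place object from the response above, reduced to the fields used here.
sample_place = {
    "name": "Tallinn",
    "country_code": "EE",
    "country": "Eesti",
    "bounding_box": {
        "type": "Polygon",
        "coordinates": [[
            [24.5501404, 59.3518286],
            [24.9262886, 59.3518286],
            [24.9262886, 59.4981855],
            [24.5501404, 59.4981855],
        ]],
    },
}

summary = place_summary(sample_place)
```

Note that the country name comes back localized ("Eesti" here), so `country_code` is probably the safer field for grouping tweets by country.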

Unable to get the country of a tweet - Twython API
