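I am using the following code to collect tweets related to a particular topic, but in every tweet I extract, the 'place' attribute is None. Am I doing something wrong? Note that the code is meant to fetch existing tweets: I do not need a streaming API solution, and I am not looking for a streaming approach like this one: https://www.quora.com/How-can-I-get-a-stream-of-tweets-from-a-particular-country-using-Twitter-API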
import sys
from twython import Twython

# consumer_key, consumer_secret, access_key and access_secret are defined elsewhere
api = Twython(consumer_key, consumer_secret, access_key, access_secret)
tweets = []
MAX_ATTEMPTS = 200
COUNT_OF_TWEETS_TO_BE_FETCHED = 10000
in_max_id = sys.argv[1]
next_max_id = ''
for i in range(0, MAX_ATTEMPTS):
    if COUNT_OF_TWEETS_TO_BE_FETCHED < len(tweets):
        break  # we have collected enough tweets
    #----------------------------------------------------------------#
    # STEP 1: Query Twitter
    # STEP 2: Save the returned tweets
    # STEP 3: Get the next max_id
    #----------------------------------------------------------------#
    # STEP 1: Query Twitter
    if 0 == i:
        # Query twitter for data.
        results = api.search(q="#something", count='100', lang='en', max_id=in_max_id, include_entities='true', geo=True)
    else:
        # After the first call we should have max_id from the result of the previous call. Pass it in the query.
        results = api.search(q="#something", count='100', include_entities='true', max_id=next_max_id, lang='en', geo=True)
    # STEP 2: Save the returned tweets
    for result in results['statuses']:
        tweets.append(result)
        temp = ""
        tweet_text = result['text']
        temp += tweet_text + " "
        hashtags = result['entities']['hashtags']
        for hashtag in hashtags:
            temp += hashtag['text'] + " "
        print(result)
        #temp += result["place"]["country"] + "\n"
        #output_file.write(temp)
    # STEP 3: Get the next max_id
    try:
        # Parse the returned data to get the max_id to pass in the next call.
        next_results_url_params = results['search_metadata']['next_results']
        next_max_id = next_results_url_params.split('max_id=')[1].split('&')[0]
    except (KeyError, IndexError):
        # No more next pages
        break
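The max_id extraction in STEP 3 relies on string splitting. As a rough sketch only, the same value could be pulled out with the standard library's query-string parser instead. Here extract_max_id is a hypothetical helper name, and the sample next_results string is made up for illustration rather than real API output.

```python
from urllib.parse import parse_qs


def extract_max_id(next_results):
    """Return the max_id value from a next_results query string, or None."""
    # next_results looks like "?max_id=...&q=...", so drop the leading "?"
    params = parse_qs(next_results.lstrip('?'))
    values = params.get('max_id')
    return values[0] if values else None


# Illustrative value only, not real API output
sample = '?max_id=1234567890&q=%23something&lang=en&include_entities=1'
print(extract_max_id(sample))
```

Returning None when max_id is absent lets the caller stop paging without catching an exception.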
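Answer 0 (score: 1)
If every tweet your application processes must have the place field, you can restrict the search to a particular area so that the results should all include it.
You can limit the search to an area by setting the geocode parameter (latitude,longitude,radius[km/mi]).
An example of such a request through Twython: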
geocode = '25.032341,55.385557,100mi'
api.search(q="#something",count='100',lang='en',include_entities='true',geocode=geocode)
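The geocode value is just a comma-separated string, so a small helper can reduce formatting mistakes. This is a minimal sketch: build_geocode is a hypothetical name, not part of Twython, and it only assembles the string in the latitude,longitude,radius[km/mi] form described above.

```python
def build_geocode(latitude, longitude, radius, unit='mi'):
    """Format a geocode string as latitude,longitude,radius followed by the unit."""
    if unit not in ('mi', 'km'):
        raise ValueError("unit must be 'mi' or 'km'")
    return '{},{},{}{}'.format(latitude, longitude, radius, unit)


geocode = build_geocode(25.032341, 55.385557, 100)
print(geocode)  # 25.032341,55.385557,100mi
```

The resulting string can then be passed as the geocode argument of the search call shown above.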
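Answer 1 (score: 1)
The short answer is no, you are not doing anything wrong. All the place fields are empty because, statistically, they are unlikely to contain data: only about 1% of tweets have data in the place field. Users rarely share their location, and location is turned off by default.
Download 100 or more tweets and you will probably find some with place data.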
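To check how many of the fetched tweets actually carry place data, something along these lines could be run on the collected statuses. It is a sketch under assumptions: place_ratio is a hypothetical helper, and the sample dictionaries are invented stand-ins for the status objects returned by the search.

```python
def place_ratio(statuses):
    """Return the fraction of statuses whose 'place' field is present and not None."""
    if not statuses:
        return 0.0
    with_place = sum(1 for status in statuses if status.get('place'))
    return with_place / len(statuses)


# Invented sample data: only one of the four statuses has a place
sample_statuses = [
    {'text': 'a', 'place': None},
    {'text': 'b', 'place': {'country': 'Eesti'}},
    {'text': 'c'},
    {'text': 'd', 'place': None},
]
print(place_ratio(sample_statuses))  # 0.25
```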
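Answer 2 (score: 0)
Not all tweets contain every field, such as tweet_text, place, country, language, and so on.
To avoid a KeyError, use the following approach: modify your code so that a default value is returned when the key you are looking for is not found.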
result.get('place', {}).get('country', {}) if result.get('place') != None else None
Here, the line above fetches the key place and, if it is present, looks up the key country inside it; otherwise it returns None.
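The same lookup can be written as a small function, which may be easier to read and reuse than the one-liner. This is only a sketch: get_country is a hypothetical name, and the sample dictionaries stand in for the status objects returned by the search.

```python
def get_country(result, default=None):
    """Return result['place']['country'], or default if either key is missing or None."""
    # 'place' may be absent or explicitly None, so fall back to an empty dict
    place = result.get('place') or {}
    return place.get('country', default)


print(get_country({'text': 'hi', 'place': None}))                 # None
print(get_country({'text': 'hi', 'place': {'country': 'Eesti'}}))  # Eesti
```

Using `or {}` covers both the missing key and the explicit None value in one step.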
Answer 3 (score: 0)
kmario is right. Most tweets do not have this information; only a small percentage do. Searching by place increases the chance, for example https://api.twitter.com/1.1/search/tweets.json?q=place%3Acba60fe77bc80469&count=1
"place": {
    "id": "cba60fe77bc80469",
    "url": "https://api.twitter.com/1.1/geo/id/cba60fe77bc80469.json",
    "place_type": "city",
    "name": "Tallinn",
    "full_name": "Tallinn, Harjumaa",
    "country_code": "EE",
    "country": "Eesti",
    "contained_within": [],
    "bounding_box": {
        "type": "Polygon",
        "coordinates": [
            [
                [24.5501404, 59.3518286],
                [24.9262886, 59.3518286],
                [24.9262886, 59.4981855],
                [24.5501404, 59.4981855]
            ]
        ]
    },
    "attributes": {}
},
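For reference, the query string in the example URL is just the place: operator followed by the place id, URL-encoded. A minimal sketch of building it with the standard library follows; place_search_url is a hypothetical helper, and the actual request would still need to be authenticated, which is not shown here.

```python
from urllib.parse import urlencode

SEARCH_URL = 'https://api.twitter.com/1.1/search/tweets.json'


def place_search_url(place_id, count=1):
    """Build a search URL that restricts results to the given place id."""
    # urlencode percent-encodes the colon in "place:<id>" as %3A
    query = urlencode({'q': 'place:' + place_id, 'count': count})
    return SEARCH_URL + '?' + query


print(place_search_url('cba60fe77bc80469'))
```

With Twython, the same restriction would presumably be expressed by passing the place: operator in the q argument of the search call.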