Fetching data from Twitter and parsing it

Time: 2016-07-01 06:42:56

Tags: python twitter tweepy

What I want to do:

Fetch a number of tweets, find the URLs inside them, fetch each URL I find, and save each URL's data to a separate txt file.

So far I have managed to write the parts below and then got stuck. Can anyone help me?

Fetching the tweets:

import tweepy
from tweepy import OAuthHandler
import sys
import re
import json  # needed for json.dumps below

def process_or_store(tweet):
    print(json.dumps(tweet))

consumer_key = '***************************'
consumer_secret = '******************************'
access_token = '*******************************'
access_secret = '****************************'
auth = OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_secret)
api = tweepy.API(auth)

f = open("C:\\Twitter\\Twitts.txt", 'w')  # escape both backslashes
f.write('twits\n')  # "print >>f" is Python 2 syntax; use write() with print()-style code


for status in tweepy.Cursor(api.home_timeline).items(20):
    # Process a single status
    print(status.text)
    f.write(status.text + '\n')
f.close()  # close the handle so the buffered text is flushed to disk


def extract_urls(fname):
    with open(fname) as f:
        # raw string, so the backslashes reach the regex engine intact
        return re.findall(r'http[s]?://(?:[a-zA-Z]|[0-9]|[$-_@.&+]|[!*\(\),]|(?:%[0-9a-fA-F][0-9a-fA-F]))+', f.read())
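The regex can be checked on its own, independent of the tweet file. A minimal sketch (the sample text and helper name are mine, for illustration only):

```python
import re

# Same pattern as in extract_urls above, as a raw string.
URL_PATTERN = r'http[s]?://(?:[a-zA-Z]|[0-9]|[$-_@.&+]|[!*\(\),]|(?:%[0-9a-fA-F][0-9a-fA-F]))+'

def extract_urls_from_text(text):
    # Return every http/https URL the pattern matches in the given text.
    return re.findall(URL_PATTERN, text)

sample = "check this https://example.com/page and http://test.org too"
print(extract_urls_from_text(sample))
# → ['https://example.com/page', 'http://test.org']
```

The match stops at the first character outside the character classes (such as a space), so each URL is captured whole.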

Data from the URLs:

from urllib.request import urlopen  # urllib2 is Python 2 only; this is its Python 3 home

url = 'https://***************'
response = urlopen(url)
# 'wb' because read() returns bytes; the original path string was mis-quoted,
# and the raw URL cannot be used directly as a filename anyway
with open('C:\\Twitter\\Data_from_urls\\url.txt', 'wb') as f:
    f.write(response.read())
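To save each URL's data to a different txt file, the URL has to be turned into a filesystem-safe name first, since characters like `/` and `:` are not allowed in filenames. A sketch, assuming the helper names and the replacement scheme (every unsafe character becomes `_`) are my own choices:

```python
import os
import re
from urllib.request import urlopen

def safe_filename(url, index):
    # Hypothetical helper: build a filesystem-safe .txt name from a URL,
    # prefixing an index so two similar URLs never collide.
    return '%d_%s.txt' % (index, re.sub(r'[^A-Za-z0-9._-]', '_', url)[:100])

def save_url_data(urls, out_dir):
    # Fetch every URL and write its body to its own file in out_dir.
    os.makedirs(out_dir, exist_ok=True)
    for i, url in enumerate(urls):
        path = os.path.join(out_dir, safe_filename(url, i))
        with urlopen(url) as response:
            data = response.read()        # bytes
        with open(path, 'wb') as f:       # so open in binary mode
            f.write(data)
```

This could then be driven by the earlier pieces, e.g. `save_url_data(extract_urls("C:\\Twitter\\Twitts.txt"), "C:\\Twitter\\Data_from_urls")`.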

0 Answers:

No answers yet