Printing a single word from a string once it is identified

Time: 2013-12-11 22:54:48

Tags: python python-2.7 csv

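I'm new to Python and to programming, and I've been using this project to teach myself.

The code below runs without raising any errors, but it creates an empty .csv.

I thought I could use words = text.split(), but I can't do that with a generator.

Here is a sample of the data I get: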

Wed Dec 11 22:51:56 +0000 2013,@KBIJR please contact me via email: 37four@gmail.com...thanks!,1260080780

I only want to write the email address found in the 'text' string to my .csv.

import csv
import json
import oauth2 as oauth
import urllib
import sys
import requests
import time

CONSUMER_KEY = ""
CONSUMER_SECRET = ""
ACCESS_KEY = ""
ACCESS_SECRET = ""

class TwitterSearch:
    def __init__(self,
        ckey    = CONSUMER_KEY,
        csecret = CONSUMER_SECRET,
        akey    = ACCESS_KEY,
        asecret = ACCESS_SECRET,
        query   = 'https://api.twitter.com/1.1/search/tweets.{mode}?{query}'
    ):
        consumer     = oauth.Consumer(key=ckey, secret=csecret)
        access_token = oauth.Token(key=akey, secret=asecret)
        self.client  = oauth.Client(consumer, access_token)
        self.query   = query

    def search(self, q, mode='json', **queryargs):
        queryargs['q'] = q
        query = urllib.urlencode(queryargs)
        return self.client.request(self.query.format(query=query, mode=mode))

def write_csv(fname, rows, header=None, append=False, **kwargs):
    filemode = 'ab' if append else 'wb'
    with open(fname, filemode) as outf:
        out_csv = csv.writer(outf, **kwargs)
        if header:
            out_csv.writerow(header)
        out_csv.writerows(rows)

def main():
    ts = TwitterSearch()
    response, data = ts.search('@gmail.com', result_type='recent')
    js = json.loads(data)

    messages = ([msg['created_at'], msg['text'], msg['user']['id']] for msg in js.get('statuses', []))

    search_terms = ['@gmail.com']
    text = messages
    matches = []
    for term in search_terms:
        match = [word for word in text if term in word]
        matches.append(match)   
        write_csv('twitter_gmail.csv', messages, append=True)

if __name__ == '__main__':
    main()

1 Answer:

Answer 0: (score: 0)

If you only want to extract the email addresses from a text string:

import re

s = "Wed Dec 11 22:51:56 +0000 2013,@KBIJR please contact me via email: 37four@gmail.com...thanks!,1260080780"

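# Raw strings keep the backslashes in the pattern literal.
# The pattern only covers lowercase addresses and also accepts
# the obfuscated forms "name at domain dot com".
match_emails = re.compile(
    r"([a-z0-9!#$%&'*+\/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+\/=?^_`"
    r"{|}~-]+)*(@|\sat\s)(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?(\.|"
    r"\sdot\s))+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?)"
)

# The pattern has three capturing groups, so findall() returns tuples;
# element 0 is the outer group, i.e. the whole address.
emails = match_emails.findall(s)
for email in emails:
    print(email[0])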

Output:

37four@gmail.com
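
As for the empty .csv: in the question's main(), messages is a generator, and the list comprehension over text (the same object) consumes it before write_csv is called, so there is nothing left to write. In addition, each item is a list of three fields, so term in word checks whether a whole field equals '@gmail.com' rather than searching inside the tweet text. Below is a minimal sketch of one way to combine the extraction with the CSV output. It assumes Python 3, uses a simplified pattern instead of the regex above, and writes to an in-memory buffer. The names EMAIL_RE, email_rows and sample are hypothetical and not part of the original code.

```python
import csv
import io
import re

# Simplified pattern (an assumption, not the full regex from the answer):
# enough for plain addresses such as name@gmail.com.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")


def email_rows(messages):
    """Yield [created_at, email, user_id] for each address found in a text."""
    for created_at, text, user_id in messages:
        for email in EMAIL_RE.findall(text):
            yield [created_at, email, user_id]


# Hypothetical sample shaped like the rows built in main().
sample = [
    ["Wed Dec 11 22:51:56 +0000 2013",
     "@KBIJR please contact me via email: 37four@gmail.com...thanks!",
     1260080780],
    ["Wed Dec 11 22:52:10 +0000 2013", "no address here", 42],
]

# Materialize once: a list can be iterated again, while a generator
# is empty after a single pass.
messages = list(sample)
rows = list(email_rows(messages))

# Write to an in-memory buffer; a real file would be opened with
# open(fname, "a", newline="") in Python 3.
out = io.StringIO()
csv.writer(out).writerows(rows)
print(out.getvalue())
```

With the sample above, only the first tweet yields a row, holding the date, the extracted address and the user id. Under Python 2.7 the file handling differs (binary file modes, as in the question's write_csv), so this sketch would need adapting there.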