Python - Validate, write and append to a .txt file

Date: 2017-05-30 05:03:07

Tags: python twitter

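I want to validate the links my crawler gets from the web against the links I have stored in a .txt file. After my crawler retrieves a link from the web, it appends it to my .txt file (mode 'a'). However, if the link already exists in my .txt file, I want to write it with mode 'w' instead. Any ideas on how I could do that?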
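To make the two modes concrete: in Python the mode letter passed to `open()` applies to the whole file, not to a single record. The sketch below (the helper `demo_modes` and the temporary file name are hypothetical stand-ins for the crawl file) shows that `'a'` keeps earlier records while `'w'` truncates everything before writing, so switching to `'w'` for a duplicate link would erase all previously saved tweets, not just the duplicate.

```python
import os
import tempfile


def demo_modes(path):
    """Hypothetical helper: contrast 'a' and 'w' on the same file."""
    with open(path, "w") as f:   # start from an empty file
        f.write("record 1\n")
    with open(path, "a") as f:   # 'a': keeps record 1 and adds record 2 at the end
        f.write("record 2\n")
    with open(path, "r") as f:
        after_append = f.read()
    with open(path, "w") as f:   # 'w': truncates the file, discarding records 1 and 2
        f.write("record 3\n")
    with open(path, "r") as f:
        after_write = f.read()
    return after_append, after_write


# Hypothetical file name in a throwaway directory.
path = os.path.join(tempfile.mkdtemp(), "crawled.txt")
after_append, after_write = demo_modes(path)
print(repr(after_append))  # 'record 1\nrecord 2\n'
print(repr(after_write))   # 'record 3\n'
```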

    def spider(targetname, DOMAIN, g_data):
        for item in g_data:
            try:
                name = item.find_all("strong", {"class": "fullname show-popup-with-id "})[0].text
                username = item.find_all("span", {"class": "username u-dir"})[0].text
                post = item.find_all("p", {"class": "TweetTextSize TweetTextSize--normal js-tweet-text tweet-text"})[0].text
                replies = item.find_all("span", {"class": "u-hiddenVisually"})[3].text
                retweets = item.find_all("span", {"class": "u-hiddenVisually"})[4].text
                likes = item.find_all("span", {"class": "u-hiddenVisually"})[5].text
                retweetby = item.find_all("a", {"href": "/"+targetname})[0].text
                datas = item.find_all('a', {'class':'tweet-timestamp js-permalink js-nav js-tooltip'})
                for data in datas:
                    link = DOMAIN + data['href']
                    date = data['title']
                append_to_file(crawledfile, name, username, post, link, replies, retweets, likes, retweetby, date)
            except:
                pass


    def append_to_file(path, name, username, post, link, replies, retweets, likes, retweetby, date):
        with open(path, 'a') as file:
            try:
                file.write("Name: "+ name + '\n')
            except:
                print("Name: --Currently unavailable--" + '\n')
            try:
                file.write("Username: "+ username + '\n')
            except:
                print("Username: --Currently unavailable--" + '\n')
            try:
                file.write("Post: "+ post + '\n')
            except:
                print("Post: --Currently unavailable--" + '\n')
            try:
                file.write("post's link: "+ link.strip() + '\n')
            except:
                print("post's link: --Currently unavailable--" + '\n')
            try:
                file.write("Replies: "+ replies.strip() + '\n')
            except:
                print("Replies: --Currently unavailable--" + '\n')
            try:
                file.write("Retweet: "+ retweets.strip() + '\n')
            except:
                print("Retweet: --Currently unavailable--" + '\n')
            try:
                file.write("Likes: "+ likes.strip() + '\n')
            except:
                print("Likes: --Currently unavailable--" + '\n')
            try:
                if(username != "@" + targetname):
                    file.write("Retweeted By: " + retweetby.strip() + '\n')
            except:
                file.write("Retweeted By: --Currently unavailable--" + '\n')
            try:
                file.write("Date: " + date + '\n')
            except:
                file.write("Date: --Currently unavailable--" + '\n')
            file.write("" + '\n')




    Name: Donald J. Trump
    Username: @realDonaldTrump
    Post: I look forward to paying my respects to our brave men and women on this Memorial Day at Arlington National Cemetery later this morning.
    post's link: https://twitter.com/realDonaldTrump/status/869170615881793536
    Replies: 14,333 replies
    Retweet: 13,492 retweets
    Likes: 74,645 likes
    Date: 5:36 AM - 29 May 2017

    Name: Donald J. Trump
    Username: @realDonaldTrump
    Post: Today we remember the men and women who made the ultimate sacrifice in serving. Thank you, God bless your families & God bless the USA!
    post's link: https://twitter.com/realDonaldTrump/status/869170351049240576
    Replies: 8,827 replies
    Retweet: 33,541 retweets
    Likes: 123,112 likes
    Date: 5:35 AM - 29 May 2017
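Because `append_to_file` writes each field on its own line, the links already stored can be recovered by collecting the lines that start with `post's link: `. A minimal sketch, assuming that file layout; the function name `load_saved_links` and the temporary file are hypothetical, and a missing file is treated as "no links saved yet":

```python
import os
import tempfile

# Label written by append_to_file in the question.
LINK_PREFIX = "post's link: "


def load_saved_links(path):
    """Hypothetical helper: return the set of post links already in the file."""
    links = set()
    try:
        # Default encoding, matching how append_to_file opens the file.
        with open(path, "r") as f:
            for line in f:
                if line.startswith(LINK_PREFIX):
                    links.add(line[len(LINK_PREFIX):].strip())
    except FileNotFoundError:
        pass  # nothing crawled yet
    return links


# Usage with a one-record file in a throwaway directory (hypothetical path).
path = os.path.join(tempfile.mkdtemp(), "crawled.txt")
with open(path, "w") as f:
    f.write("Name: Donald J. Trump\n")
    f.write(LINK_PREFIX + "https://twitter.com/realDonaldTrump/status/869170615881793536\n")
    f.write("\n")

saved = load_saved_links(path)
print("https://twitter.com/realDonaldTrump/status/869170615881793536" in saved)  # True
```

Loading the links into a `set` once per run makes each membership check cheap, instead of re-reading the whole file for every scraped tweet.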

1 Answer:

Answer 0 (score: 0)

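If I interpret your statement correctly, you only need to append the character 'a' or 'w' to the text you write to the file, depending on whether the link already exists in the file. To do that, you can use the following code: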

    def append_to_file(path, name, username, post, link, replies, retweets, likes, retweetby, date):
        # Open with 'a+' rather than 'a': a file opened with plain 'a' is
        # write-only, so file.read() would raise io.UnsupportedOperation.
        with open(path, 'a+') as file:
            file.seek(0)  # in 'a+' mode the position starts at end-of-file
            # 'w' if the link is already stored, otherwise 'a' (as in the question)
            if link.strip() in file.read():
                to_append = 'w'
            else:
                to_append = 'a'
            # In append mode every write still goes to the end of the file.
            try:
                file.write("Name: " + name + to_append + '\n')
            except:
                print("Name: -- Currently unavailable--" + '\n')
            try:
                file.write("Username: " + username + to_append + '\n')
            except:
                print("Username: -- Currently unavailable--" + '\n')
            try:
                file.write("Post: " + post + to_append + '\n')
            except:
                print("Post: -- Currently unavailable--" + '\n')
            try:
                file.write("post's link: " + link.strip() + to_append + '\n')
            except:
                print("post's link: -- Currently unavailable--" + '\n')
            try:
                file.write("Replies: " + replies.strip() + to_append + '\n')
            except:
                print("Replies: -- Currently unavailable--" + '\n')
            try:
                file.write("Retweet: " + retweets.strip() + to_append + '\n')
            except:
                print("Retweet: -- Currently unavailable--" + '\n')
            try:
                file.write("Likes: " + likes.strip() + to_append + '\n')
            except:
                print("Likes: -- Currently unavailable--" + '\n')
            try:
                # targetname is not a parameter of this function; this assumes
                # it is defined at module level in the crawler script.
                if(username != "@" + targetname):
                    file.write("Retweeted By: " +
                               retweetby.strip() + to_append + '\n')
            except:
                file.write(
                    "Retweeted By: -- Currently unavailable--" + '\n')
            try:
                file.write("Date: " + date + to_append + '\n')
            except:
                file.write("Date: -- Currently unavailable--" +
                           to_append + '\n')
            file.write("" + '\n')
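If the goal is instead to avoid storing the same tweet twice, a minimal sketch is to check for the link before writing anything. The helper `append_if_new` and its `record_lines` parameter are hypothetical. The file is opened in `'a+'` mode so it can be read and appended to in one `with` block, and `seek(0)` is needed because the position starts at the end of the file in that mode:

```python
import os
import tempfile

# Label used by append_to_file in the question.
LINK_PREFIX = "post's link: "


def append_if_new(path, link, record_lines):
    """Hypothetical helper: append record_lines unless link is already stored.

    Returns True if the record was written, False if it was a duplicate.
    """
    link = link.strip()
    # 'a+' allows reading as well as appending, and creates the file if needed.
    with open(path, "a+") as f:
        f.seek(0)  # move to the start so the existing contents can be read
        saved = {
            line[len(LINK_PREFIX):].strip()
            for line in f.read().splitlines()
            if line.startswith(LINK_PREFIX)
        }
        if link in saved:
            return False
        # In append mode every write goes to the end of the file,
        # regardless of the earlier seek(0).
        for line in record_lines:
            f.write(line + "\n")
        f.write("\n")  # blank line between records, as in the question
        return True


# Usage (hypothetical path and record).
path = os.path.join(tempfile.mkdtemp(), "crawled.txt")
link = "https://twitter.com/realDonaldTrump/status/869170615881793536"
record = ["Name: Donald J. Trump", LINK_PREFIX + link]
first = append_if_new(path, link, record)   # True: link not seen yet
second = append_if_new(path, link, record)  # False: link already in the file
print(first, second)  # True False
```

The record passed in must contain the `post's link: ` line itself; otherwise later calls have nothing to match against.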