'NoneType' object has no attribute 'get' error

Time: 2016-04-22 08:20:32

Tags: python beautifulsoup

I'm trying to use Beautiful Soup to extract user avatars and profile links from Twitter, but I get the error 'NoneType' object has no attribute 'get'.

def extract_tweets(html):
    soup = BeautifulSoup(html)
    write_to_file('soup.txt', soup)
    tweets = soup.find_all('li', attrs={'data-item-type':'tweet'})
    for tweet in tweets:
        tweet_text = tweet.find('p', class_='tweet-text')
        author_link = tweet.find('a', class_='js-user-profile-link').get('href')
        author_avatar = tweet.find('img', class_='avatar').get('src')

    return tweets

Here is the HTML for reference:

  <li id="stream-item-tweet-723026869960871937" class="js-stream-item stream-item stream-item expanding-stream-item " data-item-type="tweet" data-item-id="723026869960871937">
    <div class="tweet js-stream-tweet js-actionable-tweet js-profile-popup-actionable original-tweet js-original-tweet has-cards has-content " data-component-context="tweet" data-has-cards="true" data-disclosure-type="" data-mentions="BarunSobtiSays" data-you-block="false" data-follows-you="false" data-you-follow="false" data-user-id="2895647851" data-name="chaitali mallick" data-screen-name="chaitalimallic1" data-permalink-path="/chaitalimallic1/status/723026869960871937" data-item-id="723026869960871937" data-tweet-id="723026869960871937">
    <div class="context"> </div>
    <div class="content">
    <div class="stream-item-header">
    <a class="account-group js-account-group js-action-profile js-user-profile-link js-nav" data-user-id="2895647851" href="/chaitalimallic1">
    <img class="avatar js-action-profile-avatar" alt="" src="https://pbs.twimg.com/profile_images/636712795808002048/CEs9XLwq_bigger.jpg">
    <strong class="fullname js-action-profile-name show-popup-with-id" data-aria-label-part="">chaitali mallick</strong>
    <span>‏</span>
    <span class="username js-action-profile-name" data-aria-label-part="">
    <s>@</s>
    <b>chaitalimallic1</b>
    </span>
    </a>
    <small class="time">
    </div>
    <div class="js-tweet-text-container">
    <p class="TweetTextSize js-tweet-text tweet-text" lang="hi" data-aria-label-part="0">
    Dil deke dard e mohabbat kiya hai.. Maine pyar kiya..pyar kiya.. pyar kiya hai..
    <a class="twitter-atreply pretty-link js-nav" data-mentioned-user-id="2895447336" dir="ltr" href="/BarunSobtiSays">
    <s>@</s>
    <b>BarunSobtiSays</b>
    </a>
    miss you :(
    <a class="twitter-timeline-link u-hidden" dir="ltr" data-pre-embedded="true" href="">pic.twitter.com/5GE0uPYalh</a>
    </p>
    </div>

1 answer:

Answer 0 (score: -1)

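Your function runs fine against the HTML you posted. However, it returns the raw tweet tags rather than the values you extracted, so I'm not sure you are getting the result you want. Maybe something like this?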

from bs4 import BeautifulSoup

def extract_tweets(html):
    # Name the parser explicitly so the result does not depend on what is installed.
    soup = BeautifulSoup(html, 'html.parser')
    write_to_file('soup.txt', soup)  # the asker's own helper, not shown here
    ret = []
    tweets = soup.find_all('li', attrs={'data-item-type': 'tweet'})
    for tweet in tweets:
        # Each find() returns None if the tag is missing from this tweet.
        tweet_text = tweet.find('p', class_='tweet-text').text.strip()
        author_link = tweet.find('a', class_='js-user-profile-link').get('href')
        author_avatar = tweet.find('img', class_='avatar').get('src')
        ret.append({"text": tweet_text,
                    "author_link": author_link,
                    "author_avatar": author_avatar})

    return ret

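That said, you should definitely check the HTML you actually receive to confirm that the tags you are looking for exist in every tweet ;) find() returns None when nothing matches, and calling .get() on None raises exactly this error.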
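
If some list items turn out to lack the avatar or the profile link, one option is to guard each lookup. The following is only a minimal sketch: the helper name attr_or_none, the function name extract_tweets_safe, and the sample markup are made up for illustration and are not from the question. It also assumes the html.parser backend and leaves out the write_to_file call.

```python
from bs4 import BeautifulSoup


def attr_or_none(parent, name, css_class, attr):
    """Return the attribute of the first matching child tag, or None."""
    tag = parent.find(name, class_=css_class)
    # find() returns None when nothing matches; calling .get() on that None
    # is what raises "'NoneType' object has no attribute 'get'".
    return tag.get(attr) if tag is not None else None


def extract_tweets_safe(html):
    soup = BeautifulSoup(html, "html.parser")
    results = []
    for tweet in soup.find_all("li", attrs={"data-item-type": "tweet"}):
        text_tag = tweet.find("p", class_="tweet-text")
        results.append({
            "text": text_tag.get_text(strip=True) if text_tag is not None else None,
            "author_link": attr_or_none(tweet, "a", "js-user-profile-link", "href"),
            "author_avatar": attr_or_none(tweet, "img", "avatar", "src"),
        })
    return results


# Hypothetical sample: the second item has no profile link and no avatar.
sample = """
<ul>
  <li data-item-type="tweet">
    <a class="js-user-profile-link" href="/someuser"><img class="avatar" src="a.jpg"></a>
    <p class="tweet-text">hello</p>
  </li>
  <li data-item-type="tweet"><p class="tweet-text">promoted</p></li>
</ul>
"""

tweets = extract_tweets_safe(sample)
for entry in tweets:
    print(entry)
```

With this approach a tweet that is missing a tag yields None for that field instead of stopping the whole loop, which also makes it easy to see which items in the real page differ from the snippet posted in the question.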