Need help web scraping a comments section

Date: 2020-02-11 15:46:56

Tags: web-scraping

I'm currently working on a project about football players. I'm trying to scrape some public comments about the players so I can run sentiment analysis, but I can't seem to scrape the comments section. Strangely, I had it working at one point, but then it stopped and I never got any comments back again. Any help would be much appreciated. The site I'm trying to scrape is: https://sofifa.com/player/192985/kevin-de-bruyne/200025/

from time import sleep

import pandas as pd
from bs4 import BeautifulSoup
from selenium import webdriver
from tqdm import tqdm_notebook

likes = []
dislikes = []
follows = []
comments = []

driver_path = '/Users/niallmcnulty/Desktop/GeneralAssembly/Lessons/DSI11-lessons/week05/day2_web_scraping_and_apis/web_scraping/selenium-examples/chromedriver'
driver = webdriver.Chrome(executable_path=driver_path)


# i = 0

for url in tqdm_notebook(urls):

    driver.get(url)
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    sleep(0.2)

    soup1 = BeautifulSoup(driver.page_source,'lxml')

    try:
        dislike = soup1.find('button', attrs = {'class':'bp3-button bp3-minimal bp3-intent-danger dislike-btn need-sign-in'}).find('span',{'class':'count'}).text.strip()         
        dislikes.append(dislike)
    except:
        pass

    try:
        like = soup1.find('button', attrs = {'class':'bp3-button bp3-minimal bp3-intent-success like-btn need-sign-in'}).find('span',{'class':'count'}).text.strip()
        likes.append(like)
    except:
        pass

    try:
        follow = soup1.find('button', attrs = {'class':'bp3-button bp3-minimal follow-btn need-sign-in'}).find('span',{'class':'count'}).text.strip()
        follows.append(follow)
    except:
        pass

    try:
        # BUG: find_all() returns a ResultSet (a list of Tags), which has no
        # .text attribute, so this raises AttributeError on every page and
        # the bare except silently swallows it.
        comment = soup1.find_all('p').text[0:10]
        comments.append(comment)
    except:
        pass

#     i += 1

#     if i % 5 == 0:
#         sentiment = pd.DataFrame({"dislikes":dislikes,"likes":likes,"follows":follows,"comments":comments})
#         sentiment.to_csv('/Users/niallmcnulty/Desktop/GeneralAssembly/Lessons/DSI11-lessons/projects/cap-csv/sentiment.csv')

# Note: pd.DataFrame raises ValueError if the four lists end up with different
# lengths, which can happen because each field is scraped in its own try/except.
sentiment_final = pd.DataFrame({"dislikes":dislikes,"likes":likes,"follows":follows,"comments":comments})
# df_sent = pd.merge(df, sentiment, left_index=True, right_index=True)
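As an aside, the likely reason the comment scrape "never works" is that `find_all()` returns a ResultSet, which has no `.text` attribute; iterating the ResultSet and reading `.text` per element fixes it. A minimal sketch of the pattern (the HTML snippet is a made-up stand-in, not the real page's markup):

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for driver.page_source; the real page differs.
html = """
<div class="comments">
  <p>Great passing range.</p>
  <p>Best midfielder in the league.</p>
</div>
"""

soup = BeautifulSoup(html, 'html.parser')

# find_all() returns a ResultSet (list of Tags); read .text per element.
comments = [p.text.strip() for p in soup.find_all('p')]
print(comments)  # ['Great passing range.', 'Best midfielder in the league.']
```

The same per-element pattern works inside the loop above in place of `soup1.find_all('p').text[0:10]`.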

1 Answer:

Answer 0 (score: 0)

The comments section is loaded dynamically, so the page source you parse may not contain it yet. You can try capturing it with the driver instead:

try:
    comment_elements = driver.find_elements_by_tag_name('p')
    for comment in comment_elements:
        comments.append(comment.text)
except:
    pass
print(comments)
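Since `find_elements_by_tag_name('p')` matches every paragraph on the page, not just comments, a post-filter can cut out navigation and boilerplate text. A minimal sketch in plain Python (the length threshold is an assumption; inspecting the real comment markup for a more precise CSS selector would be more robust):

```python
def filter_comments(paragraphs, min_length=15):
    """Keep paragraph texts that plausibly look like user comments.

    Drops empty strings and very short fragments, which on most pages
    are labels or navigation text rather than comments.
    """
    return [text.strip() for text in paragraphs
            if text and len(text.strip()) >= min_length]

# Example with made-up scraped texts:
scraped = ['', 'Login', 'He is a world-class playmaker.', 'Share',
           'Underrated defensively for an attacking midfielder.']
print(filter_comments(scraped))
# ['He is a world-class playmaker.', 'Underrated defensively for an attacking midfielder.']
```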