How do I write a new column to a CSV file while web scraping?

Asked: 2016-10-25 10:00:55

Tags: python csv web-scraping

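I'd like some quick help with a web scraper I'm writing. So far it scrapes the content correctly, but I can't get it into the CSV file the way I want.

I'm scraping two things from each reviewer: the rating and the written review.

I want to write the rating into the first column and the written review into the second column. However, the writer only seems to write row by row.

Thanks for any help! :)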

import os, requests, csv
from bs4 import BeautifulSoup

# Get URL of the page
URL = ('https://www.tripadvisor.com/Attraction_Review-g294265-d2149128-Reviews-Gardens_by_the_Bay-Singapore.html')

with open('GardensbytheBay.csv', 'w', newline='') as csvfile:
    writer = csv.writer(csvfile)

    # Loop over the first 3 pages of reviews
    for pagecounter in range(3):

        # Request the current page and fail fast on HTTP errors
        res = requests.get(URL)
        res.raise_for_status()

        # Parse the HTML of the current page
        soup = BeautifulSoup(res.text, "html.parser")
        # Match it to the specific tag for all 5 ratings
        reviewElems = soup.findAll('img', {'class': ['sprite-rating_s_fill rating_s_fill s50', 'sprite-rating_s_fill rating_s_fill s40', 'sprite-rating_s_fill rating_s_fill s30', 'sprite-rating_s_fill rating_s_fill s20', 'sprite-rating_s_fill rating_s_fill s10']})
        reviewWritten = soup.findAll('p', {'class':'partial_entry'})

        if reviewElems:
            for row, rows in zip(reviewElems, reviewWritten): 
                review_text = row.attrs['alt'][0] 
                review2_text = rows.get_text(strip=True).encode('utf8', 'ignore').decode('latin-1')
                writer.writerow([review_text]) 
                writer.writerow([review2_text])

            print('Writing page', pagecounter + 1)
        else:
            print('Could not find clue.')

        # Find URL of next page and update URL
        if pagecounter == 0:
            nextLink = soup.select('a[data-offset]')[0]
        elif pagecounter != 0:
            nextLink = soup.select('a[data-offset]')[1]

        URL = 'http://www.tripadvisor.com' + nextLink.get('href')

print('Download complete') 

2 Answers:

Answer 0 (score: 2)

You can put the rating and the review text in the same row but in different columns:

writer.writerow([review_text, review2_text]) 

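Your original approach calls writerow once per item, so the rating and the text end up on separate, consecutive rows, which is not what you want.

Answer 1 (score: 0)

You can use a pandas DataFrame: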
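As a minimal sketch of the difference, the snippet below writes two made-up reviews to an in-memory buffer instead of a real file. The sample ratings and texts are placeholders for whatever the scraper returns, not data from the page.

```python
import csv
import io

# Hypothetical sample data standing in for the scraped ratings and review texts
ratings = ['5', '4']
reviews = ['Beautiful gardens', 'Nice, but crowded']

# In-memory buffer used here in place of open('GardensbytheBay.csv', 'w', newline='')
buffer = io.StringIO(newline='')
writer = csv.writer(buffer)

for rating, review in zip(ratings, reviews):
    # One writerow call per review: each element of the list becomes one column
    writer.writerow([rating, review])

output = buffer.getvalue()
print(output)
```

Each call to writerow produces exactly one line, so passing both values in a single list is what places them side by side. The writer also quotes the second review automatically because it contains a comma.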

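import pandas as pd

# Load the CSV that the scraper wrote
csv_file = pd.read_csv('GardensbytheBay.csv')
# idx, colname and value are placeholders: the column position,
# the new column's name, and its values
csv_file.insert(idx, colname, value)
csv_file.to_csv('output.csv', index=False)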
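To make the placeholders concrete, here is a small sketch of DataFrame.insert on made-up data. The CSV text, the column names 'rating' and 'review', and the review strings are all assumptions for illustration. Because the scraper writes no header row, the sketch passes header=None and supplies a name.

```python
import io

import pandas as pd

# Hypothetical CSV content: one column of ratings, no header row
csv_text = "5\n4\n"
df = pd.read_csv(io.StringIO(csv_text), header=None, names=['rating'])

# insert(loc, column, value): add a 'review' column at position 1 (the second column)
df.insert(1, 'review', ['Beautiful gardens', 'Nice, but crowded'])

# With no path given, to_csv returns the CSV text instead of writing a file
result = df.to_csv(index=False)
print(result)
```

This route only makes sense if the ratings are already on disk. When both values are available in the scraping loop, writing them in one writerow call is simpler.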