Export Selenium-scraped data to a CSV file?

Time: 2020-08-12 15:20:11

Tags: python selenium csv

In the script below, I use Selenium to scrape coronavirus data from the table at worldometers.info/coronavirus.

from time import sleep
from selenium import webdriver

class CoronaBot():
    def __init__(self):
        self.driver = webdriver.Chrome()

    def scraper(self):
        self.driver.get('https://worldometers.info/coronavirus/')
        # Locate the main table, then the cell that names the country
        main_table = self.driver.find_element_by_xpath('//*[@id="main_table_countries_today"]')
        country = main_table.find_element_by_xpath("//td[contains(., 'Austria')]")
        # Step up to the parent row and split its text into fields
        row = country.find_element_by_xpath("./..")
        data = row.text.split(" ")
        total_cases = data[0]
        new_cases = data[1]
        total_deaths = data[2]
        new_deaths = data[3]
        active_cases = data[4]
        total_recovered = data[5]
        serious_critical = data[6]

The code works fine, and I can print the results like this:

    print("COVID-19 updates in: " + country.text)
    print("Total Cases: " + total_cases)
    ...

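However, I want to take the scraped output and write it to a new CSV file (the CSV file should be created when the script runs).

I tried something silly like this with pandas, but it obviously didn't work. Any suggestions?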

def create_csv(self):

    collected_data = []

    collected_data.append(output)

    df = pd.DataFrame(collected_data, columns=['total_cases', 'new_cases', 'total_deaths', 
    'new_deaths', 'active_cases', 'total_recovered','serious_critical'])
    df.to_csv('scraped_corona.csv')

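1 Answer:

Answer 0: (score: 1)

pandas is a good solution, and you were close. In your example, you can put the data into a DataFrame directly inside the scraping function.

First, I would create a self.df attribute to hold the DataFrame: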

class CoronaBot():
    def __init__(self):
        self.driver = webdriver.Chrome()
        column_names = ['total_cases', 'new_cases', 'total_deaths', 'new_deaths','active_cases', 'total_recovered', 'serious_critical']
        self.df = pd.DataFrame(columns=column_names)

Then, after you have collected the data, store it in self.df:

...
print("Total recovered: " + total_recovered)
print("Serious, critical cases: " + serious_critical)

self.df = self.df.append(
    {'total_cases': total_cases,
     'new_cases': new_cases,
     'total_deaths': total_deaths,
     'new_deaths': new_deaths,
     'active_cases': active_cases,
     'total_recovered': total_recovered,
     'serious_critical': serious_critical}, ignore_index=True)
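
A note for newer pandas versions: `DataFrame.append` was deprecated in pandas 1.4 and removed in 2.0, so the call above only works on older installs. A minimal sketch of the same step using `pd.concat` follows. The `add_row` helper and the `COLUMN_NAMES` constant are hypothetical names introduced here for illustration, and the letters stand in for scraped values; they are not real figures.

```python
import pandas as pd

COLUMN_NAMES = ['total_cases', 'new_cases', 'total_deaths', 'new_deaths',
                'active_cases', 'total_recovered', 'serious_critical']


def add_row(df, values):
    """Return a new DataFrame with one extra row built from `values`."""
    record = dict(zip(COLUMN_NAMES, values))
    new_row = pd.DataFrame([record], columns=COLUMN_NAMES)
    # Skip the concat for an empty frame; there is nothing to join yet.
    if df.empty:
        return new_row
    return pd.concat([df, new_row], ignore_index=True)


df = pd.DataFrame(columns=COLUMN_NAMES)
# Placeholder strings standing in for the scraped values.
df = add_row(df, ['a', 'b', 'c', 'd', 'e', 'f', 'g'])
df = add_row(df, ['h', 'i', 'j', 'k', 'l', 'm', 'n'])
print(df.shape)
```

Inside the class, the equivalent would be something like `self.df = add_row(self.df, data[:7])`, assuming `data` holds the seven fields in the same order as the columns.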

And add an export function:

    def export_to_csv(self):
        self.df.to_csv('scraped_corona.csv')
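
By default `to_csv` also writes the DataFrame index as an unnamed first column; passing `index=False` leaves it out. If pandas is not needed for anything else, the standard-library `csv` module can produce the same kind of file. A rough sketch, assuming the scraped values have already been collected into a dict (the `write_rows` helper, the temp-directory path, and the placeholder values are all made up for this example):

```python
import csv
import os
import tempfile

FIELDNAMES = ['total_cases', 'new_cases', 'total_deaths', 'new_deaths',
              'active_cases', 'total_recovered', 'serious_critical']


def write_rows(path, rows):
    """Write a list of dicts to `path` as CSV, with a header row."""
    # newline='' lets the csv module control line endings itself.
    with open(path, 'w', newline='', encoding='utf-8') as f:
        writer = csv.DictWriter(f, fieldnames=FIELDNAMES)
        writer.writeheader()
        writer.writerows(rows)


# Placeholder strings standing in for the scraped values.
row = dict(zip(FIELDNAMES, ['a', 'b', 'c', 'd', 'e', 'f', 'g']))
out_path = os.path.join(tempfile.gettempdir(), 'scraped_corona.csv')
write_rows(out_path, [row])

# Read the file back to check what was written.
with open(out_path, newline='', encoding='utf-8') as f:
    rows_read = list(csv.DictReader(f))
print(rows_read)
```

Opening the file in `'w'` mode creates it if it does not exist, which covers the "file should be created when the script runs" requirement.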

Now, when I run

c = CoronaBot()
c.scraper()
c.export_to_csv()

I get the .csv file. Hope this helps, and good luck!