In the script below, I scrape coronavirus data from the table on worldometers.info/coronavirus with Selenium.
from time import sleep
from selenium import webdriver

class CoronaBot():
    def __init__(self):
        self.driver = webdriver.Chrome()

    def scraper(self):
        self.driver.get('https://worldometers.info/coronavirus/')
        main_table = self.driver.find_element_by_xpath('//*[@id="main_table_countries_today"]')
        country = main_table.find_element_by_xpath("//td[contains(., 'Austria')]")
        row = country.find_element_by_xpath("./..")
        data = row.text.split(" ")
        total_cases = data[0]
        new_cases = data[1]
        total_deaths = data[2]
        new_deaths = data[3]
        active_cases = data[4]
        total_recovered = data[5]
        serious_critical = data[6]
The code works fine, and I can print the results like this:
print("COVID-19 updates in: " + country.text)
print("Total Cases: " + total_cases)
...
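However, I want to take the scraped output and put it in a new CSV file (the CSV file should be created when the script runs).
I tried something naive like this with pandas, but it obviously didn't work. Any suggestions?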
    def create_csv(self):
        collected_data = []
        collected_data.append(output)
        df = pd.DataFrame(collected_data, columns=['total_cases', 'new_cases', 'total_deaths',
                                                   'new_deaths', 'active_cases', 'total_recovered', 'serious_critical'])
        df.to_csv('scraped_corona.csv')
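Answer 0 (score: 1)
pandas is a good solution, and you are close. In your example, you can put the data into a DataFrame right inside the scraping function.
First, I would create a self.df attribute to store the DataFrame: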
import pandas as pd
from selenium import webdriver

class CoronaBot():
    def __init__(self):
        self.driver = webdriver.Chrome()
        # Empty DataFrame with one column per scraped value
        column_names = ['total_cases', 'new_cases', 'total_deaths', 'new_deaths',
                        'active_cases', 'total_recovered', 'serious_critical']
        self.df = pd.DataFrame(columns=column_names)
Then, after you have collected the data, store it in self.df:
        ...
        print("Total recovered: " + total_recovered)
        print("Serious, critical cases: " + serious_critical)
        # DataFrame.append was removed in pandas 2.0, so build a one-row
        # DataFrame and concatenate it onto self.df instead.
        new_row = pd.DataFrame([{
            'total_cases': total_cases,
            'new_cases': new_cases,
            'total_deaths': total_deaths,
            'new_deaths': new_deaths,
            'active_cases': active_cases,
            'total_recovered': total_recovered,
            'serious_critical': serious_critical}])
        self.df = pd.concat([self.df, new_row], ignore_index=True)
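The row-building step can be tried without a browser. Here is a minimal sketch, assuming the row text has already been split into seven values in column order. The sample values and the `add_row` helper are made up for illustration and are not part of the original script. It uses `pd.concat` to add each row.

```python
import pandas as pd

COLUMN_NAMES = ['total_cases', 'new_cases', 'total_deaths', 'new_deaths',
                'active_cases', 'total_recovered', 'serious_critical']


def add_row(df, values):
    """Return a DataFrame with one more row (hypothetical helper)."""
    # Pair each scraped value with its column name.
    new_row = pd.DataFrame([dict(zip(COLUMN_NAMES, values))])
    if df.empty:
        # Nothing stored yet, so the new row is the whole table.
        return new_row
    return pd.concat([df, new_row], ignore_index=True)


# Made-up sample values standing in for row.text.split(" ")
sample = ['1,000', '+50', '10', '+1', '900', '90', '5']

df = pd.DataFrame(columns=COLUMN_NAMES)
df = add_row(df, sample)
df = add_row(df, sample)
print(df)
```

Calling the scraper several times (for example, once per country) would then add one row per call.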
And add an export function:
    def export_to_csv(self):
        self.df.to_csv('scraped_corona.csv')
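By default, `to_csv` also writes the DataFrame index as an unnamed first column. If you don't want that column in the file, passing `index=False` leaves it out. A small sketch with made-up values, writing to a temporary directory so it does not touch your real output file:

```python
import os
import tempfile

import pandas as pd

# Made-up values; the real ones come from the scraper.
df = pd.DataFrame([{'total_cases': '1,000', 'new_cases': '+50'}])

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'scraped_corona.csv')
    # index=False skips the row-number column.
    df.to_csv(path, index=False)
    with open(path, encoding='utf-8') as f:
        csv_text = f.read()

print(csv_text)
```

Values that contain a comma, such as thousands-separated counts, are quoted in the output, so the file still reads back with the right number of columns.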
Now, when I run
c = CoronaBot()
c.scraper()
c.export_to_csv()
I get the .csv file. Hope this helps, and good luck!