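I wrote a Python script that scrapes some tabular content from a web page and writes it to an Excel file using pandas ExcelWriter. The table data comes through correctly, but I can't get it written to the Excel file properly. I can write the same thing using openpyxl, but I'm stuck when it comes to pandas ExcelWriter.
I've tried: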
import requests
import pandas as pd
from bs4 import BeautifulSoup
from pandas import ExcelWriter

link = "https://en.wikipedia.org/wiki/Comparison_of_Intel_processors"
result = []
res = requests.get(link)
soup = BeautifulSoup(res.text,"lxml")
for items in soup.select_one("table.wikitable").select("tr"):
    data = [item.get_text(strip=True) for item in items.select("th,td")]
    print(data)
    result+=data

df = pd.DataFrame(result)
writer = ExcelWriter('tabular_content.xlsx')
df.to_excel(writer,'Sheet1',index=False)
writer.save()
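To avoid any confusion between what I'm getting and what I expect, here are two examples of the output.
My current approach writes the data in a single column, like this: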
Processor
SeriesNomenclature
CodeName
Production Date
Supported Features (Instruction Set)
Clock Rate
Socket
Fabri-cation
However, I'd like it to be written like this:
Processor SeriesNomenclature CodeName Production Date Supported Features (Instruction Set)
4004 Nov. 15,1971
8008 N/A N/A April 1972 N/A
8080 N/A N/A April 1974 N/A
8085 N/A N/A March 1976 N/A
8086 N/A N/A June 8, 1978 N/A
8088 N/A N/A June 1979 N/A
80286 N/A N/A Feb. 1982 N/A
i80386 DX, SX, SL N/A 1985 - 1990 N/A
i80486 DX, SX, DX2, DX4, SL N/A 1989 - 1992 N/A
P.S. Using ExcelWriter is a must.
Answer 0 (score: 0)
ExcelWriter doesn't seem to be the problem, and in this case you don't even need BeautifulSoup. Just read the data this way:
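import pandas as pd

# read_html returns a list of DataFrames, one per <table> found on the page
tables = pd.read_html("https://en.wikipedia.org/wiki/Comparison_of_Intel_processors")
# The context manager saves the file on exit (writer.save() was removed in pandas 2.0)
with pd.ExcelWriter('tabular_content.xlsx') as writer:
    tables[0].to_excel(writer, sheet_name='Sheet1', index=False)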
And, at least on my system, it creates the Excel file as expected.
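
If you would rather keep the BeautifulSoup loop, the single-column output most likely comes from `result+=data`, which extends one flat list with every cell, instead of adding one list per table row. Below is a minimal sketch of the difference. The `scraped_rows` sample and the `write_table` helper are hypothetical stand-ins for illustration (they are not part of the question's code), and it assumes every row has as many cells as the header.

```python
import pandas as pd

# Hypothetical sample rows standing in for the cells scraped from each <tr>
scraped_rows = [
    ["Processor", "Code Name", "Production Date"],
    ["8008", "N/A", "April 1972"],
    ["8080", "N/A", "April 1974"],
]

flat = []    # what `result += data` builds: one long list of cells
nested = []  # what `result.append(data)` builds: one list per table row
for data in scraped_rows:
    flat += data
    nested.append(data)

# A flat list becomes a single column with one cell per row
single_column = pd.DataFrame(flat)
# A list of rows becomes a real table; the first row is used as the header
table = pd.DataFrame(nested[1:], columns=nested[0])


def write_table(df, path="tabular_content.xlsx"):
    """Write df to Sheet1 of an Excel file via ExcelWriter (hypothetical helper)."""
    # The context manager saves and closes the file on exit
    with pd.ExcelWriter(path) as writer:
        df.to_excel(writer, sheet_name="Sheet1", index=False)
```

Calling `write_table(table)` should then give one spreadsheet row per table row. On the real Wikipedia table some rows may have fewer cells than the header, so they might need padding before the header row can be passed as `columns`.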