I can't print df.shape, even though the data has already been read into DataFrame format by pandas. Thanks!
# -*- coding:UTF-8 -*-
from pyvirtualdisplay import Display
from selenium import webdriver
import pandas as pd
display = Display(visible=0, size=(1024, 768))
display.start()
driver = webdriver.Firefox()
driver.get("http://www.fdmbenzinpriser.dk/searchprices/5/")
lines = [event.get_attribute('outerHTML') for event in driver.find_elements_by_xpath('//table[@id="sortabletable"]')]
df = pd.read_html(lines[0])
print df.shape
driver.close()
display.stop()
Output:
AttributeError: 'list' object has no attribute 'shape'
lines[0] returns:
[   Unnamed: 0   Pris                        Adresse      Tidspunkt
0         NaN   8.99       Odinsvej 2 4100 Ringsted  11 timer 55 m
1         NaN   9.09   Sdr.Havnegade 3 6000 Kolding  14 timer 48 m
2         NaN   9.09    Vestermarksvej 2 6600 Vejen  16 timer 35 m
3         NaN  10.99  Bøsbrovej 92B 8940 Randers SV   21 timer 1 m
Answer 0: (score: 2)
I think you need to change:
df = pd.read_html(lines[0])
to:
df = pd.read_html(lines[0])[0]
To combine the tables from every element of lines into one DataFrame:
df = pd.concat([pd.read_html(line)[0] for line in lines], ignore_index=True)
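A minimal sketch of why the `[0]` and the `pd.concat` call work. It assumes hand-built DataFrames as stand-ins for what `pd.read_html` would return, so that it runs without a browser or an HTML parser. The names `parsed_a`, `parsed_b` and `results` are hypothetical; the values are taken from the output in the question.

```python
import pandas as pd

# Stand-ins for pd.read_html(line): each call returns a *list* of DataFrames.
parsed_a = [pd.DataFrame({
    "Pris": [8.99, 9.09],
    "Adresse": ["Odinsvej 2 4100 Ringsted", "Sdr.Havnegade 3 6000 Kolding"],
})]
parsed_b = [pd.DataFrame({
    "Pris": [9.09, 10.99],
    "Adresse": ["Vestermarksvej 2 6600 Vejen", "Bøsbrovej 92B 8940 Randers SV"],
})]
results = [parsed_a, parsed_b]

# A plain list has no .shape attribute, hence the AttributeError in the question.
print(hasattr(parsed_a, "shape"))  # False

# Take element [0] of each list, then stack the frames under a fresh 0..n-1 index.
df = pd.concat([tables[0] for tables in results], ignore_index=True)
print(df.shape)  # (4, 2)
```

Without `ignore_index=True` each frame would keep its own 0-based index, so the combined frame would contain repeated index labels.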
Answer 1: (score: 1)
The pandas read_html method does not return a DataFrame; it returns a list of DataFrames: http://pandas.pydata.org/pandas-docs/version/0.22/generated/pandas.read_html.html You can always check the type of an object in Python with:
print(type(obj))
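As a rough illustration of that type check, the sketch below inspects a stand-in for the `pd.read_html` result and unwraps it defensively. `first_table` is a hypothetical helper, not part of pandas, and the DataFrame is built by hand rather than parsed from HTML.

```python
import pandas as pd


def first_table(obj):
    """Hypothetical helper: return a DataFrame whether obj is one or a list of them."""
    if isinstance(obj, pd.DataFrame):
        return obj
    if isinstance(obj, list) and obj and isinstance(obj[0], pd.DataFrame):
        return obj[0]
    raise TypeError(
        "expected a DataFrame or a non-empty list of DataFrames, got %s"
        % type(obj).__name__
    )


# Stand-in for pd.read_html(...), which returns a list of DataFrames.
tables = [pd.DataFrame({"Pris": [8.99, 9.09]})]

print(type(tables))                # <class 'list'>
print(type(tables[0]))             # <class 'pandas.core.frame.DataFrame'>
print(first_table(tables).shape)   # (2, 1)
```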