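I have been following an online tutorial, but instead of the tutorial's data, which comes with headers, I want to use the code below.
My problem is that my table has no header row, so the first data row ends up being used as the header. How can I set the headers myself to "Ride" and "Queue Time"?
Thanks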
import requests
import lxml.html as lh
import pandas as pd
url='http://www.ridetimes.co.uk/'
page = requests.get(url)
doc = lh.fromstring(page.content)
tr_elements = doc.xpath('//tr')
col = []
i = 0
# For each cell in the first row, store its text (treated as a header) and an empty list
for t in tr_elements[0]:
    i += 1
    name = t.text_content()
    print('%d:"%s"' % (i, name))
    col.append((name, []))
print(col)
Answer 0 (score: 0)
How about trying:
>>> pd.DataFrame(col, columns=["Ride", "Queue Time"])
               Ride  Queue Time
0  Spinball Whizzer          []
1            0 mins          []
If I have understood correctly, this should be the answer.
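Note that `col` only holds the cells of the first row, so in the output above the ride name and its queue time both land in the "Ride" column. A minimal sketch of collecting every row instead and naming the columns explicitly is given here. It assumes each ride is a `<tr>` with two `<td>` cells; `SAMPLE_HTML` and `table_to_frame` are hypothetical names used only for illustration, and the inline HTML stands in for the real page.

```python
import lxml.html as lh
import pandas as pd

# Hypothetical stand-in for the page: a header-less table like the one in the question.
SAMPLE_HTML = """
<table>
  <tr><td>Spinball Whizzer</td><td>0 mins</td></tr>
  <tr><td>Nemesis</td><td>5 mins</td></tr>
  <tr><td>Galactica</td><td>Currently Unavailable</td></tr>
</table>
"""


def table_to_frame(html, columns):
    """Turn every <tr> into a data row and label the columns explicitly."""
    doc = lh.fromstring(html)
    rows = []
    for tr in doc.xpath('//tr'):
        cells = [cell.text_content().strip() for cell in tr.xpath('./td')]
        # Skip rows whose cell count does not match the expected columns.
        if len(cells) == len(columns):
            rows.append(cells)
    return pd.DataFrame(rows, columns=columns)


df = table_to_frame(SAMPLE_HTML, ['Ride', 'Queue Time'])
print(df)
```

With the real page you would pass `page.content` from the question in place of `SAMPLE_HTML`; whether the live markup matches this shape is an assumption.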
Answer 1 (score: 0)
Let pandas read the table, then simply assign the column names:
import pandas as pd
url='http://www.ridetimes.co.uk/'
df = pd.read_html(url)[0]
df.columns = ['Ride', 'Queue Time']
Output:
print(df)
               Ride             Queue Time
0  Spinball Whizzer                 0 mins
1           Nemesis                 5 mins
2          Oblivion                 5 mins
3        Wicker Man                 5 mins
4        The Smiler                10 mins
5              Rita                20 mins
6          TH13TEEN                25 mins
7         Galactica  Currently Unavailable
8        Enterprise  Currently Unavailable
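Answer 2 (score: 0)
Consider using the same source the page itself uses to refresh its values, which returns JSON. A random number is appended to the URL to prevent cached results being served. This returns rides from all group types, not just Thrill.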
import requests
import random
import pandas as pd
i = random.randint(1,1000000000000000000)
r = requests.get('http://ridetimes.co.uk/queue-times-new.php?r=' + str(i)).json() #to prevent cached results being served
df = pd.DataFrame([(item['ride'], item['time']) for item in r], columns=['Ride', 'Queue Time'])
print(df)
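The cache-busting step can be tried on its own without touching the network. This is a minimal sketch using only the standard library; the helper name `cache_busting_url` is hypothetical, and the endpoint is the one quoted in this answer.

```python
import random
from urllib.parse import urlencode

# Endpoint quoted in the answer; nothing is requested from it here.
BASE_URL = 'http://ridetimes.co.uk/queue-times-new.php'


def cache_busting_url(base_url, rng=random):
    """Append a random 'r' query parameter so each request URL is unique."""
    token = rng.randint(1, 10**18)
    # requests.get(base_url, params={'r': token}) would encode the same query string.
    return base_url + '?' + urlencode({'r': token})


url = cache_busting_url(BASE_URL)
print(url)
```

Letting `urlencode` (or the `params` argument of `requests.get`) build the query string avoids manual string concatenation and keeps the value correctly escaped.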
If you only want the Thrill group, change the DataFrame line to:
df = pd.DataFrame([(item['ride'], item['time']) for item in r if item['group'] == 'Thrill'], columns=['Ride', 'Queue Time'])
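The same filter can be tried offline. This sketch assumes each JSON item carries the keys `ride`, `time` and `group` used in this answer; the `sample` records, including the ride "Example Carousel" and the "Family" group value, are made up for illustration, and `rides_frame` is a hypothetical helper.

```python
import pandas as pd

# Hypothetical sample shaped like the JSON described above (keys: ride, time, group).
# The group values are invented for illustration, not taken from the live feed.
sample = [
    {'ride': 'Nemesis', 'time': '5 mins', 'group': 'Thrill'},
    {'ride': 'Oblivion', 'time': '5 mins', 'group': 'Thrill'},
    {'ride': 'Example Carousel', 'time': '0 mins', 'group': 'Family'},
]


def rides_frame(items, group=None):
    """Build the Ride / Queue Time frame, optionally keeping a single group."""
    df = pd.DataFrame(items, columns=['ride', 'time', 'group'])
    if group is not None:
        df = df[df['group'] == group]
    df = df[['ride', 'time']].rename(columns={'ride': 'Ride', 'time': 'Queue Time'})
    return df.reset_index(drop=True)


all_rides = rides_frame(sample)
thrill_only = rides_frame(sample, group='Thrill')
print(thrill_only)
```

Loading all keys first and filtering on the `group` column keeps the group available for later use, such as counting rides per group, at the cost of one extra step compared with the list comprehension.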