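I am trying to use Python to get the "Last changed" date and time from this page:
Page: https://imgur.com/a/hsVl7e1
Code: https://imgur.com/a/jHWcFDh
I have tried several approaches using bs4 (BeautifulSoup) and urllib.
I do get data back, but some of it is missing, including the part I need.
When I print the result, I expect to find "Last changed dd/mm/yy" somewhere in the output.
Is there a better way to do this, or am I missing something?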
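One way to confirm whether the value is actually present in the HTML you download is to search its text directly. Below is a minimal, stdlib-only sketch. It assumes the label reads "Last changed" (or "Last change") followed by a dd/mm/yy or dd/mm/yyyy date. The names `find_last_changed` and `_TextCollector` are hypothetical helpers, not part of bs4 or urllib.

```python
import re
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Collect every text node of an HTML document, ignoring the tags."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


# Assumption: the label is "Last changed"/"Last change" and the date is d/m/y
# with a 2- to 4-digit year; spaces around the slashes are tolerated.
LAST_CHANGED = re.compile(
    r"last\s+changed?\s*:?\s*(\d{1,2}\s*/\s*\d{1,2}\s*/\s*\d{2,4})",
    re.IGNORECASE,
)


def find_last_changed(html):
    """Return the date string after 'Last changed', or None if it is absent."""
    collector = _TextCollector()
    collector.feed(html)
    collector.close()  # flush any text still buffered by the parser
    # Join with spaces so a label and date in separate tags still match
    text = " ".join(collector.parts)
    match = LAST_CHANGED.search(text)
    # Drop the whitespace inside the date, e.g. "5 / 6 / 2021" -> "5/6/2021"
    return re.sub(r"\s+", "", match.group(1)) if match else None
```

For example, `find_last_changed(page.text)` on a `requests` response. A `None` result means the phrase is not in the raw HTML the server returned. That can happen when the value is filled in later by JavaScript, which neither urllib nor requests executes.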
Answer 0 (score: 1)
import requests
import lxml.html as lh
import pandas as pd

url = "YOUR URL"  # placeholder: replace with the address of the page
# Download the page; page.content holds the raw HTML bytes
page = requests.get(url)
# Parse the HTML into an lxml document tree
doc = lh.fromstring(page.content)
# Select every table row (<tr>...</tr>) in the document
tr_elements = doc.xpath('//tr')

# For each cell of the first row (the header), store its text and an empty list
col = []
for i, t in enumerate(tr_elements[0], start=1):
    name = t.text_content()
    print(f'{i}:"{name}"')
    col.append((name, []))

# The first row is the header, so the data starts at the second row
for T in tr_elements[1:]:
    # Skip rows whose cell count differs from the header (avoids an IndexError)
    if len(T) != len(col):
        continue
    # i is the index of the current column
    for i, t in enumerate(T.iterchildren()):
        data = t.text_content()
        # Keep the first column as text; try to convert the others to int
        if i > 0:
            try:
                data = int(data)
            except ValueError:
                pass
        # Append the value to the list of the i'th column
        col[i][1].append(data)

# Build a DataFrame from the collected (header, values) pairs
df = pd.DataFrame({title: column for (title, column) in col})
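To try the same row-to-column logic offline, without requests or lxml, here is a stdlib-only sketch. It reads a flat HTML table from a string, treats the first row as the header, skips rows with a different cell count, and keeps column 0 as text while converting the other columns to `int` where possible. The names `table_to_columns` and `_RowCollector` are hypothetical helpers, and nested tables are not handled.

```python
from html.parser import HTMLParser


class _RowCollector(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._cell = None  # text parts of the cell currently open, if any

    def _close_cell(self):
        if self._cell is not None:
            self.rows[-1].append("".join(self._cell).strip())
            self._cell = None

    def handle_starttag(self, tag, attrs):
        # A new row or cell also ends a cell whose closing tag was omitted
        if tag in ("tr", "td", "th"):
            self._close_cell()
        if tag == "tr":
            self.rows.append([])
        elif tag in ("td", "th") and self.rows:
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th", "tr"):
            self._close_cell()

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def table_to_columns(html):
    """Map each header cell of the first row to the list of values below it."""
    parser = _RowCollector()
    parser.feed(html)
    parser.close()
    if not parser.rows:
        return {}
    header, body = parser.rows[0], parser.rows[1:]
    columns = {name: [] for name in header}
    for row in body:
        if len(row) != len(header):  # skip rows that don't match the header
            continue
        for idx, (name, value) in enumerate(zip(header, row)):
            if idx > 0:  # as in the answer: leave column 0 as text
                try:
                    value = int(value)
                except ValueError:
                    pass
            columns[name].append(value)
    return columns
```

Because every kept row adds one value to each column, the lists have equal length (assuming the header names are distinct). `pd.DataFrame(table_to_columns(html))` would therefore build the same kind of table as the answer's final line.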