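I'm working with a public bus website for a specific bus stop (see the variable `url`), and I want to parse each column ("Bus line - Departure time - ETA") into its own list, but this code gives me a strange result: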
import requests
from bs4 import BeautifulSoup

url = 'http://www.stcp.pt/pt/itinerarium/soapclient.php?codigo=AAL1'
r = requests.get(url)
soup = BeautifulSoup(r.content, 'html.parser')
buses = []
for table in soup.find_all('table', attrs={'id': 'smsBusResults'}):
    for row in table.find_all('tr', attrs={'class': 'even'}):
        for col in row.find_all('td'):
            buses.append(row.get_text().strip())
print(buses)
Note: if you see "underpass" in the results, it means "passing by".
Answer 0: (score: 1)
Try this:
from bs4 import BeautifulSoup
import requests
import pandas as pd

data = requests.get('http://www.stcp.pt/pt/itinerarium/soapclient.php?codigo=AAL1').content
# Name the parser explicitly so bs4 does not warn about guessing one
soup = BeautifulSoup(data, 'html.parser')
table = soup.find_all('table', {'id': 'smsBusResults'})
tr = table[0].find_all('tr')

# The first row holds the column headers
headers = []
for th in tr[0].find_all('th'):
    headers.append(th.text)

temp_df = pd.DataFrame(columns=headers)
pos = 0
# Every remaining row becomes one DataFrame row
for i in range(1, len(tr)):
    temp_list = []
    for td in tr[i].find_all('td'):
        value = (td.text).replace('\n', '')
        value = value.replace('\t', '')
        temp_list.append(value)
    temp_df.loc[pos] = temp_list
    pos += 1
print(temp_df)
Output:
              Linha Hora Prevista Tempo de Espera
0   600 AV. ALIADOS         16:29            1min
1  202 AV.ALIADOS -         16:34            6min
2   600 AV. ALIADOS         16:41           12min
3   600 AV. ALIADOS         16:50           21min
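
The code in the question gives a strange result because its inner loop appends `row.get_text()` rather than `col.get_text()`, so the full text of each row is added once per cell. If you want one plain list per column, as the question asks, one possible approach is to collect the cells of each row and then transpose the rows with `zip`. The following is only a minimal sketch: `SAMPLE_HTML` is a made-up snippet that mimics the table structure (its values are copied from the output above, not fetched live), the function name `parse_columns` is hypothetical, and it assumes every data row has exactly three `td` cells.

```python
from bs4 import BeautifulSoup

# Hypothetical sample that mimics the smsBusResults table; the values are
# copied from the output above rather than fetched from the live page.
SAMPLE_HTML = """
<table id="smsBusResults">
  <tr><th>Linha</th><th>Hora Prevista</th><th>Tempo de Espera</th></tr>
  <tr class="even"><td>
      600 AV. ALIADOS</td><td>16:29</td><td>1min</td></tr>
  <tr class="even"><td>202 AV.ALIADOS -</td><td>16:34</td><td>6min</td></tr>
</table>
"""


def parse_columns(html):
    """Return three lists (line, scheduled time, wait), one per column."""
    soup = BeautifulSoup(html, 'html.parser')
    table = soup.find('table', id='smsBusResults')
    if table is None:
        return [], [], []

    rows = []
    for tr in table.find_all('tr'):
        # Collapse newlines, tabs and repeated spaces inside each cell
        cells = [' '.join(td.get_text().split()) for td in tr.find_all('td')]
        # Skip the header row (it has th cells only) and any malformed row
        if len(cells) == 3:
            rows.append(cells)

    if not rows:
        return [], [], []

    # zip(*rows) transposes the list of rows into a tuple per column
    lines, times, waits = (list(column) for column in zip(*rows))
    return lines, times, waits


lines, times, waits = parse_columns(SAMPLE_HTML)
print(lines)
print(times)
print(waits)
```

To run this against the live page, pass `requests.get(url).content` to `parse_columns` instead of the sample string. The three-cell check is an assumption about the page layout and may need adjusting if the site returns rows of a different shape.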