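I'm new to Python and to coding in general. I'm 95% of the way there: the code I've built retrieves only the first row of the table from Wikipedia. It looks like I'm missing something trivial, and I'd appreciate some help. See the code below: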
from bs4 import BeautifulSoup
import requests
import pandas as pd
URL_TO = 'https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'
response = requests.get(URL_TO)
soup = BeautifulSoup(response.text,'html.parser')
soup.prettify()
table = soup.find('table', {'class': 'wikitable sortable'}).tbody
rows = table.find_all('tr')
columns = [v.text.replace('\n', '') for v in rows[0].find_all('th')]
df = pd.DataFrame(columns = columns)
for i in range(1, len(rows)):
    tds = rows[i].find_all('td')
    if len(tds) == 3:
        values = [tds[0].text.replace('\n',''), tds[1].text.replace('\n',''), tds[2].text.replace('\n','')]
    else:
        values = [td.text.replace('\n','') for td in tds]
df = df.append(pd.Series(values, index=columns), ignore_index=True)
df.head()
Answer 0 (score: 0)
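You need to include df.append() inside your iteration over the rows. Currently the loop runs through every row without appending anything, so the append executes only once, after the loop has finished.
A few other things:
1. You can simply write for row in rows: instead of indexing with range().
2. Use .strip() to remove any whitespace, rather than replacing '\n'. It takes care of all the padding.
3. Why the if len(tds) == 3: ... else: statement? It isn't needed, since both branches build the same list.
4. The data is in <table> tags, so use pandas' .read_html(). It does all of this work for you (it uses BeautifulSoup under the hood). See the code below.
So here is the edit made to your code. Again, the only thing that really needed to change was where df.append is called.
from bs4 import BeautifulSoup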
import requests
import pandas as pd
URL_TO = 'https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'
response = requests.get(URL_TO)
soup = BeautifulSoup(response.text,'html.parser')
soup.prettify()
table = soup.find('table', {'class': 'wikitable sortable'}).tbody
rows = table.find_all('tr')
columns = [v.text.strip() for v in rows[0].find_all('th')]
df = pd.DataFrame(columns = columns)
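for row in rows:
    # Skip the header row, which contains <th> cells but no <td> cells
    if row.find_all('td'):
        tds = row.find_all('td')
        values = [td.text.strip() for td in tds]
        # Append inside the loop so every row is added.
        # Note: DataFrame.append was removed in pandas 2.0, so this line needs pandas < 2.0
        df = df.append(pd.Series(values, index=columns), ignore_index=True)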
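On pandas 2.0 and later, DataFrame.append no longer exists, so the loop can instead collect each row in a plain list and build the DataFrame once at the end. The following is a minimal sketch of that idea, not the answerer's code: the function name table_to_frame and the SAMPLE_HTML string are hypothetical, and the sample is a tiny stand-in for the downloaded page so the example runs without network access.

```python
from bs4 import BeautifulSoup
import pandas as pd

# Hypothetical stand-in for response.text; the real page has many more rows.
SAMPLE_HTML = """
<table class="wikitable sortable">
<tbody>
<tr><th>Postal Code\n</th><th>Borough\n</th><th>Neighbourhood\n</th></tr>
<tr><td>M1A\n</td><td>Not assigned\n</td><td>Not assigned\n</td></tr>
<tr><td>M3A\n</td><td>North York\n</td><td>Parkwoods\n</td></tr>
</tbody>
</table>
"""


def table_to_frame(html):
    """Parse the first 'wikitable sortable' table in html into a DataFrame."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", {"class": "wikitable sortable"})
    rows = table.find_all("tr")

    # The header row holds <th> cells; strip() drops the trailing newlines.
    columns = [th.text.strip() for th in rows[0].find_all("th")]

    # Collect every data row first, then build the DataFrame in one step.
    records = []
    for row in rows[1:]:
        tds = row.find_all("td")
        if tds:  # ignore rows without data cells
            records.append([td.text.strip() for td in tds])

    return pd.DataFrame(records, columns=columns)


df = table_to_frame(SAMPLE_HTML)
print(df.head())
```

With the real page, the same function could be called as table_to_frame(response.text). Building the frame once also avoids copying the whole DataFrame on every iteration.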
Same result using pandas:
import pandas as pd
url = 'https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'
df = pd.read_html(url)[0]
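If the page contains several tables, indexing with [0] relies on the wanted one coming first. As a hedged variation, read_html accepts an attrs argument that restricts the search to tables with matching attributes. The sketch below assumes the table's class attribute is exactly "wikitable sortable", as in the question's soup.find call; the function name first_wikitable and the SAMPLE_HTML string are hypothetical, and read_html needs a parser such as lxml (or bs4 with html5lib) installed.

```python
from io import StringIO

import pandas as pd

# Hypothetical stand-in for the downloaded page, so the sketch runs offline.
SAMPLE_HTML = """
<table class="wikitable sortable">
<tbody>
<tr><th>Postal Code</th><th>Borough</th><th>Neighbourhood</th></tr>
<tr><td>M1A</td><td>Not assigned</td><td>Not assigned</td></tr>
<tr><td>M3A</td><td>North York</td><td>Parkwoods</td></tr>
</tbody>
</table>
"""


def first_wikitable(html):
    """Return the first table whose class attribute is 'wikitable sortable'."""
    # Wrapping the text in StringIO passes it to read_html as a file-like object.
    tables = pd.read_html(StringIO(html), attrs={"class": "wikitable sortable"})
    return tables[0]


df = first_wikitable(SAMPLE_HTML)
print(df.head())
```

For the live page, the HTML could be fetched with requests first and passed in as first_wikitable(response.text), which also gives control over request headers if the site rejects the default ones.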