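I am trying to use BeautifulSoup to print a table of baby names as a list.
google-python-exercises/google-python-exercises/babynames/baby1990.html (the HTML page is a snapshot of the actual URL)
After fetching the page with urllib.request and parsing it with BeautifulSoup, I can print the data in each row of the table, but the output is wrong.
Here is my code: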
right_table = soup.find('table', attrs={"summary": "Popularity for top 1000"})
table_rows = right_table.find_all('tr')
for tr in table_rows:
    td = tr.find_all('td')
    row = [i.text for i in td]
    print(row)
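This should print one list containing all the data in the rows, but instead I get many lists, each new one starting one record later than the previous one.
Like this: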
['997', 'Eliezer', 'Asha', '998', 'Jory', 'Jada', '999', 'Misael', 'Leila', '1000', 'Tate', 'Peggy', 'Note: Rank 1 is the most popular,\nrank 2 is the next most popular, and so forth. \n']
['998', 'Jory', 'Jada', '999', 'Misael', 'Leila', '1000', 'Tate', 'Peggy', 'Note: Rank 1 is the most popular,\nrank 2 is the next most popular, and so forth. \n']
['999', 'Misael', 'Leila', '1000', 'Tate', 'Peggy', 'Note: Rank 1 is the most popular,\nrank 2 is the next most popular, and so forth. \n']
['1000', 'Tate', 'Peggy', 'Note: Rank 1 is the most popular,\nrank 2 is the next most popular, and so forth. \n']
['Note: Rank 1 is the most popular,\nrank 2 is the next most popular, and so forth. \n']
How can I print just one list?
Answer 0 (score: 1)
I would try pandas and index into the list of tables it returns to get the one you want:
import pandas as pd
tables = pd.read_html('yourURL')
print(tables[1]) # for example; change index as required
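Answer 1 (score: 0)
Your loop builds a row list and prints it, then moves on to the next iteration, where it builds another row list (overwriting the previous one) and prints that, and so on.
I'm not sure why you want all the rows merged into a single list, but to end up with one final list you need to add each row list to that final list on every iteration.
Or do you actually mean that you want a list of row lists? For a single flat list: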
right_table = soup.find('table', attrs={"summary": "Popularity for top 1000"})
table_rows = right_table.find_all('tr')
result_list = []
for tr in table_rows:
    td = tr.find_all('td')
    row = [i.text for i in td]
    result_list = result_list + row
print(result_list)
If you do want a list of row lists, use this instead:
right_table = soup.find('table', attrs={"summary": "Popularity for top 1000"})
table_rows = right_table.find_all('tr')
result_list = []
for tr in table_rows:
    td = tr.find_all('td')
    row = [i.text for i in td]
    result_list.append(row)
print(result_list)
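The two variants above differ only in how each row is added to result_list. A minimal sketch of that difference, using a couple of made-up rows in place of the parsed td texts (the sample values are only an illustration):

```python
# Hypothetical sample rows standing in for the text of each row's <td> cells
rows = [["999", "Misael", "Leila"], ["1000", "Tate", "Peggy"]]

flat = []    # one flat list of cells
nested = []  # one inner list per row
for row in rows:
    flat.extend(row)    # adds the cells themselves, like flat = flat + row
    nested.append(row)  # adds the row as a single element

print(flat)
print(nested)
```

Here flat ends up with six strings, while nested keeps two inner lists of three strings each.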
Honestly, though, I would follow QHarr's suggestion and use pandas with .read_html().
right_table = soup.find('table', attrs={"summary": "Popularity for top 1000"})
table_rows = right_table.find_all('tr')
result_list = []
for tr in table_rows:
    td = tr.find_all('td')
    for data in td:
        # print each cell's text; td is the list of cells, data is one cell
        print(data.text)
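The output in the question, where every list repeats the cells of all later rows, would be consistent with tr elements that are nested inside each other. That can happen when the page never closes its tr tags and the soup was built with "html.parser"; both points are assumptions here, since the question shows neither the raw HTML nor the parser. A small sketch with made-up markup, limiting the search to each row's own cells with recursive=False:

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the <tr> tags are never closed
html = "<table><tr><td>1</td><td>A</td><tr><td>2</td><td>B</td></table>"
soup = BeautifulSoup(html, "html.parser")

# With html.parser the second <tr> ends up inside the first one,
# so a plain find_all('td') on the first row also returns later cells.
all_cells = [td.text for td in soup.find("tr").find_all("td")]

rows = []
for tr in soup.find_all("tr"):
    # recursive=False only looks at direct children of this <tr>
    cells = tr.find_all("td", recursive=False)
    rows.append([td.text for td in cells])

print(all_cells)
print(rows)
```

If this matches your page, switching to a more lenient parser such as "lxml" or "html5lib" may also repair the nesting, but that depends on the actual markup.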