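Only the last page fetched is saved to the CSV, not all of the pages fetched.
I thought I could put the code below
for page in range(0, pages):
inside that loop, but doing so raises an IndentationError. I guess I need to append each page, but I am too new to this to understand how to join all the pages together. Thanks for pointing me in the right direction.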
import requests
from bs4 import BeautifulSoup
import csv

start = "http://awebsite.com/index.php?filter=&cur_page=0"
url = "http://awebsite.com/index.php?filter=&cur_page={}"
soup = BeautifulSoup(requests.get(start).content)
pages = 2
for page in range(0, pages):
    soup = BeautifulSoup(requests.get(url.format(page)).content)

table = soup2.find("table", class_ ="style10b")
output_rows = []
for table_row in table.findAll('tr'):
    columns = table_row.findAll('td')
    output_row = []
    for column in columns:
        output_row.append(column.encode_contents())
    output_rows.append(output_row)

with open('output.csv', 'wb') as csvfile:
    writer = csv.writer(csvfile)
    writer.writerows(output_rows)
Answer 0 (score: 0)
import requests
from bs4 import BeautifulSoup
import csv

start = "http://www.bhpa.co.uk/documents/safety/informal_investigations/index.php?filter=&cur_page=0"
url = "http://www.bhpa.co.uk/documents/safety/informal_investigations/index.php?filter=&cur_page={}"
soup = BeautifulSoup(requests.get(start).content)
pages = 2
output_rows = []  # created once, so rows from every page accumulate
for page in range(0, pages):
    # Fetch and parse each page inside the loop.
    soup = BeautifulSoup(requests.get(url.format(page)).content)
    table = soup.find("table", class_ ="style10b")
    for table_row in table.findAll('tr'):
        columns = table_row.findAll('td')
        output_row = []
        for column in columns:
            # decode_contents() returns the cell's inner HTML as str,
            # which is what csv.writer expects on Python 3.
            output_row.append(column.decode_contents())
        output_rows.append(output_row)

# Write once, after all pages have been collected.
# On Python 3 the csv module wants a text-mode file opened with newline=''.
with open('output.csv', 'w', newline='', encoding='utf-8') as csvfile:
    writer = csv.writer(csvfile)
    writer.writerows(output_rows)
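I indented the per-page table parsing so that it runs inside the page loop, removed a typo (soup2 instead of soup), and moved output_rows to the top so that it is created once, before the loop. That should do it. As for the IndentationError, take care not to mix spaces and tabs.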
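The effect of where `output_rows` is created can be checked without any network access. The sketch below is only an illustration, not the code from the question: `fetch_rows` and `FAKE_PAGES` are hypothetical stand-ins for the request-and-parse step, and the CSV is written to an in-memory buffer instead of `output.csv`.

```python
import csv
import io

# Hypothetical canned data standing in for "download page N and parse its table".
FAKE_PAGES = {
    0: [["a1", "a2"], ["b1", "b2"]],
    1: [["c1", "c2"]],
}


def fetch_rows(page):
    """Return the canned rows for one page (assumption: no real HTTP here)."""
    return FAKE_PAGES[page]


def collect_all(pages):
    output_rows = []                  # created once, before the loop
    for page in range(pages):
        for row in fetch_rows(page):
            output_rows.append(row)   # rows from every page accumulate
    return output_rows


def collect_last_only(pages):
    for page in range(pages):
        output_rows = []              # re-created on every pass: earlier pages are lost
        for row in fetch_rows(page):
            output_rows.append(row)
    return output_rows


all_rows = collect_all(2)
last_rows = collect_last_only(2)

# Write the accumulated rows once, after the loop has finished.
buffer = io.StringIO()
csv.writer(buffer, lineterminator="\n").writerows(all_rows)
csv_text = buffer.getvalue()
print(csv_text)
```

`collect_last_only` mirrors the symptom in the question: whatever was gathered for earlier pages is overwritten, so only the final page reaches the CSV.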
Answer 1 (score: 0)
You can use pandas and concat.
from io import StringIO

import requests
from bs4 import BeautifulSoup
import pandas as pd

url = "http://www.bhpa.co.uk/documents/safety/informal_investigations/index.php?filter=&cur_page={}"
pages = 2
final = pd.DataFrame()
for page in range(0, pages):
    soup = BeautifulSoup(requests.get(url.format(page)).content, 'lxml')
    html = str(soup.select_one('table.style10b'))
    # Wrap the markup in StringIO: recent pandas versions deprecate passing
    # a literal HTML string to read_html. [:-2] drops the table's last two rows.
    table = pd.read_html(StringIO(html), header=0, flavor='bs4')[0][:-2]
    # Append this page's rows to the running result.
    final = pd.concat([final, table], axis=0, ignore_index=True).fillna('')
print(final)
final.to_csv(r"C:\Users\User\Desktop\test.csv", encoding='utf-8-sig', index=False)
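Calling `pd.concat` inside the loop builds a new, larger frame on every pass. A common alternative is to collect the per-page frames in a list and concatenate once at the end. The sketch below assumes small hand-built frames in place of the scraped tables; the column names `ref` and `summary` are made up for illustration.

```python
import pandas as pd

# Hypothetical per-page tables standing in for the pd.read_html results.
page_tables = [
    pd.DataFrame({"ref": ["A1", "A2"], "summary": ["first", None]}),
    pd.DataFrame({"ref": ["B1"], "summary": ["third"]}),
]

frames = []
for table in page_tables:
    frames.append(table)          # keep each page's frame as-is

# One concat at the end; ignore_index renumbers the rows 0..n-1,
# and fillna('') blanks out missing cells as in the answer above.
final = pd.concat(frames, ignore_index=True).fillna("")
print(final)
```

For two pages the difference is negligible; the list-then-concat form mainly pays off when the number of pages grows.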