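I am trying to add new rows to an existing csv file. Each new row comes from a for loop: the values are appended to lists of strings, which are then saved to a DataFrame. I don't want to hold the results of the whole loop in memory and only then save them to the csv file. I would rather append each row to the file as the loop iterates, because the loop is long-running and I don't want to wait for it to finish.
I can iterate over the set, but that results in duplicate rows.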
names = []
addresses = []
pages = np.arange(10300, 10400, 1)
for page in pages:
    page = requests.get(
        "https://www.testpage.com/" + str(page), headers=headers)
    soup = BeautifulSoup(page.text, 'html.parser')
    company = soup.find_all('main')
    for container in company:
        name = container.find("b", {"id": "company_name"})
        names.append(name.text.strip())
        address = container.find('div', attrs={'class': 'text location'})
        addresses.append(address.text.strip())
    companies = pd.DataFrame({
        'name': names,
        'address': addresses
    })
    companies.to_csv(r'b_10300_10400.csv', mode='a', header=False)
Any ideas?
Answer 0 (score: 2)
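Use the standard csv module, which is designed for writing one row at a time. You aren't doing any pandas-related processing here, so pandas is just getting in the way.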
import csv

import numpy as np
import requests
from bs4 import BeautifulSoup

# headers is assumed to be defined as in the question
pages = np.arange(10300, 10400, 1)
with open('b_10300_10400.csv', mode='a', newline='') as outfile:
    writer = csv.writer(outfile)
    for page in pages:
        page = requests.get(
            "https://www.testpage.com/" + str(page), headers=headers)
        soup = BeautifulSoup(page.text, 'html.parser')
        company = soup.find_all('main')
        for container in company:
            name = container.find("b", {"id": "company_name"}).text.strip()
            address = container.find('div', attrs={'class': 'text location'}).text.strip()
            # write each row as soon as it is scraped
            writer.writerow((name, address))
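A minimal, network-free sketch of the same row-at-a-time idea, assuming the scraped rows arrive as (name, address) tuples. The helper name append_rows, the header names, and the explicit flush() call are illustrative assumptions, not part of the answer above. The flush is there so that rows already written are pushed to the file while a long loop is still running.

```python
import csv
import os


def append_rows(path, rows, header=("name", "address")):
    """Append rows to a CSV file; write the header only if the file is new or empty.

    Hypothetical helper for illustration. Returns the number of data rows written.
    """
    need_header = not os.path.exists(path) or os.path.getsize(path) == 0
    count = 0
    with open(path, mode="a", newline="", encoding="utf-8") as outfile:
        writer = csv.writer(outfile)
        if need_header:
            writer.writerow(header)
        for row in rows:
            writer.writerow(row)
            # Flush Python's buffer so the row reaches the file before the loop ends.
            outfile.flush()
            count += 1
    return count
```

Because rows can be any iterable, a generator that yields one tuple per scraped container would let each row reach the file as soon as it is produced, without building a list first.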
Answer 1 (score: 1)
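You should reset the names and addresses variables on every iteration of the loop, so that each append writes only the rows from the current page: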
pages = np.arange(10300, 10400, 1)
for page in pages:
    names = []
    addresses = []
    page = requests.get(
        "https://www.testpage.com/" + str(page), headers=headers)
    soup = BeautifulSoup(page.text, 'html.parser')
    company = soup.find_all('main')
    for container in company:
        name = container.find("b", {"id": "company_name"})
        names.append(name.text.strip())
        address = container.find('div', attrs={'class': 'text location'})
        addresses.append(address.text.strip())
    companies = pd.DataFrame({
        'name': names,
        'address': addresses
    })
    companies.to_csv(r'b_10300_10400.csv', mode='a', header=False)
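To see why the reset matters, here is a small sketch that swaps pandas and the file for plain lists and an in-memory buffer (an assumption made so that it runs without pandas or network access). The function name write_batches and the sample data are hypothetical. Without the reset, every append re-writes all rows collected so far, which is where the duplicate rows come from.

```python
import csv
import io


def write_batches(batches, reset_each_page):
    """Append one chunk per batch to an in-memory CSV and return the lines written."""
    out = io.StringIO()
    writer = csv.writer(out)
    names = []
    for batch in batches:
        if reset_each_page:
            names = []  # start a fresh buffer for this page
        names.extend(batch)
        # Stands in for companies.to_csv(..., mode='a'): write the whole buffer.
        for name in names:
            writer.writerow([name])
    return out.getvalue().splitlines()


pages = [["a"], ["b"], ["c"]]
print(write_batches(pages, reset_each_page=False))  # ['a', 'a', 'b', 'a', 'b', 'c']
print(write_batches(pages, reset_each_page=True))   # ['a', 'b', 'c']
```

Separately, DataFrame.to_csv writes the index as the first column by default; passing index=False drops it if that column is not wanted.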