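When I print inside the for loop I see the correct values, but when I convert the result to a DataFrame outside the for loop, one of the columns gets overwritten.
I have tried appending, moving the insert statement into and out of the nested loop, concatenating the arrays, and numpy.column_stack.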
import requests
from bs4 import BeautifulSoup
import pandas as pd
import numpy as np

row_data = []
for i in range(2017, 2019):
    url = 'https://superstats.dk/program?aar={}%2F{}'.format(i, i + 1)
    html_doc = requests.get(url)
    soup = BeautifulSoup(html_doc.content, "lxml")
    table_div = soup.find(id="content")
    rows = table_div.find_all('tr')
    for row in rows:
        cols = row.find_all('td')
        cols = [x.text.strip() for x in cols]
        if len(cols) > 0:
            row_data.append(cols[0:6])
    aar = soup.select_one('option[selected]')['value']
    aar_start, aar_slut = aar.split("/")
    aar_start_np = np.asarray(aar_start)
    row_data_np = np.asarray(row_data)
    new_row_final = np.insert(row_data, 0, int(aar_slut), axis=1)
    print(new_row_final)
df = pd.DataFrame(new_row_final, columns=['AarStart', 'Dag', 'Dato', 'Hold', 'Resultat', 'Tilskuere', 'Dommer'])
print(df)
The output of print(new_row_final) shows the correct values in the "AarStart" column: 2018 in the first iteration of the loop and 2019 in the second. However, print(df) shows 2019 for every row.
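
To narrow this down without hitting the website, here is a minimal sketch that seems to reproduce the symptom. It assumes the cause is that row_data keeps growing across iterations while np.insert stamps the current year onto every accumulated row, so the last iteration rewrites the whole column. The `pages` dict and its values are made-up placeholders rather than scraped data, and the names `stamped`, `tagged_rows`, `df_overwritten` and `df_tagged` are hypothetical. The second loop shows one possible alternative: attach the year to each row at the moment it is collected.

```python
import numpy as np
import pandas as pd

COLUMNS = ['AarStart', 'Dag', 'Dato', 'Hold', 'Resultat', 'Tilskuere', 'Dommer']

# Made-up placeholder rows keyed by the season's end year (not real data).
pages = {
    2018: [['day-1', 'date-1', 'teams-1', 'score-1', 'crowd-1', 'ref-1'],
           ['day-2', 'date-2', 'teams-2', 'score-2', 'crowd-2', 'ref-2']],
    2019: [['day-3', 'date-3', 'teams-3', 'score-3', 'crowd-3', 'ref-3']],
}

# Pattern from the question: the list accumulates rows from every season,
# and np.insert puts the *current* year in front of all of them.
row_data = []
for year, rows in pages.items():
    row_data.extend(rows)
    stamped = np.insert(row_data, 0, year, axis=1)
df_overwritten = pd.DataFrame(stamped, columns=COLUMNS)

# Possible alternative: tag each row with its own year while collecting it.
tagged_rows = []
for year, rows in pages.items():
    for cols in rows:
        tagged_rows.append([year] + cols)
df_tagged = pd.DataFrame(tagged_rows, columns=COLUMNS)

print(df_overwritten['AarStart'].tolist())
print(df_tagged['AarStart'].tolist())
```

If that assumption holds, the first DataFrame carries the last year on every row, while the second keeps each row's own year. In the scraping code the equivalent change would be to read the selected season before the inner loop and prepend it to cols[0:6] when appending.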