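I have been searching for a solution, but every answer I found ends with a print() call instead of saving the DataFrame, which is what I want.
Below is (almost) working code that prints 3 separate tables. How can I save these three tables as 3 separate DataFrames named matches_october, matches_november and matches_december?
The last line of the code does not work the way I want it to. I hope it is clear what the code is meant to do: save one DataFrame at the end of each of the 3 iterations of the loop.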
Answer 0 (score: 2)
You could write a chain of if cases, but that is not very robust (and it is ugly):
if i == 'october':
    matches_october = pd.read_html(str(table))[0]  # read_html returns a list of DataFrames
if i == 'november':
    matches_november = pd.read_html(str(table))[0]
# ...and so on for december
A more elegant solution is to use a dictionary. Declare matches = {} before the loop. Then, in each iteration:
matches[i] = pd.read_html(str(table))[0]  # keep the first DataFrame from the returned list
You can then access the October matches DataFrame with matches['october'].
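Here is a minimal sketch of the dictionary pattern that runs without network access, assuming pandas is installed. `fetch_month_table` is a hypothetical stand-in for the `requests`/`BeautifulSoup`/`pd.read_html` steps. It only builds a tiny placeholder DataFrame, so no real schedule data is involved.

```python
import pandas as pd

def fetch_month_table(month: str) -> pd.DataFrame:
    # Hypothetical stand-in for downloading and parsing one month's page;
    # returns a tiny placeholder table instead of real schedule data.
    return pd.DataFrame({"month": [month, month], "game": [1, 2]})

valid_pages = ["october", "november", "december"]

matches = {}  # declared once, before the loop
for i in valid_pages:
    # Each iteration stores its DataFrame under the loop variable as the key.
    matches[i] = fetch_month_table(i)

# Any month's DataFrame is now available by name.
print(matches["october"])
```

The keys are the strings from `valid_pages`, so no variable names have to be generated dynamically.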
Answer 1 (score: 1)
You cannot build a variable name with +. Use a dict instead:
import pandas as pd
import requests
from bs4 import BeautifulSoup
matches = {} # create an empty dict
base_url = 'https://www.basketball-reference.com/leagues/NBA_2019_games-'
valid_pages = ['october','november','december']
end = '.html'
for i in valid_pages:
    url = '{}{}{}'.format(base_url, i, end)
    res = requests.get(url)
    soup = BeautifulSoup(res.content,'lxml')
    table = soup.find_all('table')[0]
    df = pd.read_html(str(table))
    print(df)
    matches[i] = df[0] # store it in the dict
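The same loop can also be written as a dict comprehension. If one table is more convenient than three, `pd.concat` accepts a dict and uses its keys as the outer index level. This is a sketch only: the hypothetical `fetch_month_table` returns placeholder rows and does not request anything from basketball-reference.com.

```python
import pandas as pd

def fetch_month_table(month: str) -> pd.DataFrame:
    # Hypothetical placeholder for the requests/BeautifulSoup/read_html steps.
    return pd.DataFrame({"game": [1, 2, 3]})

valid_pages = ["october", "november", "december"]

# One DataFrame per month, keyed by month name.
matches = {month: fetch_month_table(month) for month in valid_pages}

# Optional: stack them into a single DataFrame. The dict keys become the
# outer level of a MultiIndex, named "month"; the inner level is "row".
all_matches = pd.concat(matches, names=["month", "row"])

print(all_matches.loc["november"])
```

Keeping the dict preserves per-month access. The concatenated frame is convenient when every month should be filtered or aggregated together.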
Answer 2 (score: 0)
Thanks, everyone. It works! :)
import pandas as pd
import requests
from bs4 import BeautifulSoup
matches = {} # create an empty dict
base_url = 'https://www.basketball-reference.com/leagues/NBA_2019_games-'
valid_pages = ['october','november','december']
end = '.html'
for i in valid_pages:
    url = '{}{}{}'.format(base_url, i, end)
    res = requests.get(url)
    soup = BeautifulSoup(res.content,'lxml')
    table = soup.find_all('table')[0]
    df = pd.read_html(str(table))
    matches[i] = df[0] # store it in the dict
matches_october = matches['october']
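If all three separate variables from the question are still wanted, they can be unpacked from the dict in one statement after the loop. In this sketch the placeholder dict (hypothetical one-row DataFrames) stands in for the `matches` dict built by the scraping loop above.

```python
import pandas as pd

months = ["october", "november", "december"]

# Placeholder for the dict filled by the scraping loop (hypothetical data).
matches = {m: pd.DataFrame({"game": [1]}) for m in months}

# Unpack in an explicit key order so each name gets the right month.
matches_october, matches_november, matches_december = (matches[m] for m in months)

print(matches_november)
```

The unpacking only binds new names to the same DataFrame objects. Nothing is copied, so the dict and the three variables refer to the same data.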