I am loading a csv file with pandas like this:
premier10 = pd.read_csv('./premier_league/pl_09_10.csv')
However, I have more than 20 csv files, and I would like to load them as separate dfs (one df per csv) using a loop and predefined names, something like:
import pandas as pd
file_names = ['pl_09_10.csv','pl_10_11.csv']
names = ['premier10','premier11']
for i in range(0, len(file_names)):
    names[i] = pd.read_csv('./premier_league/{}'.format(file_names[i]))
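(Note that I give only two csv files here as an example.) Unfortunately, this doesn't work: there is no error message, but the pd dfs don't exist.
Since I haven't found anything similar on Stack Overflow, any hints or links to previous questions would be much appreciated.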
Answer 0 (score: 1)
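Use pathlib to set the path, p, to the files.
Use the .glob method to find the files matching the pattern.
Use pandas.read_csv to create a dataframe from each file.
Use pandas.concat together with it to create a single dataframe from all the files.
In the for-loop in the question, objects (variables) cannot be created that way (e.g. names[i]). The assignment only replaces the string stored at that position in the list; it is comparable to writing 'premier10' = pd.read_csv(...), where 'premier10' is of type str.
from pathlib import Path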
import pandas as pd
# set the path to the files
p = Path('some_path/premier_league')
# create a list of the files matching the pattern
files = list(p.glob('pl_*.csv'))
# creates a dict of dataframes, where each file has a separate dataframe
df_dict = {f.stem: pd.read_csv(f) for f in files}
# alternative, creates 1 dataframe from all files
df = pd.concat([pd.read_csv(f) for f in files])
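If the dict keys should read like the names in the question (premier10, premier11) rather than the file stems, one option is to derive each key from the stem. The following is only a sketch: season_name and load_named_frames are hypothetical helpers, and the naming rule (the last two characters of the stem are the season's end year) is an assumption based on the two file names shown in the question.

```python
from pathlib import Path

import pandas as pd


def season_name(stem):
    """Map a file stem such as 'pl_09_10' to a key such as 'premier10'.

    Assumed naming scheme: the last two characters are the season's end year.
    """
    return f"premier{stem[-2:]}"


def load_named_frames(folder, pattern="pl_*.csv"):
    """Read every file in `folder` matching `pattern` into a dict of dataframes."""
    # sorted() only makes the insertion order of the dict predictable
    files = sorted(Path(folder).glob(pattern))
    return {season_name(f.stem): pd.read_csv(f) for f in files}
```

With the files from the question, load_named_frames('./premier_league')['premier10'] would then be the dataframe read from pl_09_10.csv.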
Answer 1 (score: 0)
names = ['premier10','premier11'] creates a list, not a dictionary. Simply replace it with names = dict() and store each dataframe under its name as the key, or start from an empty list, names = [], and add each dataframe with names.append(...).
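The dictionary suggestion could be sketched as follows, keeping the two lists from the question and pairing them by position. load_by_name is a hypothetical helper name, and the length check is an added safeguard, not something the answer asks for.

```python
import os

import pandas as pd


def load_by_name(folder, file_names, names):
    """Return {name: dataframe}, pairing each name with the file at the same position."""
    if len(file_names) != len(names):
        raise ValueError("file_names and names must have the same length")
    return {
        name: pd.read_csv(os.path.join(folder, file_name))
        for name, file_name in zip(names, file_names)
    }
```

Called as dfs = load_by_name('./premier_league', ['pl_09_10.csv', 'pl_10_11.csv'], ['premier10', 'premier11']), each dataframe is then available as dfs['premier10'], dfs['premier11'], and so on.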
Answer 2 (score: 0)
This is what you want:
import os
import pandas as pd
# Collect the names of the csv files in the directory
files = [f for f in os.listdir("./your_directory") if f.endswith('.csv')]
# Initialize an empty data frame
all_data = pd.DataFrame()
# Read each file and concatenate its data onto the data frame initialized above
for file in files:
    # os.path.join supplies the path separator that plain string concatenation left out
    df = pd.read_csv(os.path.join('./your_directory', file))
    all_data = pd.concat([all_data, df])
# Display the first rows to verify that the contents were transferred
all_data.head()
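Because this approach merges everything into one data frame, the rows no longer show which file they came from. A possible variation, shown here only as a sketch, tags each row with its file name and concatenates once at the end. combine_csvs and the source_file column are hypothetical names, not part of the answer above.

```python
import os

import pandas as pd


def combine_csvs(directory):
    """Read every .csv file in `directory` and stack the rows into one data frame."""
    files = sorted(f for f in os.listdir(directory) if f.endswith(".csv"))
    frames = [
        # assign() adds a column recording which file each row was read from
        pd.read_csv(os.path.join(directory, f)).assign(source_file=f)
        for f in files
    ]
    # pd.concat raises ValueError on an empty list, so return an empty frame instead
    if not frames:
        return pd.DataFrame()
    return pd.concat(frames, ignore_index=True)
```

A single season can then be recovered by filtering, for example all_data[all_data['source_file'] == 'pl_09_10.csv'] where all_data = combine_csvs('./your_directory').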