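I have a directory containing many CSV files, which I have loaded into a dictionary of dataframes.
Three small sample CSV files are enough to illustrate: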
import os
import csv
import pandas as pd
#create 3 small csv files for test purposes
os.chdir('c:/test')
with open('dat1990.csv','w',newline='') as fp:
    a=csv.writer(fp,delimiter=',')
    data = [['Stock','Sales','Year'],
            ['100','24','1990'],
            ['120','33','1990'],
            ['23','5','1990']]
    a.writerows(data)
with open('dat1991.csv','w',newline='') as fp:
    a=csv.writer(fp,delimiter=',')
    data = [['Stock','Sales','Year'],
            ['400','35','1991'],
            ['450','55','1991'],
            ['34','6','1991']]
    a.writerows(data)
with open('other1991.csv','w',newline='') as fp:
    a=csv.writer(fp,delimiter=',')
    data = [['Stock','Sales','Year'],
            ['500','56','1991'],
            ['600','44','1991'],
            ['56','55','1991']]
    a.writerows(data)
Create a dictionary that maps each dataframe name to the CSV file it will be loaded from:
dfcsv_dict = {'dat1990': 'dat1990.csv', 'dat1991': 'dat1991.csv',
'other1991': 'other1991.csv'}
Create a simple import function that reads a CSV file into pandas:
def myimport(csvfile):
    return pd.read_csv(csvfile)
Loop over the dictionary to import all the CSV files into pandas dataframes:
df_dict = {}
for k, v in dfcsv_dict.items():
    df_dict[k] = myimport(v)
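Given that I may now have thousands of dataframes in a single dictionary object, how can I select some of them and "extract" them from the dictionary?
For example, how would I extract only two of the three dataframes nested in the dictionary, as in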
dat1990 = df_dict['dat1990']
dat1991 = df_dict['dat1991']
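but without writing out literal assignments? Perhaps with some kind of loop over the dictionary, ideally one that selects a subgroup based on a string sequence in the dictionary keys: for example, all dataframes whose names contain dat, or 1991, and so on.
I do not want another "sub-dictionary"; I want to extract them as separately named, standalone dataframes, as in the code above.
I am using Python 3.5.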
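Answer 0 (score: 0)
This is an old question from January 2016, but since nobody has answered it, here is an answer from October 2019. It may be useful for future reference.
I think you can skip the step of creating a dictionary of dataframes. I previously wrote an answer on how to create a single master dataframe from multiple CSV files, and how to add a column to the master dataframe containing a string extracted from the CSV filename. I think you can do essentially the same thing here.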
Create a dataframe of csv files based on timestamp intervals
Steps:
import pandas as pd
import os
# Step 1: create a path to the folder, syntax for Windows OS
path_test_folder = 'C:\\test\\'
# Step 2: create a list of CSV files in the folder
files_in_folder = os.listdir(path_test_folder)
# keep only names that end in .csv (a substring test would also match e.g. 'x.csv.bak')
files_in_folder = [x for x in files_in_folder if x.endswith('.csv')]
# Step 3: create empty master dataframe to store CSV files
df_master = pd.DataFrame()
# Step 4: loop through the files in folder
for each_csv in files_in_folder:
    # temporary dataframe for the CSV
    path_csv = os.path.join(path_test_folder, each_csv)
    temp_df = pd.read_csv(path_csv)
    # add a column holding the source filename
    temp_df['str_filename'] = str(each_csv)
    # combine into master dataframe
    df_master = pd.concat([df_master, temp_df])
# then filter on your filenames
mask_filter = df_master['str_filename'].isin(['dat1990.csv', 'dat1991.csv'])
df_filter = df_master.loc[mask_filter]
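The question also asks about selecting by a string sequence in the name (everything called dat, or everything containing 1991) rather than by an explicit list. With the filename stored in a column, that could be done with the pandas string accessors. The following is only a minimal sketch: it rebuilds the three sample files in memory instead of reading them from C:\test, and the names samples, frames, df_dat and df_1991 are made up for illustration.

```python
import pandas as pd

# Stand-in for the master dataframe: the same three sample files,
# built in memory so the sketch runs without touching the disk.
samples = {
    'dat1990.csv': [[100, 24, 1990], [120, 33, 1990], [23, 5, 1990]],
    'dat1991.csv': [[400, 35, 1991], [450, 55, 1991], [34, 6, 1991]],
    'other1991.csv': [[500, 56, 1991], [600, 44, 1991], [56, 55, 1991]],
}
frames = []
for filename, rows in samples.items():
    temp_df = pd.DataFrame(rows, columns=['Stock', 'Sales', 'Year'])
    temp_df['str_filename'] = filename
    frames.append(temp_df)
df_master = pd.concat(frames, ignore_index=True)

# All rows whose source filename starts with 'dat'
df_dat = df_master.loc[df_master['str_filename'].str.startswith('dat')]

# All rows whose source filename contains '1991'
# (regex=False treats the pattern as a literal string)
df_1991 = df_master.loc[df_master['str_filename'].str.contains('1991', regex=False)]
```

Here df_dat holds the rows from dat1990.csv and dat1991.csv, and df_1991 holds the rows from dat1991.csv and other1991.csv. Each result is an ordinary standalone dataframe, so no sub-dictionary is involved.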