Question

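I am trying to import all the .csv files in a directory. I want to store each file in its own array (named, for example, file_name). Following the suggestion in the thread "import all csv files in directory as pandas dfs and name them as csv filenames", I tried the following code: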

import pandas as pd
import glob
import os

path = "E:\\9sem\\INO\\Dane\\input\\"
all_files = glob.glob(os.path.join(path, "*.csv")) #make list of paths

for file in all_files:
    # Getting the file name without extension
    file_name = os.path.splitext(os.path.basename(file))[0]
    # Reading the file content to create a DataFrame
    dfn = pd.read_csv(file)
    # Setting the file name (without extension) as the index name
    dfn.index.name = file_name

I'm stuck. The data gets imported into a single DataFrame, but I don't know how to turn it into separate numpy arrays.

Thanks for any suggestions.

Best regards, Mark

Answer 1

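Your code always overwrites the DataFrame with the data from the next CSV, right?

So you can either use pandas.concat to build one large DataFrame, or store the data in a dictionary. If you want to store it in a dictionary, you can change your code like this: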

df_dict = {}
for file in all_files:
    # Getting the file name without extension
    file_name = os.path.splitext(os.path.basename(file))[0]
    # Reading the file content to create a DataFrame
    df_dict[file_name] = pd.read_csv(file)
    # Setting the file name (without extension) as the index name
    df_dict[file_name].index.name = file_name

You can then get each DataFrame via df_dict[base_name], where base_name is the name of the DataFrame's source file without the extension.
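
Since the question asks for numpy arrays, and pandas.concat is mentioned above without an example, here is a minimal sketch of both options. The function names load_csvs_as_arrays and load_csvs_as_one_frame, and the index level names "source_file" and "row", are made up for illustration and are not part of the original code. It assumes every file can be read by pd.read_csv with default settings.

```python
import glob
import os

import pandas as pd


def load_csvs_as_arrays(directory):
    """Return {file name without extension: NumPy array} for each .csv file."""
    arrays = {}
    for path in sorted(glob.glob(os.path.join(directory, "*.csv"))):
        name = os.path.splitext(os.path.basename(path))[0]
        # to_numpy() drops the column labels and keeps only the values
        arrays[name] = pd.read_csv(path).to_numpy()
    return arrays


def load_csvs_as_one_frame(directory):
    """Concatenate all .csv files into one DataFrame keyed by source file name.

    Note: pd.concat raises ValueError if the directory contains no .csv files.
    """
    frames = {
        os.path.splitext(os.path.basename(path))[0]: pd.read_csv(path)
        for path in sorted(glob.glob(os.path.join(directory, "*.csv")))
    }
    # The dict keys become the outer level of a MultiIndex
    return pd.concat(frames, names=["source_file", "row"])
```

With the path from the question, load_csvs_as_arrays(path) would give one array per file, looked up by file name without the extension. If the columns have mixed types, to_numpy() will most likely return an array of dtype object, so it may be worth selecting the numeric columns first.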

Import all .csv files in a given directory into separate arrays
