Question

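I'm trying to use a loop to read multiple CSV files (CSV only for now, but later a mix of xls and xlsx as well).

I want each pandas data frame to be named after the corresponding file in my folder, minus the file extension.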

import os 
import pandas as pd


files = filter(os.path.isfile, os.listdir( os.curdir ) )
files #   this shows a list of the files that I want to use/have in my directory- they are all CSVs if that matters

# i want to load these into pandas data frames with the corresponding filenames

 # not sure if this is the right approach....
 # but what is wrong is the variable is named 'weather_today.csv'... i need to drop the .csv or .xlsx or whatever it might be

for each_file in files:
    frame = pd.read_csv( each_file)
    each_file = frame

Bernie's answer seems great, but there is one problem:

for each_file in files:
    frame = pd.read_csv(each_file)
    filename_only = os.path.splitext(each_file)[0]
    # Right below I am assigning my looped data frame the literal variable name of "filename_only" rather than the value that filename_only represents
    # rather than what happens if I print(filename_only)
    filename_only = frame

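For example, if my two files are weather_today.csv and earthquakes.csv (in that order) in my files list, then neither 'earthquakes' nor 'weather_today' gets created.

However, if I just type 'filename_only' and press Enter in Python, I see the earthquakes data frame. If I had 100 files, the last data frame in the loop would end up under the name 'filename_only', and the other 99 would be gone, because each assignment overwrites the previous one and the named variables are never created.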

Answer 1

You can use os.path.splitext() to "split the pathname path into a pair (root, ext) such that root + ext == path, and ext is empty or begins with a period and contains at most one period."

for each_file in files:
    frame = pd.read_csv(each_file)
    filename_only = os.path.splitext(each_file)[0]
    filename_only = frame
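To make the split concrete, here is a minimal sketch of what os.path.splitext() returns. The helper name stem and the file name archive.tar.gz are made up for illustration; the other names come from the question.

```python
import os


def stem(file_name):
    """Return the file name without its final extension (hypothetical helper)."""
    return os.path.splitext(file_name)[0]


print(os.path.splitext("weather_today.csv"))  # ('weather_today', '.csv')
print(stem("earthquakes.csv"))                # earthquakes
print(stem("archive.tar.gz"))                 # archive.tar (only the last extension is dropped)
```

Note that the loop above still rebinds filename_only on every pass, so on its own it keeps only the last frame.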

As mentioned in the comments, we want a way to filter for CSV files only, so you can do this:

files = [file for file in os.listdir( os.curdir ) if file.endswith(".csv")]
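If Excel files join the mix later, as the question anticipates, the same filter can be widened. The following is only a sketch: the function name list_data_files and the exact set of extensions are assumptions, not part of the original answer. It also skips directories and ignores the case of the extension.

```python
import os

# Extensions to accept; this particular set is an assumption for illustration.
WANTED = (".csv", ".xls", ".xlsx")


def list_data_files(directory=os.curdir, extensions=WANTED):
    """Return sorted names of regular files in `directory` with a wanted extension."""
    return sorted(
        name
        for name in os.listdir(directory)
        # os.listdir returns bare names, so join them with the directory before testing.
        if os.path.isfile(os.path.join(directory, name))
        # str.endswith accepts a tuple of suffixes.
        and name.lower().endswith(extensions)
    )


files = list_data_files()
```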

Answer 2

Use a dictionary to store your frames:

frames = {}

for each_file in files:
    frames[os.path.splitext(each_file)[0]] = pd.read_csv(each_file)

Now you can get the DataFrame of your choice with:

frames[filename_without_ext]
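The dictionary approach extends naturally to mixed file types. Below is a rough sketch, assuming a hypothetical helper load_frames that picks pd.read_csv or pd.read_excel based on the extension. Reading Excel files additionally requires an engine such as openpyxl to be installed, and two files that differ only by extension would collide on the same key.

```python
import os

import pandas as pd


def load_frames(files):
    """Read each file into a DataFrame, keyed by file name without extension."""
    frames = {}
    for path in files:
        name, ext = os.path.splitext(os.path.basename(path))
        ext = ext.lower()
        if ext == ".csv":
            frames[name] = pd.read_csv(path)
        elif ext in (".xls", ".xlsx"):
            # Needs an Excel engine (for example openpyxl) to be installed.
            frames[name] = pd.read_excel(path)
        # Any other extension is skipped silently in this sketch.
    return frames
```

With the two files from the question, load_frames(["weather_today.csv", "earthquakes.csv"])["earthquakes"] would give the earthquakes frame.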

Simple, right? But keep an eye on RAM usage: reading a bunch of files can quickly fill up system memory and cause a crash.
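If memory is a concern, one option is to avoid holding every frame at once. This is a sketch of that idea, not something from the original answer: the names iter_frames and row_counts are hypothetical, and it assumes all files are CSVs. A generator yields one frame at a time, and the consumer keeps only a small summary.

```python
import os

import pandas as pd


def iter_frames(files):
    """Yield (name, DataFrame) pairs, reading one file per iteration."""
    for path in files:
        name = os.path.splitext(os.path.basename(path))[0]
        yield name, pd.read_csv(path)


def row_counts(files):
    """Example consumer: keep a row count per file instead of the frames themselves."""
    return {name: len(frame) for name, frame in iter_frames(files)}
```

Each frame becomes eligible for garbage collection once the loop moves on, so peak memory is roughly that of the largest single file rather than all files combined.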

Read all CSV files in the current working directory into pandas with the correct filenames
