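I'm trying to use a loop to read several CSV files (only CSVs for now, but in the future a mix of xls and xlsx).
I want each pandas DataFrame to be named after its file in my folder, without the file extension.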
import os
import pandas as pd
# list() so that `files` displays as a list in Python 3 (filter() returns a lazy iterator)
files = list(filter(os.path.isfile, os.listdir(os.curdir)))
files  # this shows the files in my directory that I want to use; they are all CSVs, if that matters
# I want to load these into pandas DataFrames named after the corresponding files
# Not sure if this is the right approach...
# But what's wrong is that the variable is named 'weather_today.csv'... I need to drop the .csv or .xlsx or whatever it might be
for each_file in files:
    frame = pd.read_csv(each_file)
    each_file = frame
Bernie's answer seems great, but there is one problem:
for each_file in files:
    frame = pd.read_csv(each_file)
    filename_only = os.path.splitext(each_file)[0]
    # Right below, I am assigning my looped data frame to the literal variable name "filename_only",
    # rather than to the name that filename_only holds (what I see if I print(filename_only))
    filename_only = frame
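For example, if my two files weather_today.csv and earthquakes.csv are in my files list (in that order), then neither 'earthquakes' nor 'weather_today' gets created.
However, if I just type 'filename_only' and press Enter in the Python shell, I see the earthquakes DataFrame. If I had 100 files, only the last DataFrame in the loop would exist, under the name 'filename_only'; the other 99 would not, because the per-file names are never assigned and each pass overwrites the previous one, so the 100th overwrites them all.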
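To make that behaviour concrete, here is a minimal sketch that skips the disk and uses two small in-memory DataFrames as hypothetical stand-ins for weather_today.csv and earthquakes.csv (the column names and values are made up). Each pass through the loop rebinds the single name filename_only, so only the last frame survives and no per-file names appear:

```python
import os
import pandas as pd

# Hypothetical stand-ins for the two CSV files, so no disk access is needed
fake_files = {
    "weather_today.csv": pd.DataFrame({"temp": [21, 23]}),
    "earthquakes.csv": pd.DataFrame({"magnitude": [4.1, 5.6]}),
}

for each_file, frame in fake_files.items():
    filename_only = os.path.splitext(each_file)[0]  # the string "weather_today", then "earthquakes"
    filename_only = frame  # rebinds the same name; the string from the line above is discarded

print(filename_only.columns.tolist())  # ['magnitude'] (only the last frame survives)
print("earthquakes" in globals(), "weather_today" in globals())  # False False
```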
Answer 0 (score: 2)
You can use os.path.splitext() to "split the pathname path into a pair (root, ext) such that root + ext == path, and ext is empty or begins with a period and contains at most one period."
for each_file in files:
    frame = pd.read_csv(each_file)
    filename_only = os.path.splitext(each_file)[0]
    filename_only = frame
As mentioned in the comments, we also want a way to keep only the CSV files, so you can do this:
files = [file for file in os.listdir(os.curdir) if file.endswith(".csv")]
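Since the question expects a mix of .xls and .xlsx files later, here is one possible extension of that filter, sketched with a hypothetical helper named data_files. It compares the extension case-insensitively, skips subfolders, and joins the folder onto each name so it also works for folders other than the current one:

```python
import os

# Extensions to keep; .xls/.xlsx are included because the question expects them later
WANTED = (".csv", ".xls", ".xlsx")

def data_files(folder=os.curdir):
    """Return the sorted names of regular files in `folder` whose extension is in WANTED."""
    return sorted(
        name
        for name in os.listdir(folder)
        if os.path.isfile(os.path.join(folder, name))
        and os.path.splitext(name)[1].lower() in WANTED
    )
```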
Answer 1 (score: 1)
Use a dictionary to store your frames:
import os
import pandas as pd
frames = {}
for each_file in files:
    frames[os.path.splitext(each_file)[0]] = pd.read_csv(each_file)
Now you can get the DataFrame you want with:
frames[filename_without_ext]
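Putting the pieces together, here is a hedged sketch of the dictionary approach that also picks a reader by file extension, as the question anticipates. The names load_frames and READERS are illustrative, not part of pandas. pd.read_excel needs an engine such as openpyxl (for .xlsx) or xlrd (for .xls) installed, but it is only called when such files are actually present:

```python
import os
import pandas as pd

# Map each (lower-cased) extension to the pandas function that reads it
READERS = {
    ".csv": pd.read_csv,
    ".xls": pd.read_excel,
    ".xlsx": pd.read_excel,
}

def load_frames(folder=os.curdir):
    """Return {filename without extension: DataFrame} for every supported file in `folder`."""
    frames = {}
    for name in sorted(os.listdir(folder)):
        path = os.path.join(folder, name)
        stem, ext = os.path.splitext(name)
        reader = READERS.get(ext.lower())
        if reader is None or not os.path.isfile(path):
            continue  # skip subfolders and unsupported file types
        if stem in frames:
            # e.g. data.csv and data.xlsx would both map to the key "data"
            raise ValueError(f"two files share the name {stem!r}")
        frames[stem] = reader(path)
    return frames
```

For example, frames = load_frames() followed by frames["earthquakes"] would return the DataFrame read from earthquakes.csv. Raising on a duplicate name is a design choice: with data.csv and data.xlsx in the same folder, silently keeping one would hide the other.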
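Simple, right? But watch your RAM usage: every frame in the dictionary stays in memory, so reading a lot of (or very large) files at once can quickly fill system memory and cause a crash.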
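One way to keep memory in check, sketched under the assumption that you only need some files (or some columns) at a time: index the paths first, which costs almost nothing, and read each file only when you need it. The helper names csv_paths, load_one and frame_bytes are hypothetical; the usecols parameter of read_csv and DataFrame.memory_usage(deep=True) are standard pandas features:

```python
import os
import pandas as pd

def csv_paths(folder=os.curdir):
    """Map filename-without-extension to its path, without reading any data yet."""
    return {
        os.path.splitext(name)[0]: os.path.join(folder, name)
        for name in sorted(os.listdir(folder))
        if name.lower().endswith(".csv") and os.path.isfile(os.path.join(folder, name))
    }

def load_one(paths, stem, columns=None):
    """Read a single file on demand; `columns` (passed as usecols) limits which columns are parsed."""
    return pd.read_csv(paths[stem], usecols=columns)

def frame_bytes(frame):
    """Approximate in-memory size of a DataFrame in bytes, including string contents."""
    return int(frame.memory_usage(deep=True).sum())
```

Checking frame_bytes on one loaded file gives a rough idea of how much memory the whole folder would take if everything were read into the dictionary at once.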