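I have a dataset of 57 .csv files. I want to read them into a single variable (called FOS), so FOS has to be an array. But how can I load these .csv files into an array using Pandas? In addition, some files are missing...
I tried to build a for loop that puts each file at a specific position in the array, e.g. File_1.csv at FOS[0] and File_57 at FOS[57].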
FOS=[]
for i in range(1,57):
    if i != 5: # Because Filename_5 is missing in the dataset...
        FOS[i]=pd.read_csv("Path\Filename{0}.csv".format(i), some more parameters like name)
But now I get the error: "IndexError: list assignment index out of range"
Answer 0: (score: 0)
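You can do something short, such as:
import os
import pandas as pd
# range(1, 58) covers Filename1.csv through Filename57.csv
FOS = [pd.read_csv(f"Path/Filename{i}.csv")
       for i in range(1, 58)
       if os.path.exists(f"Path/Filename{i}.csv")
]
Explanation:
This uses a list comprehension, meaning that the expression [....] constructs the list. It is equivalent to writing:
FOS = list()
for i in range(1, 58):
    # only read the files that actually exist on disk
    if os.path.exists(f"Path/Filename{i}.csv"):
        FOS.append(pd.read_csv(f"Path/Filename{i}.csv"))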
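Building the list with append is also what avoids the IndexError from the question: an empty list has no positions to assign to, so assigning to an index fails, whereas append grows the list by one element. A minimal sketch, with placeholder strings standing in for the DataFrames (the values are illustrative assumptions, not real data):

```python
# Placeholder strings stand in for the DataFrames returned by pd.read_csv.
FOS = []

try:
    FOS[1] = "contents of Filename1.csv"  # position 1 does not exist in an empty list
except IndexError as exc:
    error_message = str(exc)

# append() grows the list by one element each time, so no index is needed.
for i in range(1, 4):
    FOS.append(f"contents of Filename{i}.csv")

print(error_message)
print(len(FOS))
```

Note that with append the list positions are simply consecutive, so once a file is skipped the position no longer matches the file number.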
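The check if os.path.exists(f"Path/Filename{i}.csv") is more dynamic than hard-coding the exclusion of file 5. That is more convenient if you do this often and the set of input files varies. In that case, though, you should perhaps read the list of files instead (e.g., using os.listdir).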
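The os.listdir idea could look roughly like the sketch below. It stores each DataFrame in a dict keyed by the number in the file name, so a file can still be looked up by its number even when others are missing. The function name load_numbered_csvs and the "Filename" prefix default are assumptions made for illustration.

```python
import os
import re

import pandas as pd


def load_numbered_csvs(directory, prefix="Filename"):
    """Read every <prefix><number>.csv in directory into a dict keyed by number.

    The function name and the prefix default are assumptions for illustration.
    """
    pattern = re.compile(rf"^{re.escape(prefix)}(\d+)\.csv$")
    frames = {}
    for name in sorted(os.listdir(directory)):
        match = pattern.match(name)
        if match:  # skip anything that is not a numbered CSV file
            frames[int(match.group(1))] = pd.read_csv(os.path.join(directory, name))
    return frames
```

After frames = load_numbered_csvs("Path"), frames[7] would hold the contents of Filename7.csv, and 5 in frames is False when Filename5.csv is missing.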
Answer 1: (score: 0)
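You can make this more dynamic. First, move all the files that need to be read into one directory. If that directory has subdirectories, use the os module to walk through them and collect all the file paths.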
import os
import pandas as pd

def _fetch_file_locations(root_path: str, extension: str) -> list:
    """
    This function finds all files of a particular extension. It traverses
    through sub directories and finds all files
    :param root_path: the path from where it needs to start looking for files
    :param extension: the extension of the file that it's looking for
    :return: the list of file paths of all the files that were found
    """
    if not os.path.isdir(root_path):
        raise NotADirectoryError(f'There is no directory at path: {root_path}')
    # endswith() avoids matching names that merely contain the extension, e.g. "data.csv.bak"
    file_collection = [os.path.join(root, file) for root, dirs, files in os.walk(root_path)
                       for file in files if file.endswith(extension)]
    return file_collection

def main(root_path: str):
    all_files = _fetch_file_locations(root_path, extension='.csv')
    # uses pandas to read all the CSV files and convert each dataframe to a list of dicts
    file_contents = [pd.read_csv(file_path).to_dict('records') for file_path in all_files]
    # flattens the list of lists into a single list of dicts
    all_contents_in_one = [record for content in file_contents for record in content]
    print(f"Found {len(all_contents_in_one)} records after merging {len(all_files)} files")

if __name__ == '__main__':
    main(r'X:\worky')
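The file discovery step can also be sketched with pathlib from the standard library instead of os.walk. This is only an alternative under stated assumptions: find_files is a hypothetical helper name, and it compares the final extension exactly via the path's suffix.

```python
from pathlib import Path


def find_files(root_path, extension=".csv"):
    """Return sorted paths of all files under root_path with the given extension.

    find_files is a hypothetical helper, not part of the answer above.
    """
    root = Path(root_path)
    if not root.is_dir():
        raise NotADirectoryError(f"There is no directory at path: {root_path}")
    # rglob("*") walks root and all of its sub directories recursively
    return sorted(str(p) for p in root.rglob("*") if p.is_file() and p.suffix == extension)
```

The returned list of path strings can be passed to pd.read_csv one by one, in the same way as the result of _fetch_file_locations.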