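So I'm building a dataset from a growing collection of CSV files. Rather than adding yet another `df# = pd.read_csv(filename, index...)` line every time, I'd prefer to write a single function that reads a list of CSVs and appends them as they are imported. Any suggestions? My current code is below.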
```python
import glob

import pandas as pd

files = glob.glob('*.csv')
files

alg1_2018_2019 = pd.read_csv('alg1_2018_2019.csv', index_col=False)
alg1_2017_2018 = pd.read_csv('alg1_2017_2018.csv', index_col=False)
geometry_2018_2019 = pd.read_csv('geometry_2018_2019.csv', index_col=False)
geom_8_2017_2018 = pd.read_csv('geom_8_2017_2018.csv', index_col=False)
alg2_2016_2017 = pd.read_csv('alg2_2016_2017.csv', index_col=False)
alg1_2016_2017 = pd.read_csv('alg1_2016_2017.csv', index_col=False)
geom_2016_2017 = pd.read_csv('geom_2016_2017.csv', index_col=False)
geom_2015_2016 = pd.read_csv('geom_2015_2016.csv', index_col=False)
alg2_2015_2016 = pd.read_csv('alg2_2015_2016.csv', index_col=False)
alg1_part2_2015_2016 = pd.read_csv('alg1_part2_2015_2016.csv', index_col=False)
```
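A minimal sketch of that idea, assuming every CSV sits in one folder: read each file matched by a glob pattern into a dict keyed by file name, so `dfs['alg1_2018_2019']` stands in for the separate variables above and new files are picked up without new lines of code. The helper names `read_csv_folder` and `combine` are hypothetical, and `combine` works best when the files share the same columns (pandas fills any missing ones with NaN).

```python
from pathlib import Path

import pandas as pd


def read_csv_folder(folder=".", pattern="*.csv"):
    """Read every CSV in `folder` matching `pattern` into a dict of DataFrames.

    Hypothetical helper. Keys are file names without the ".csv" extension,
    e.g. "alg1_2018_2019", mirroring the variable names used above.
    """
    return {
        path.stem: pd.read_csv(path, index_col=False)
        for path in sorted(Path(folder).glob(pattern))
    }


def combine(dfs):
    """Stack the DataFrames into one; the dict keys become the outer index level."""
    if not dfs:
        # pd.concat raises a less helpful ValueError when given nothing to join
        raise ValueError("no CSV files matched")
    return pd.concat(dfs, names=["source", "row"])
```

Calling `dfs = read_csv_folder()` from the working directory and then `combined = combine(dfs)` would replace all of the individual `pd.read_csv` lines; `combined.loc['alg1_2018_2019']` then selects the rows that came from one file.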
Answer

I use the following function:
```python
import pandas as pd
from pathlib import Path


def glob_filemask(filemask):
    """
    Glob files using a file mask that includes the full path.

    Usage:
        for file in glob_filemask("/path/to/file_*.txt"):
            # process file here
    or:
        files = list(glob_filemask("/path/to/file_*.txt"))

    :param filemask: wildcards can be used only in the last part
                     (file name or extension), but NOT in the directory part
    :return: an iterable of Path objects for all matching files

    Example:
        glob_filemask("/root/subdir/data_*.csv") -
            will return a Pathlib glob generator for all matching files
        glob_filemask("/root/subdir/single_file.csv") -
            will return a one-element list containing that file
    """
    p = Path(filemask)
    try:
        if p.is_file():
            return [p]
    except OSError:
        # is_file() can raise for paths it cannot stat (e.g. wildcard
        # characters on Windows with older Python versions); fall through.
        pass
    # A mask such as "file_*.csv" is not an existing file, so glob for it
    # in the parent directory.
    return p.parent.glob(p.name)
```
Usage:

```python
df = pd.concat([pd.read_csv(f) for f in glob_filemask("/path/to/file_*.csv")],
               ignore_index=True)
```
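Because the file names in the question follow a `<course>_<start year>_<end year>` pattern, one possible extension, assuming every file follows it, is to record the course and school year as columns while concatenating, so the combined DataFrame still shows where each row came from. `parse_name` and `load_all` are hypothetical helpers, not part of pandas.

```python
from pathlib import Path

import pandas as pd


def parse_name(path):
    """Split e.g. "alg1_part2_2015_2016.csv" into ("alg1_part2", "2015_2016").

    Assumes every file name ends in _<start year>_<end year>.
    """
    # rsplit with maxsplit=2 splits only at the last two underscores
    course, start, end = Path(path).stem.rsplit("_", 2)
    return course, f"{start}_{end}"


def load_all(paths):
    """Concatenate the CSVs, adding `course` and `school_year` columns."""
    frames = []
    for path in sorted(paths):
        course, year = parse_name(path)
        df = pd.read_csv(path, index_col=False)
        frames.append(df.assign(course=course, school_year=year))
    if not frames:
        raise ValueError("no CSV files matched")
    return pd.concat(frames, ignore_index=True)
```

For example, `load_all(glob.glob('*.csv'))` reuses the file list from the question. Since only the last two underscores are split on, names such as `alg1_part2_2015_2016` and `geom_8_2017_2018` keep their full course prefix; a file name with fewer than two underscores raises `ValueError`.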