Question

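I am loading thousands of files, which are all supposed to have the same structure, with pd.concat, using a generator over the list of files in a given directory.

Can I print f inside this generator for debugging? I would like to know which file is causing the failure. Thanks in advance!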

all_files = glob.glob(input_dir + "/*.csv")
df        = pd.concat((pd.read_csv(f) for f in all_files))

Answer 1

You can use try..except to handle the loading of each file properly and print any errors that occur. Here is an example:

import glob

import pandas as pd

all_files = glob.glob(input_dir + "/*.csv")

def load_file(f):
    """Load a CSV file into a dataframe."""
    try:
        # Load the file if there is no problem
        return pd.read_csv(f)
    except Exception as e:
        # If there is a problem, print an error message
        # with the name of the file
        print("Loading file {} failed with error: {}".format(f, e))
        # Return an empty dataframe so that pd.concat won't fail
        return pd.DataFrame()

df = pd.concat((load_file(f) for f in all_files))
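
With thousands of files, printed messages are easy to miss. As a possible variation on the same idea, the sketch below collects the failures in a list so they can be inspected afterwards. The names load_files and failures are made up for this illustration and are not part of pandas.

```python
import pandas as pd


def load_files(paths):
    """Return the combined dataframe and a list of (path, error) pairs.

    Hypothetical helper: files that fail to load are skipped and recorded
    instead of being replaced by an empty dataframe.
    """
    frames, failures = [], []
    for path in paths:
        try:
            frames.append(pd.read_csv(path))
        except Exception as exc:
            # Remember which file failed and why, then keep going
            failures.append((path, exc))
    # pd.concat raises ValueError on an empty list, so guard against that
    combined = pd.concat(frames, ignore_index=True) if frames else pd.DataFrame()
    return combined, failures
```

It would be called as df, failures = load_files(all_files), after which looping over failures shows every problematic file name along with its error.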

Answer 2

I would just use a regular loop for the sake of clarity, but if you insist, you can get away with a dirty hack like this:

df      = pd.concat((pd.read_csv(f) for f in all_files if print(f) is None))

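You did not specify what goes wrong, but if something raises an exception, the exception itself may already contain the file name, which would be even better than a regular loop.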
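
For comparison, the regular loop mentioned above could look roughly like the following. This is only a minimal sketch: load_all is a hypothetical name, and the raise ... from syntax assumes Python 3.

```python
import pandas as pd


def load_all(paths):
    """Read each CSV in turn, printing its name before it is read."""
    frames = []
    for path in paths:
        # The last name printed is the file being read when an error occurs
        print(path)
        try:
            frames.append(pd.read_csv(path))
        except Exception as exc:
            # Re-raise with the file name attached; the original error
            # is kept as the cause in the traceback
            raise RuntimeError("Failed to load {}".format(path)) from exc
    return pd.concat(frames, ignore_index=True)
```

Calling df = load_all(all_files) stops at the first bad file, and both the printed output and the exception message name it.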
