Question

I want to fill an empty DataFrame by appending several files of the same type and structure. However, I can't see what is wrong here:

def files2df(colnames, ext):
    df = DataFrame(columns = colnames)
    for inf in sorted(glob.glob(ext)):
        dfin = read_csv(inf, sep='\t', skiprows=1)
        print(dfin.head(), '\n')
        df.append(dfin, ignore_index=True)
    return df

The resulting DataFrame is empty. Can someone give me a hand?

    1.0  16.59  0.597  0.87  1.0.1   3282 100.08
 0  0.953  14.52  0.561  0.80   0.99   4355      -
 1  1.000  31.59  1.000  0.94   1.00   6322      -
 2  1.000   6.09  0.237  0.71   1.00  10568      -
 3  1.000  31.29  1.000  0.94   1.00  14363      -
 4  1.000  31.59  1.000  0.94   1.00  19797      - 

      1.0   6.69  0.199  0.74  1.0.1   186 13.16
 0      1   0.88  0.020  0.13   0.99   394     -
 1      1   0.75  0.017  0.11   0.99  1052     -
 2      1   3.34  0.097  0.57   1.00  1178     -
 3      1   1.50  0.035  0.26   1.00  1211     -
 4      1  20.59  0.940  0.88   1.00  1583     - 

      1.0  0.12  0.0030  0.04  0.97   2285 2.62
 0     1  1.25   0.135  0.18  0.99   2480    -
 1     1  0.03   0.001  0.04  0.97   7440    -
 2     1  0.12   0.003  0.04  0.97   8199    -
 3     1  1.10   0.092  0.16  0.99  11174    -
 4     1  0.27   0.007  0.06  0.98  11310    - 

   0.244  0.07  0.0030  0.02  0.76  41314 1.32
 0  0.181  0.64   0.028  0.03  0.36  41755    -
 1  0.161  0.18   0.008  0.01  0.45  42420    -
 2  0.161  0.18   0.008  0.01  0.45  42461    -
 3  0.237  0.25   0.011  0.02  0.56  43060    -
 4  0.267  1.03   0.047  0.07  0.46  43321    - 

 0.163  0.12  0.0060  0.01   0.5  103384 1.27
 0  0.243  0.27   0.014  0.02  0.56  104693    -
 1  0.215  0.66   0.029  0.04  0.41  105192    -
 2  0.190  0.10   0.005  0.01  0.59  105758    -
 3  0.161  0.12   0.006  0.01  0.50  109783    -
 4  0.144  0.16   0.007  0.01  0.42  110067    - 

Empty DataFrame
Columns: array([D, LOD, r2, CIlow, CIhi, Dist, T-int], dtype=object)
Index: array([], dtype=object)

Answer 1

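df.append(dfin, ignore_index=True) returns a new DataFrame; it does not modify df in place. Use df = df.append(dfin, ignore_index=True) instead. (Note that DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0, so on current versions pandas.concat() is the way to do this.) Even with that change, I don't think this will give you what you need. append does add rows, but it aligns them on column labels, and because the files are read without names=, each dfin takes its labels from its first data row (1.0, 16.59, ...). None of those match colnames, so the result gains extra, mostly empty columns instead of stacking the data under the columns you asked for.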
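As a rough illustration of both points (the call returns a new object, and rows are aligned on column labels), here is a minimal sketch. It uses pd.concat rather than append so that it also runs on pandas 2.x, and the three-column frames are made-up stand-ins for the real data, not the asker's files. Recent pandas versions may emit a FutureWarning about concatenating an empty frame; that does not affect the result shown here.

```python
import pandas as pd

colnames = ["D", "LOD", "r2"]  # shortened, hypothetical column list
empty = pd.DataFrame(columns=colnames)

# A frame whose labels were inferred from its first data row, as happens
# with read_csv(..., skiprows=1) when no names= argument is given.
mislabeled = pd.DataFrame(
    {"1.0": [0.953, 1.0], "16.59": [14.52, 31.59], "0.597": [0.561, 1.0]}
)

# concat returns a new frame; `empty` itself is left untouched.
combined = pd.concat([empty, mislabeled], ignore_index=True)

print(len(empty))              # the original frame still has no rows
print(list(combined.columns))  # union of both label sets
print(combined.shape)          # two rows, six columns
```

The data ends up under the inferred labels, while the D, LOD and r2 columns stay empty, which is why passing names=colnames to read_csv matters.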

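For this case (reading multiple files and building a single DataFrame from all the data), I would use pandas.concat(). The code below gives you a frame whose columns are named by colnames and whose rows come from the data in the csv files.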

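import glob
from pandas import concat, read_csv

def files2df(colnames, ext):
    files = sorted(glob.glob(ext))
    # names=colnames labels the columns, so no data row is consumed as a header
    frames = [read_csv(inf, sep='\t', skiprows=1, names=colnames) for inf in files]
    return concat(frames, ignore_index=True)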

I haven't tested this code, I just wrote it here, so you may need to adjust it to get it running, but the idea is clear (I hope).
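For anyone who wants to try the idea without the original data, here is a self-contained sketch. The file names, the "skipped line" first row, and the three-column layout are all invented for the example; only the read_csv and concat calls mirror the answer.

```python
import glob
import os
import tempfile

import pandas as pd


def files2df(colnames, pattern):
    """Read every tab-separated file matching `pattern` into one DataFrame."""
    files = sorted(glob.glob(pattern))
    frames = [pd.read_csv(f, sep="\t", skiprows=1, names=colnames) for f in files]
    return pd.concat(frames, ignore_index=True)


colnames = ["D", "LOD", "r2"]  # shortened, hypothetical column list
samples = [
    ("a.txt", ["1.0\t16.59\t0.597", "0.953\t14.52\t0.561"]),
    ("b.txt", ["1.0\t6.69\t0.199", "1.0\t0.88\t0.020"]),
]

with tempfile.TemporaryDirectory() as tmp:
    # Each made-up file has one line to skip, then two data rows.
    for name, rows in samples:
        with open(os.path.join(tmp, name), "w") as fh:
            fh.write("skipped line\n" + "\n".join(rows) + "\n")
    df = files2df(colnames, os.path.join(tmp, "*.txt"))

print(df)
```

Because names=colnames is passed, every data row after the skipped line is kept, and all four rows land under the requested column names with a fresh 0..3 index.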

Answer 2

I also found another solution, but I don't know which one is faster.

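import glob
from pandas import concat, read_csv

def files2df(colnames, ext):
    dflist = []
    for inf in sorted(glob.glob(ext)):
        # names=colnames labels the columns of each file as it is read
        dflist.append(read_csv(inf, names=colnames, sep='\t', skiprows=1))
    # Stack all frames row-wise and renumber the index from 0
    df = concat(dflist, axis=0, ignore_index=True)
    return df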
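Both answers collect the frames in a list and call concat once, so they should do essentially the same work; the loop and the list comprehension differ mainly in style. One case neither version handles is a pattern that matches no files, where concat raises a ValueError on the empty list. A possible guard, sketched under the assumption that an empty frame with the requested columns is the desired result (files2df_safe is a hypothetical name, not part of either answer):

```python
import glob

import pandas as pd


def files2df_safe(colnames, pattern):
    """Like files2df, but returns an empty, labeled frame when nothing matches."""
    frames = [
        pd.read_csv(path, sep="\t", skiprows=1, names=colnames)
        for path in sorted(glob.glob(pattern))
    ]
    if not frames:
        # pd.concat raises ValueError on an empty list, so handle that case here.
        return pd.DataFrame(columns=colnames)
    return pd.concat(frames, ignore_index=True)


# The directory below is assumed not to exist, so no file matches.
result = files2df_safe(["D", "LOD", "r2"], "no_such_dir_for_this_example/*.txt")
print(result.empty, list(result.columns))
```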

Filling an empty pandas.DataFrame from several files
