Question

I am reindexing files across multiple folders. The files initially look like this:

Combined   Percent
0101       50
0102       25
0104       25

I then use this code to create a new index that is the union of the indexes of all the files in a folder:

import os
from glob import glob

import pandas as pd

folders = r'C:\pathway_to_folders'

def rfile(fn):
    # Request string dtype for all columns and use the first column as the index.
    return pd.read_csv(fn, dtype='str', index_col=0)

for folder in os.listdir(folders):
    path = os.path.join(folders, folder)
    filenames = glob(os.path.join(path, '*.csv'))
    dfs = [rfile(fn) for fn in filenames]
    # Build the union of the indexes of all files in this folder.
    idx = dfs[0].index
    for i in range(1, len(dfs)):
        idx = idx.union(dfs[i].index)
    print(idx)

When I set the Combined column as the index column, the dfs now look like this (the leading zeros are gone):

Combined   Percent
101        50
102        25
104        25

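Is there a way to keep the index formatted the same as the original column, or to change my code so that I don't have to set the index at all?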

Answer 1

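I think this is still a long-standing bug: you can't set the dtype and also specify the same column as the index column in one read_csv call. You have to set the index as a secondary step: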

def rfile(fn):
    return pd.read_csv(fn, dtype=str).set_index('Combined')
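
To check that this keeps the leading zeros, here is a minimal sketch that applies the same rfile to two small in-memory CSVs and then builds the union index as in the question. The csv_a and csv_b strings are hypothetical stand-ins for the real files, and io.StringIO is used only so the example runs without touching disk. Newer pandas versions may already apply dtype to the index column; the two-step form should work either way.

```python
import io

import pandas as pd

# Hypothetical sample data standing in for two CSV files in one folder.
csv_a = "Combined,Percent\n0101,50\n0102,25\n0104,25\n"
csv_b = "Combined,Percent\n0101,40\n0103,60\n"

def rfile(source):
    # Read every column as str first, then promote 'Combined' to the index,
    # so the index values keep their leading zeros.
    return pd.read_csv(source, dtype=str).set_index('Combined')

dfs = [rfile(io.StringIO(text)) for text in (csv_a, csv_b)]

# Union of the indexes of all frames, as in the question's loop.
idx = dfs[0].index
for df in dfs[1:]:
    idx = idx.union(df.index)

print(list(idx))  # ['0101', '0102', '0103', '0104']
```

Because the index values are strings, the union is sorted lexically, which matches numeric order here only because every code has the same width.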
