Question

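I have a dataframe with a large MultiIndex, built from a large number of csv files. Some of these files contain errors in various labels, e.g. "window" is misspelled as "winZZw", which causes problems when I select all windows with df.xs('window', level='middle', axis=1).

So I need a way to simply replace winZZw with window.

Here is a very minimal example df (let's assume the data and the 'roof', 'window'… strings come from some complicated text reader):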

import numpy as np
import pandas as pd

# from_product expects one iterable per level, so each label is wrapped in a list
header = pd.MultiIndex.from_product([['roof'], ['window'], ['basement']], names=['top', 'middle', 'bottom'])
dates = pd.date_range('01/01/2000', '01/12/2010', freq='MS')
data = np.random.randn(len(dates))
df = pd.DataFrame(data, index=dates, columns=header)
header2 = pd.MultiIndex.from_product([['roof'], ['winZZw'], ['basement']], names=['top', 'middle', 'bottom'])
data = 3*(np.random.randn(len(dates)))
df2 = pd.DataFrame(data, index=dates, columns=header2)
df = pd.concat([df, df2], axis=1)
header3 = pd.MultiIndex.from_product([['roof'], ['door'], ['basement']], names=['top', 'middle', 'bottom'])
data = 2*(np.random.randn(len(dates)))
df3 = pd.DataFrame(data, index=dates, columns=header3)
df = pd.concat([df, df3], axis=1)

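Now I want to use xs to build a new dataframe of all houses that have window in the middle level: windf = df.xs('window', level='middle', axis=1)

But this obviously misses the misspelled winZZw.

So, how do I replace winZZw with window?

The only way I have found is set_levels, but if I understand it correctly, I need to feed it the whole level, i.e.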

df.columns.set_levels([u'window',u'window', u'door'], level='middle',inplace=True)

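But this has two issues:

I need to pass it the whole index, which is easy in this example but impossible/stupid for a thousand-column df with hundreds of labels.
It seems to require the list backwards (the first entry in my df now has door in the middle level instead of the window it had). This can probably be worked around, but it seems weird.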
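
A likely explanation for the "backwards" impression is that set_levels operates on the unique labels stored for a level, not on one label per column, and those stored labels are not guaranteed to follow column order. The sketch below builds a small stand-in frame (the three column tuples are copied from the example, the 4-row data is an assumption for illustration) and prints both views side by side:

```python
import numpy as np
import pandas as pd

# Stand-in for the question's frame: three columns, one of them misspelled.
cols = pd.MultiIndex.from_tuples(
    [('roof', 'window', 'basement'),
     ('roof', 'winZZw', 'basement'),
     ('roof', 'door', 'basement')],
    names=['top', 'middle', 'bottom'],
)
df = pd.DataFrame(np.random.randn(4, 3), columns=cols)

# One label per column, in column order.
per_column = list(df.columns.get_level_values('middle'))

# Unique labels stored for the level; this is what set_levels replaces,
# and its order need not match the column order.
stored = list(df.columns.levels[1])

print(per_column)
print(stored)
```

Comparing the two printed lists shows which position in the stored level each column label maps to.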

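I can work around these issues by using xs to create a new df of only the winZZw columns, setting its levels with set_levels(df.shape[1]*[u'window'], level='middle'), and then concatenating everything back together, but I was hoping for something simpler, an analogue of str.replace('winZZw', 'window'), and I can't figure out how to do it.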
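
One way to get a literal str.replace on a single level, sketched here under the assumption that rebuilding the whole column index is acceptable, is to round-trip the MultiIndex through a DataFrame with to_frame and MultiIndex.from_frame. The small frame is a stand-in for the example above, not the real data:

```python
import numpy as np
import pandas as pd

# Stand-in for the question's frame.
cols = pd.MultiIndex.from_tuples(
    [('roof', 'window', 'basement'),
     ('roof', 'winZZw', 'basement'),
     ('roof', 'door', 'basement')],
    names=['top', 'middle', 'bottom'],
)
df = pd.DataFrame(np.random.randn(4, 3), columns=cols)

# Turn the column index into an ordinary DataFrame: one row per column,
# one column per level ('top', 'middle', 'bottom').
labels = df.columns.to_frame(index=False)

# Plain string replacement on the 'middle' level only.
labels['middle'] = labels['middle'].str.replace('winZZw', 'window', regex=False)

# Rebuild the MultiIndex; level names are taken from the frame's column names.
df.columns = pd.MultiIndex.from_frame(labels)

middle_after = list(df.columns.get_level_values('middle'))
print(middle_after)
```

Because the replacement runs on a regular string column, any of the .str methods (including regex replacement) can be applied the same way.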

Answer 1

Use rename and specify the level:

import numpy as np
import pandas as pd

header = pd.MultiIndex.from_product([['roof'],[ 'window'], ['basement']], names = ['top', 'middle', 'bottom'])
dates = pd.date_range('01/01/2000','01/12/2010', freq='MS')
data = np.random.randn(len(dates))
df = pd.DataFrame(data, index=dates, columns=header)
header2 = pd.MultiIndex.from_product([['roof'], ['winZZw'], ['basement']], names = ['top', 'middle', 'bottom'])
data = 3*(np.random.randn(len(dates)))
df2 = pd.DataFrame(data, index=dates, columns=header2)
df = pd.concat([df, df2], axis=1)
header3 = pd.MultiIndex.from_product([['roof'], ['door'], ['basement']], names = ['top', 'middle', 'bottom'])
data = 2*(np.random.randn(len(dates)))
df3 = pd.DataFrame(data, index=dates, columns=header3)
df = pd.concat([df, df3], axis=1)

df = df.rename(columns={'winZZw':'window'}, level='middle')
print(df.head())

top             roof                    
middle        window                door
bottom      basement  basement  basement
2000-01-01 -0.131052 -1.189049  1.310137
2000-02-01 -0.200646  1.893930  2.124765
2000-03-01 -1.690123 -2.128965  1.639439
2000-04-01 -0.794418  0.605021 -2.810978
2000-05-01  1.528002 -0.286614  0.736445
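
As a follow-up sketch, the same rename call can take a dictionary with several corrections at once, and xs then picks up both window columns. The second typo ('dooor') and the 4-row data are hypothetical, added only to illustrate a multi-entry mapping:

```python
import numpy as np
import pandas as pd

# Stand-in frame; 'dooor' is a hypothetical second typo, not from the question.
cols = pd.MultiIndex.from_tuples(
    [('roof', 'window', 'basement'),
     ('roof', 'winZZw', 'basement'),
     ('roof', 'dooor', 'basement')],
    names=['top', 'middle', 'bottom'],
)
df = pd.DataFrame(np.random.randn(4, 3), columns=cols)

# Hypothetical typo table: misspelled label -> corrected label.
fixes = {'winZZw': 'window', 'dooor': 'door'}

# Labels not listed in the mapping are left unchanged.
df = df.rename(columns=fixes, level='middle')

# The selection from the question now returns both window columns.
windf = df.xs('window', level='middle', axis=1)

middle_after = list(df.columns.get_level_values('middle'))
print(middle_after)
print(windf.shape)
```

Keeping the corrections in one dictionary means new typos found in later csv files only require another entry, not another pass over the index.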
