Question

I have the following df:

                          test1     test2     test3
water(h20)                  ok         x         x
carbon dioxide (co2)         x         x         x
Silicon                     ok        ok        ok

Can I clean the df's index by removing the parentheses and everything inside them?

Desired output:

                    test1     test2     test3
water                 ok         x         x
carbon dioxide         x         x         x
Silicon               ok        ok        ok

I tried this code:

new_df=df.index.map(lambda x:str(x)[:-5])

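It works for the first two rows, but it does not distinguish index names that have no parentheses (Silicon), and that is the main problem I am facing.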

Answer 1

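You can use str.replace with a regular expression: \s* matches any whitespace before the parenthesis (* means zero or more), \((.*)\) matches the parentheses and their contents, and the whole match is replaced with an empty string. On pandas 2.0 and later, pass regex=True, because str.replace treats the pattern as a literal string by default: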

print (df.index.str.replace(r'\s*\((.*)\)', '', regex=True))
Index(['water', 'carbon dioxide', 'Silicon'], dtype='object')

df.index = df.index.str.replace(r'\s*\((.*)\)', '', regex=True)
print (df)
               test1 test2 test3
water             ok     x     x
carbon dioxide     x     x     x
Silicon           ok    ok    ok
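
For anyone who wants to reproduce this, here is a minimal self-contained sketch. The DataFrame is rebuilt by hand from the question, and the capture group is dropped from the pattern since nothing refers to it; regex=True is passed explicitly on the assumption that a recent pandas version is in use.

```python
import pandas as pd

# Sample frame rebuilt from the question (values typed in by hand).
df = pd.DataFrame(
    {
        "test1": ["ok", "x", "ok"],
        "test2": ["x", "x", "ok"],
        "test3": ["x", "x", "ok"],
    },
    index=["water(h20)", "carbon dioxide (co2)", "Silicon"],
)

# Remove optional whitespace, an opening parenthesis, anything after it,
# and the closing parenthesis. Labels without parentheses are left alone.
df.index = df.index.str.replace(r"\s*\(.*\)", "", regex=True)

print(df.index.tolist())
# ['water', 'carbon dioxide', 'Silicon']
```

Because the pattern only matches when a parenthesis is present, Silicon passes through unchanged, which is what the fixed-width slice in the question could not do.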

Also, if you need to remove everything from the first ( to the end of the label, just drop the final \) from the pattern:

print (df)
                     test1 test2 test3
water(h20) ee           ok     x     x
carbon dioxide (co2)     x     x     x
Silicon                 ok    ok    ok

df.index = df.index.str.replace(r'\s*\((.*)', '', regex=True)
print (df)
               test1 test2 test3
water             ok     x     x
carbon dioxide     x     x     x
Silicon           ok    ok    ok
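
The difference between the two patterns only shows up when text follows the closing parenthesis. A small sketch comparing them on the labels above (the variable names closed and open_ended are just for illustration):

```python
import pandas as pd

idx = pd.Index(["water(h20) ee", "carbon dioxide (co2)", "Silicon"])

# Pattern with the closing parenthesis: only the "(...)" part is removed,
# so the trailing " ee" survives.
closed = idx.str.replace(r"\s*\(.*\)", "", regex=True)

# Pattern without the closing parenthesis: everything from the first "("
# to the end of the label is removed.
open_ended = idx.str.replace(r"\s*\(.*", "", regex=True)

print(closed.tolist())
# ['water ee', 'carbon dioxide', 'Silicon']
print(open_ended.tolist())
# ['water', 'carbon dioxide', 'Silicon']
```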

Answer 2

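Another way is to split on ( and keep the first part. The split leaves any space that preceded the parenthesis (as in "carbon dioxide "), so strip it afterwards:

In [961]: df.index = df.index.str.split('(').str[0].str.strip()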

In [962]: df
Out[962]:
                test1 test2 test3
water              ok     x     x
carbon dioxide      x     x     x
Silicon            ok    ok    ok
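
One thing to watch with the split approach is trailing whitespace, which is invisible in the printed frame. A short sketch showing the labels before and after stripping (raw and clean are illustrative names):

```python
import pandas as pd

idx = pd.Index(["water(h20)", "carbon dioxide (co2)", "Silicon"])

# Keep only the text before the first "(".
raw = idx.str.split("(").str[0]

# The space that preceded "(co2)" is still attached, so strip it.
clean = raw.str.strip()

print(raw.tolist())
# ['water', 'carbon dioxide ', 'Silicon']
print(clean.tolist())
# ['water', 'carbon dioxide', 'Silicon']
```

Without the strip, a later lookup such as df.loc['carbon dioxide'] would raise a KeyError, because the actual label would end in a space.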

Clean the index names in my dataframe
