Question

I am trying to remove non-consecutive duplicate words and numbers from column names.

For example, I currently have df['Weeks with 60 hours more than 60'] and I would like to get df['Weeks with 60 hours more than'].
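To make the target concrete, here is a minimal pure-Python sketch of the transformation on a single label. It assumes tokens are separated by whitespace and compared case-sensitively, and relies on Python 3.7+, where a plain `dict` preserves insertion order (so it can stand in for `OrderedDict`). `dedupe_words` is a hypothetical helper name, not part of pandas.

```python
def dedupe_words(name: str) -> str:
    """Hypothetical helper: keep only the first occurrence of each token."""
    # dict.fromkeys keeps insertion order (Python 3.7+), so the first
    # occurrence of each whitespace-separated token wins and later repeats,
    # consecutive or not, are dropped.
    return " ".join(dict.fromkeys(name.split()))


print(dedupe_words("Weeks with 60 hours more than 60"))
# → Weeks with 60 hours more than
```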

I tried

df.columns = df.columns.str.split().apply(lambda x:OrderedDict.fromkeys(x).keys()).str.join(' ')

based on the answer to "Python Dataframe: Remove duplicate words in the same cell within a column in Python".

but I get the following error:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-85-1078b4f07191> in <module>()
     31     df_t.columns = df_t.columns.str.replace(r"."," ")
     32     df_t.columns = df_t.columns.str.strip()
---> 33     df_t.columns = df_t.columns.str.split().apply(lambda x:OrderedDict.fromkeys(x).keys()).str.join(' ')
     34 
     35 #     df_t.columns = df_t.columns.str.replace(r"\(.*\)","")

AttributeError: 'Index' object has no attribute 'apply'

Any suggestions?

Answer 1

Use a list comprehension or map:

import pandas as pd
from collections import OrderedDict

df = pd.DataFrame(columns=['What is is name name name'])

# Split each column name into words, keep the first occurrence of each word, re-join
df.columns = [' '.join(OrderedDict.fromkeys(x).keys()) for x in df.columns.str.split()]
print(df)
Empty DataFrame
Columns: [What is name]
Index: []
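The error in the question comes from `df.columns` being a pandas `Index`: `Index.str.split()` returns another `Index` (of lists), not a `Series`, and `Index` provides `map` but no `apply`. A minimal sketch of one more workaround, assuming a reasonably recent pandas and Python 3.7+ (so `dict.fromkeys` can replace `OrderedDict`): convert the labels to a `Series` with `to_series()`, which does have `.apply`.

```python
import pandas as pd

df = pd.DataFrame(columns=["Weeks with 60 hours more than 60", "What is is name name name"])

split_cols = df.columns.str.split()
# split_cols is still an Index (of lists); Index has .map but no .apply,
# which is exactly what the AttributeError in the question reports.
print(hasattr(split_cols, "apply"))  # False

# Workaround: turn the labels into a Series first, which does provide .apply.
deduped = (
    df.columns.to_series()
    .str.split()
    .apply(lambda words: " ".join(dict.fromkeys(words)))
)
df.columns = deduped.tolist()
print(list(df.columns))
# → ['Weeks with 60 hours more than', 'What is name']
```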

# Alternatively, with Index.map (an Index has .map but no .apply):
df.columns = (df.columns.str.split()
                .map(lambda x:OrderedDict.fromkeys(x).keys())
                .str.join(' '))
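The same idea can also be written with a single function applied to each whole label, which avoids chaining `.str` calls. A sketch, assuming Python 3.7+; `dedupe_words` is again a hypothetical helper, while `Index.map` and `DataFrame.rename(columns=callable)` are standard pandas calls.

```python
import pandas as pd


def dedupe_words(label):
    """Hypothetical helper: keep the first occurrence of each token in a label."""
    # str() guards against non-string labels; dict.fromkeys keeps first-seen order.
    return " ".join(dict.fromkeys(str(label).split()))


df = pd.DataFrame(columns=["What is is name name name", "Weeks with 60 hours more than 60"])

# Index.map applies the function to every label and returns a new Index.
via_map = df.columns.map(dedupe_words)

# DataFrame.rename(columns=callable) returns a new DataFrame whose labels
# have each been passed through the callable.
df = df.rename(columns=dedupe_words)
print(list(df.columns))
# → ['What is name', 'Weeks with 60 hours more than']
```

If two different labels collapse to the same string, both approaches leave duplicate column names, so it may be worth checking `df.columns.is_unique` afterwards.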

Remove non-consecutive duplicates from df.columns
