I'm trying to remove non-consecutive duplicate words and numbers from column names.
For example, I currently have df['Weeks with 60 hours more than 60'] and I want to get df['Weeks with 60 hours more than'].
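To make "non-consecutive" concrete, here is a minimal plain-Python sketch (no pandas yet) on the example label. It contrasts collapsing only adjacent repeats (`itertools.groupby`) with dropping every later repeat (`dict.fromkeys`). Only the second handles the trailing `60`.

```python
from itertools import groupby

label = "Weeks with 60 hours more than 60"
tokens = label.split()

# groupby only collapses runs of identical adjacent tokens,
# so the trailing "60" survives because it is not next to the first one.
consecutive_only = " ".join(key for key, _ in groupby(tokens))

# dict.fromkeys keeps the first occurrence of every token, wherever the
# repeat appears (dicts preserve insertion order since Python 3.7).
all_repeats = " ".join(dict.fromkeys(tokens))

print(consecutive_only)  # Weeks with 60 hours more than 60
print(all_repeats)       # Weeks with 60 hours more than
```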
I tried
df.columns = df.columns.str.split().apply(lambda x:OrderedDict.fromkeys(x).keys()).str.join(' ')
(adapted from Python Dataframe: Remove duplicate words in the same cell within a column in Python),
but it raises the following error:
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-85-1078b4f07191> in <module>()
31 df_t.columns = df_t.columns.str.replace(r"."," ")
32 df_t.columns = df_t.columns.str.strip()
---> 33 df_t.columns = df_t.columns.str.split().apply(lambda x:OrderedDict.fromkeys(x).keys()).str.join(' ')
34
35 # df_t.columns = df_t.columns.str.replace(r"\(.*\)","")
AttributeError: 'Index' object has no attribute 'apply'
Any suggestions?
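The traceback arises because `df_t.columns` is a `pandas.Index`, and `apply` is a `Series` method that `Index` does not have. A minimal sketch, assuming a one-column frame built from the example label: wrapping the labels in a `Series` lets the original chain run unchanged.

```python
from collections import OrderedDict

import pandas as pd

df_t = pd.DataFrame(columns=["Weeks with 60 hours more than 60"])

# Index has no .apply; Series does.
print(hasattr(df_t.columns, "apply"))  # False

# Wrapping the labels in a Series lets the original chain run unchanged.
new_cols = (
    pd.Series(df_t.columns)
    .str.split()                                      # one list of words per label
    .apply(lambda x: OrderedDict.fromkeys(x).keys())  # keep first occurrence of each word
    .str.join(" ")                                    # glue the words back together
)
df_t.columns = new_cols
print(list(df_t.columns))  # ['Weeks with 60 hours more than']
```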
Answer 0 (score: 1)
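df.columns is an Index, which has no apply method (that belongs to Series), so use a list comprehension or Index.map instead: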
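import pandas as pd
from collections import OrderedDict

df = pd.DataFrame(columns=['What is is name name name'])
# str.split() yields one list of words per label;
# OrderedDict.fromkeys keeps the first occurrence of each word, in order
df.columns = [' '.join(OrderedDict.fromkeys(x).keys()) for x in df.columns.str.split()]
print(df)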
Empty DataFrame
Columns: [What is name]
Index: []
# Or chain Index.map, which works on an Index where apply does not
df.columns = (df.columns.str.split()
              .map(lambda x: OrderedDict.fromkeys(x).keys())
              .str.join(' '))
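A further variant, offered as a sketch rather than part of the original answer: `DataFrame.rename(columns=...)` accepts a callable that is applied to each column label, so a per-label function (the name `dedupe_words` is hypothetical) avoids the split/join round trip through the `.str` accessor. It assumes Python 3.7+ so that a plain `dict` preserves insertion order in place of `OrderedDict`.

```python
import pandas as pd


def dedupe_words(label: str) -> str:
    """Hypothetical helper: keep the first occurrence of each word in a label."""
    # dict preserves insertion order (Python 3.7+), so word order is kept.
    return " ".join(dict.fromkeys(label.split()))


df = pd.DataFrame(columns=["What is is name name name", "Weeks with 60 hours more than 60"])

# rename(columns=callable) calls the function once per column label.
df = df.rename(columns=dedupe_words)
print(list(df.columns))  # ['What is name', 'Weeks with 60 hours more than']
```

Because the function returns a plain string, it also plugs directly into `df.columns.map(dedupe_words)`.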