Question

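I want to sort the words of the query in each row of a Pandas DataFrame and then remove the duplicate words. How can I do this separately for each row? For example, given the DataFrame: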

Sr.No | Query
-------------
1.    war gears of war
2.    call of duty
3.    legend of troy legend
4.    resident evil

The resulting DataFrame should be:

Sr.No | Query
-------------
1.    gears of war
2.    call duty of 
3.    legend of troy
4.    evil resident

I am using the split function to first split the words in each row of the DataFrame, but it does not work:

for i in range(0,42365):
    temp2.iloc[[i]]=list(str(temp2.iloc[[i]]).split())
    print(temp2.iloc[[i]])

I get the following error:

ValueError: cannot set using a list-like indexer with a different length than the value
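A likely cause: `temp2.iloc[[i]]` selects a one-row DataFrame, and `str(...)` turns its whole printed representation (header and index included) into text, so the list produced by `split()` has more elements than the row it is assigned back to. A minimal per-row sketch that avoids `iloc` altogether, assuming the text lives in a column named `Query` (the helper `sort_unique_words` and the stand-in `temp2` data are hypothetical):

```python
import pandas as pd


def sort_unique_words(text: str) -> str:
    """Hypothetical helper: split on whitespace, drop repeated words, sort them."""
    return " ".join(sorted(set(text.split())))


# Hypothetical stand-in for the asker's temp2; the column name "Query" is assumed.
temp2 = pd.DataFrame({"Query": ["war gears of war", "call of duty",
                                "legend of troy legend", "resident evil"]})

# map() calls the helper once per cell, so each row is handled on its own string.
temp2["Query"] = temp2["Query"].map(sort_unique_words)
print(temp2)
```

This keeps the per-row idea from the question but lets pandas do the iteration instead of a hard-coded `range(0, 42365)` loop.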

Answer 1

Setup

import pandas as pd

df = pd.DataFrame([
        ['war gears of war'],
        ['call of duty'],
        ['legend of troy legend'],
        ['resident evil'],
    ], index=pd.Index(['1.', '2.', '3.', '4.'], name='Sr.No'), columns=['Query'])

df

Solution

df.Query.str.split().apply(lambda x: sorted(set(x))).str.join(' ').to_frame()
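The one-liner chains four steps. Broken apart with intermediate names (the names `words`, `unique_sorted`, `joined` and `result` are only for illustration), it reads:

```python
import pandas as pd

df = pd.DataFrame({"Query": ["war gears of war", "call of duty",
                             "legend of troy legend", "resident evil"]},
                  index=pd.Index(["1.", "2.", "3.", "4."], name="Sr.No"))

words = df.Query.str.split()                            # each cell becomes a list of words
unique_sorted = words.apply(lambda x: sorted(set(x)))   # set() drops repeats, sorted() orders them
joined = unique_sorted.str.join(" ")                    # .str.join joins each list element-wise
result = joined.to_frame()                              # back to a one-column DataFrame "Query"
print(result)
```

Because `set()` discards word order anyway, sorting afterwards is what makes the output deterministic.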

Answer 2

You can first create a Series with split and stack:

s = df.col.str.split(expand=True).stack()
print (s)
0  0         war
   1       gears
   2          of
   3         war
1  0        call
   1          of
   2        duty
2  0      legend
   1          of
   2        troy
   3      legend
3  0    resident
   1        evil
dtype: object
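To reproduce the stacked Series above, the frame needs a column named `col` and a default 0..3 index; that input is an assumption inferred from the printed output. A minimal sketch:

```python
import pandas as pd

# Assumed input: this answer refers to the column as "col" and uses a default index.
df = pd.DataFrame({"col": ["war gears of war", "call of duty",
                           "legend of troy legend", "resident evil"]})

# expand=True spreads the words over columns 0..3, padding shorter rows with None;
# stack() moves those columns into a second index level.
# .dropna() is a safeguard in case the installed pandas version's stack() keeps the padding.
s = df.col.str.split(expand=True).stack().dropna()
print(s)
```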

Then group by the first index level, use sort_values together with drop_duplicates, and finally join all the words:

print (s.groupby(level=0).apply(lambda x: ' '.join(x.sort_values().drop_duplicates())))
0      gears of war
1      call duty of
2    legend of troy
3     evil resident
dtype: object
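The grouped result is indexed by the original row labels, so it can be assigned straight back to the column. A sketch under the same assumed `col` input (the name `cleaned` is illustrative):

```python
import pandas as pd

df = pd.DataFrame({"col": ["war gears of war", "call of duty",
                           "legend of troy legend", "resident evil"]})
s = df.col.str.split(expand=True).stack().dropna()

# Group on index level 0 (the original row label); within each group
# sort the words, drop repeats, and join them back into one string.
cleaned = s.groupby(level=0).apply(lambda x: " ".join(x.sort_values().drop_duplicates()))

# Assignment aligns on the index, so each row receives its own cleaned query.
df["col"] = cleaned
print(df)
```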

Sort the words in a query alphabetically and remove duplicate words from each row
