Question

Suppose I have the DataFrame below.

       a        b        c
0    one      two    three
1  three      one      two

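I want to treat row 0 and row 1 as the same list (or something like that), because both rows contain "one", "two", and "three", even though the order differs.

Should I create a new column that stores all the strings from columns a, b, and c, for example: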

       a        b        c                d
0    one      two    three    one two three
1  three      one      two    three one two

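and then compare rows 0 and 1 of column d?

After that I want to run .groupby('d'), so "one two three" and "three one two" must not end up in separate groups.

I can't figure out how to solve this and would appreciate some help.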

Answer 1

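The new column you create should hold tuples, because lists are not hashable (groupby would fail). So we first build the column with tolist(), then sort each row's list and use transform to convert it to a tuple.

Setup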
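To illustrate the hashability point, here is a minimal plain-Python sketch. The variable names are made up for illustration and are not part of the question. It shows that a list cannot be hashed, while a sorted tuple can, and that two rows with the same strings in a different order produce the same key.

```python
# Two rows holding the same strings in a different order (illustrative values).
row0 = ['one', 'two', 'three']
row1 = ['three', 'one', 'two']

# A list is mutable, so Python refuses to hash it.
try:
    hash(row0)
    list_is_hashable = True
except TypeError:
    list_is_hashable = False

# Sorting removes the ordering difference; tuple() makes the result hashable.
key0 = tuple(sorted(row0))
key1 = tuple(sorted(row1))

print(list_is_hashable)  # False
print(key0 == key1)      # True
```

Because the two keys compare equal and hash to the same value, anything that groups by key (a dict, or pandas groupby) treats the two rows as one group.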

import pandas as pd

data = {'a': ['one', 'three'], 'b': ['two', 'one'], 'c': ['three', 'two']}
df = pd.DataFrame(data)

Sort and convert...

df['d'] = df.values.tolist()
df['d'] = (    
     df['d'].transform(sorted)
         .transform(tuple)
)
print(df.groupby('d').sum()) # I'm calling sum() just to show groupby working 

# prints only one group:
#                           a       b         c
# d
# (one, three, two)  onethree  twoone  threetwo

Answer 2

Sort the cells in each row before joining them to build the grouping string.

Use apply with axis=1 to apply the function row by row.

df['d'] = df.apply(lambda x: ' '.join(x.sort_values()), axis=1)

# outputs:

       a    b      c              d
0    one  two  three  one three two
1  three  one    two  one three two

Grouping by d will then put both rows in the same group. For example:

df.groupby('d').agg('count')

               a  b  c
d
one three two  2  2  2
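As a further check, the sketch below repeats the same approach on a slightly larger frame. The third row is hypothetical data added only for illustration. It is meant to show that rows with a different set of strings still land in their own group.

```python
import pandas as pd

# Rows 0 and 1 hold the same strings in a different order;
# row 2 (hypothetical, not from the question) holds a different set.
df = pd.DataFrame({
    'a': ['one', 'three', 'one'],
    'b': ['two', 'one', 'two'],
    'c': ['three', 'two', 'four'],
})

# Order-insensitive key: sort each row's values, then join them into one string.
df['d'] = df[['a', 'b', 'c']].apply(lambda row: ' '.join(sorted(row)), axis=1)

# Count how many rows fall under each key.
sizes = df.groupby('d').size()
print(sizes)
```

With this data, the key "one three two" should cover two rows and "four one two" one row. Note that a joined string can be ambiguous if the cell values themselves contain spaces; the tuple key from Answer 1 avoids that.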

Python - Check several columns and compare strings