Grouping data by unique combinations

Time: 2018-03-06 10:39:45

Tags: python pandas

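In the dataset below, I need to find the unique sequences (combinations of age and maritalstatus) and assign each one a sequence number.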

DataSet:

user    age maritalstatus   product
A   Young   married 111
B   young   married 222
C   young   Single  111
D   old single  222
E   old married 111
F   teen    married 222
G   teen    married 555
H   adult   single  444
I   adult   single  333

Unique sequences:

young   married     0
young   single      1
old     single      2
old     married     3
teen    married     4
adult   single      5
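
One way the numbering above could be produced with pandas is sketched below. It assumes that "Young"/"young" and "Single"/"single" are meant to be the same value, so both columns are lower-cased first; the names `keys`, `unique_seq`, and `seq` are illustrative and do not come from the original data.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "user": list("ABCDEFGHI"),
    "age": ["Young", "young", "young", "old", "old",
            "teen", "teen", "adult", "adult"],
    "maritalstatus": ["married", "married", "Single", "single", "married",
                      "married", "married", "single", "single"],
    "product": [111, 222, 111, 222, 111, 222, 555, 444, 333],
})

keys = ["age", "maritalstatus"]

# Assumption: differences in letter case are typos, so normalize them.
for col in keys:
    df[col] = df[col].str.lower()

# One row per distinct (age, maritalstatus) pair, in order of first appearance.
unique_seq = df[keys].drop_duplicates().reset_index(drop=True)
unique_seq["seq"] = range(len(unique_seq))

# Attach the sequence number to every original row.
df = df.merge(unique_seq, on=keys, how="left")

print(unique_seq)
print(df)
```

Because `drop_duplicates` keeps the first occurrence of each pair, the numbers follow the order in which the combinations appear in the data.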

After finding the unique values as shown above, suppose I pass in a dataframe like the following, newdataframe:

user    age maritalstatus  
A      Young   married 
X      young   Single  
D      old     single  
Z      old     married

It should return the products to me as a list.

A: [222] - user A has already purchased 111 and the matching sequence also contains 222, so it returns 222.
X: [111, 222]
D: [] - only one row has this sequence, and D has already purchased its product 222, so it returns an empty list.
Z: [111] - matches the sequence of user E, so it returns 111.

If there is no matching sequence, as below

user    age     maritalstatus
Y       adult   married

it should give me an empty list:

Y: []

1 Answer:

Answer 0: (score: 0)

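You can use sets. The sets module provides classes for constructing and manipulating unordered collections of unique elements. Note that this module has been deprecated since Python 2.6 in favor of the built-in set and frozenset types, and it is not available in Python 3.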

See: https://docs.python.org/2/library/sets.html
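
A minimal sketch of the set-based idea, using the built-in set type rather than the old sets module, might look like this. It is plain Python, not pandas, and it assumes that age and maritalstatus should be compared case-insensitively. The names `normalize`, `products_by_seq`, `products_by_user`, and `recommend` are made up for illustration.

```python
from collections import defaultdict

# Rows copied from the question: (user, age, maritalstatus, product)
rows = [
    ("A", "Young", "married", 111),
    ("B", "young", "married", 222),
    ("C", "young", "Single", 111),
    ("D", "old", "single", 222),
    ("E", "old", "married", 111),
    ("F", "teen", "married", 222),
    ("G", "teen", "married", 555),
    ("H", "adult", "single", 444),
    ("I", "adult", "single", 333),
]


def normalize(age, status):
    # Assumption: "Young"/"young" and "Single"/"single" are the same value.
    return age.lower(), status.lower()


products_by_seq = defaultdict(set)   # (age, status) -> products bought in that group
products_by_user = defaultdict(set)  # user -> products that user already bought
for user, age, status, product in rows:
    products_by_seq[normalize(age, status)].add(product)
    products_by_user[user].add(product)


def recommend(user, age, status):
    """Products seen for this (age, status) pair that the user has not bought."""
    candidates = products_by_seq.get(normalize(age, status), set())
    already_bought = products_by_user.get(user, set())
    # Set difference removes what the user already owns; sort for stable output.
    return sorted(candidates - already_bought)


for query in [("A", "Young", "married"), ("X", "young", "Single"),
              ("D", "old", "single"), ("Z", "old", "married"),
              ("Y", "adult", "married")]:
    print(query[0], recommend(*query))
```

With the sample rows as posted, this gives [222] for A, [] for D, [111] for Z, and [] for Y, which agrees with the expected output in the question. For X it gives [111], because user C is the only young/single row in the data; the [111, 222] in the question would need a different matching rule.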