Python / Pandas: If a column has multiple values, convert them into a single row with the values in a list

Time: 2017-02-27 17:12:09

Tags: python list pandas dataframe apply

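In my DataFrame I have many instances of the same AutoNumber with different KeyValue_String values. I would like to collapse these instances into a single row, where KeyValue_String is a list of the unique values.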

    AutoNumber KeyValue_String  ReferralType                      Description
0        50899              DD             3                       Web Search
1        50905          Cheque             1            Gatestone Collections
2        50906              DD             2          Centum Mortgage Brokers
3        50907          Cheque             1     Financial Debt Recovery Ltd.
4        50908              DD             2          Centum Mortgage Brokers
5        50909              DD             2          Centum Mortgage Brokers
6        50910          Cheque             1      Allied International Credit
7        50911          Cheque             1              D&A Collection Corp
8        50912          Cheque             1            Gatestone Collections
9        50913          Cheque             1     Financial Debt Recovery Ltd.
10       50914          Cheque             3  Existing Customer - Refinancing
11       50914              DD             3  Existing Customer - Refinancing
12       50915          Cheque             1            Gatestone Collections
13       50916          Cheque             3  Existing Customer - Refinancing
14       50916          Cheque             3  Existing Customer - Refinancing

The desired output looks something like this, except that I want to keep all the other columns:

      AutoNumber KeyValue_String
0          50899            DD
1          50905        Cheque
2          50906            DD
3          50907        Cheque
4          50908            DD
5          50909            DD
6          50910        Cheque
7          50911        Cheque
8          50912        Cheque
9          50913        Cheque
10         50914    [Cheque, DD]
11         50915        Cheque
12         50916        Cheque
13         50917        Cheque
14         50918        Cheque

1 Answer

Answer 0 (score: 1)

If I understand correctly, you could use groupby together with transform and unique:

df['KeyValue_String'] = df.groupby('AutoNumber').KeyValue_String.transform('unique')
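On recent pandas releases, transform only accepts the string names of its built-in reduction and transformation kernels, so transform('unique') may be rejected with a ValueError. A minimal sketch that avoids this, assuming the question's column names, computes one array of unique values per AutoNumber with SeriesGroupBy.unique and then broadcasts it back onto every row with Series.map. The sample rows are a subset of the question's data:

```python
import pandas as pd

# Subset of the question's data: 50914 has two different values.
df = pd.DataFrame({
    "AutoNumber": [50913, 50914, 50914, 50915, 50916, 50916],
    "KeyValue_String": ["Cheque", "Cheque", "DD", "Cheque", "Cheque", "Cheque"],
})

# Series indexed by AutoNumber; each value is an array of that group's
# unique KeyValue_String entries, in order of first appearance.
uniques = df.groupby("AutoNumber")["KeyValue_String"].unique()

# Look up each row's AutoNumber in `uniques`, so every row of a group
# receives the same array (the effect transform('unique') had in the demo).
df["KeyValue_String"] = df["AutoNumber"].map(uniques)

print(df)
```

The drop_duplicates step that follows applies unchanged to the result.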

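You can then drop the duplicates, assuming (as mentioned in the comments) that rows sharing an AutoNumber hold the same values in every column except KeyValue_String. By default drop_duplicates keeps the first row of each AutoNumber, so the other columns are preserved.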

df = df.drop_duplicates(subset='AutoNumber')
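Because the question also asks to keep the other columns, a single groupby().agg call can collapse and de-duplicate in one step. This is an alternative sketch, not the answer's method: it assumes ReferralType and Description are identical within each AutoNumber (as in the question's sample), takes the first value of each such column, and stores the unique KeyValue_String values as plain Python lists:

```python
import pandas as pd

# Rows taken from the question, including the duplicated AutoNumbers.
df = pd.DataFrame({
    "AutoNumber": [50913, 50914, 50914, 50916, 50916],
    "KeyValue_String": ["Cheque", "Cheque", "DD", "Cheque", "Cheque"],
    "ReferralType": [1, 3, 3, 3, 3],
    "Description": [
        "Financial Debt Recovery Ltd.",
        "Existing Customer - Refinancing",
        "Existing Customer - Refinancing",
        "Existing Customer - Refinancing",
        "Existing Customer - Refinancing",
    ],
})

# Unique values become a list; every other column keeps its first value
# (assumed identical within each AutoNumber).
other_cols = [c for c in df.columns if c not in ("AutoNumber", "KeyValue_String")]
agg_spec = {"KeyValue_String": lambda s: list(s.unique())}
agg_spec.update({c: "first" for c in other_cols})

# sort=False keeps AutoNumbers in order of first appearance.
collapsed = df.groupby("AutoNumber", as_index=False, sort=False).agg(agg_spec)
print(collapsed)
```

Building other_cols from df.columns means any additional columns are carried along without editing the aggregation spec.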

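If you want arrays, I would suggest keeping every value in the column as an array (single values included) rather than going to the effort of mixing types in one column, which will be harder to work with.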
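To illustrate the point about consistent types: when every cell holds a list, one expression works for every row, whereas in a mixed column the same expression silently changes meaning (len of a string counts its characters). The values below are a hypothetical subset of the question's data:

```python
import pandas as pd

# Every cell is a list, including single payment types.
collapsed = pd.Series(
    [["DD"], ["Cheque"], ["Cheque", "DD"], ["Cheque"]],
    index=[50899, 50905, 50914, 50916],
    name="KeyValue_String",
)

# One code path handles every row.
n_types = collapsed.apply(len)                        # payment types per AutoNumber
uses_dd = collapsed.apply(lambda vals: "DD" in vals)  # list membership test
multi = collapsed[n_types > 1]                        # AutoNumbers with several types

# A mixed column (bare strings plus lists) gives misleading results:
# len("DD") is 2 and len("Cheque") is 6, i.e. character counts.
mixed = pd.Series(["DD", "Cheque", ["Cheque", "DD"]])
mixed_counts = mixed.apply(len)

print(n_types.tolist(), uses_dd.tolist(), multi.index.tolist(), mixed_counts.tolist())
```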

Demo

>>> df
    AutoNumber KeyValue_String
0        50899              DD
1        50905          Cheque
2        50906              DD
3        50907          Cheque
4        50908              DD
5        50909              DD
6        50910          Cheque
7        50911          Cheque
8        50912          Cheque
9        50913          Cheque
10       50914          Cheque
11       50914              DD
12       50915          Cheque
13       50916          Cheque
14       50916          Cheque

>>> df['KeyValue_String'] = df.groupby('AutoNumber').KeyValue_String.transform('unique')

>>> df.drop_duplicates(subset='AutoNumber')

    AutoNumber KeyValue_String
0        50899            [DD]
1        50905        [Cheque]
2        50906            [DD]
3        50907        [Cheque]
4        50908            [DD]
5        50909            [DD]
6        50910        [Cheque]
7        50911        [Cheque]
8        50912        [Cheque]
9        50913        [Cheque]
10       50914    [Cheque, DD]
12       50915        [Cheque]
13       50916        [Cheque]