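In my DataFrame, I have many instances of the same AutoNumber with different KeyValue_String values. I want to collapse these instances into a single row, where KeyValue_String is a list of the unique values.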
AutoNumber KeyValue_String ReferralType Description
0 50899 DD 3 Web Search
1 50905 Cheque 1 Gatestone Collections
2 50906 DD 2 Centum Mortgage Brokers
3 50907 Cheque 1 Financial Debt Recovery Ltd.
4 50908 DD 2 Centum Mortgage Brokers
5 50909 DD 2 Centum Mortgage Brokers
6 50910 Cheque 1 Allied International Credit
7 50911 Cheque 1 D&A Collection Corp
8 50912 Cheque 1 Gatestone Collections
9 50913 Cheque 1 Financial Debt Recovery Ltd.
10 50914 Cheque 3 Existing Customer - Refinancing
11 50914 DD 3 Existing Customer - Refinancing
12 50915 Cheque 1 Gatestone Collections
13 50916 Cheque 3 Existing Customer - Refinancing
14 50916 Cheque 3 Existing Customer - Refinancing
The desired output looks like this, except that I also want to keep all the other columns:
AutoNumber KeyValue_String
0 50899 DD
1 50905 Cheque
2 50906 DD
3 50907 Cheque
4 50908 DD
5 50909 DD
6 50910 Cheque
7 50911 Cheque
8 50912 Cheque
9 50913 Cheque
10 50914 [Cheque, DD]
11 50915 Cheque
12 50916 Cheque
13 50917 Cheque
14 50918 Cheque
Answer 0 (score: 1)
If I understand correctly, you could use groupby, transform, and unique.
df['KeyValue_String'] = df.groupby('AutoNumber').KeyValue_String.transform('unique')
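Then you can drop the duplicates, assuming (as mentioned in the comments) that rows with the same AutoNumber contain duplicate information apart from KeyValue_String.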
df = df.drop_duplicates(subset='AutoNumber')
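If you want arrays, I would suggest keeping every value in the column as an array rather than spending effort on mixing types in one column, which would be harder to work with.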
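To illustrate why a uniform list column is easier to work with, here is a minimal sketch. The `collapsed` frame is a hypothetical, hand-built stand-in for the result of the steps above, not output taken from the question's data.

```python
import pandas as pd

# Hypothetical collapsed result in which every cell is a list.
collapsed = pd.DataFrame({
    "AutoNumber": [50913, 50914, 50915],
    "KeyValue_String": [["Cheque"], ["Cheque", "DD"], ["Cheque"]],
})

# Because every cell is a list, element counts need no type checks.
n_values = collapsed["KeyValue_String"].str.len()
multi = collapsed.loc[n_values > 1, "AutoNumber"].tolist()

# explode() goes back to one row per (AutoNumber, value) pair.
long_form = collapsed.explode("KeyValue_String")

print(multi)
print(long_form)
```

With a mix of plain strings and lists in the same column, `.str.len()` would return the string length for scalar cells, so each of these operations would need a per-cell type check first.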
Demo
>>> df
AutoNumber KeyValue_String
0 50899 DD
1 50905 Cheque
2 50906 DD
3 50907 Cheque
4 50908 DD
5 50909 DD
6 50910 Cheque
7 50911 Cheque
8 50912 Cheque
9 50913 Cheque
10 50914 Cheque
11 50914 DD
12 50915 Cheque
13 50916 Cheque
14 50916 Cheque
>>> df['KeyValue_String'] = df.groupby('AutoNumber').KeyValue_String.transform('unique')
>>> df.drop_duplicates(subset='AutoNumber')
AutoNumber KeyValue_String
0 50899 [DD]
1 50905 [Cheque]
2 50906 [DD]
3 50907 [Cheque]
4 50908 [DD]
5 50909 [DD]
6 50910 [Cheque]
7 50911 [Cheque]
8 50912 [Cheque]
9 50913 [Cheque]
10 50914 [Cheque, DD]
12 50915 [Cheque]
13 50916 [Cheque]
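The demo above shows only two columns. The following is a self-contained sketch on a few rows from the question that also keeps ReferralType and Description. Note that it swaps in agg plus map for transform('unique'); that substitution is my assumption about what is more portable across pandas versions, not the call used above. It also relies on the same assumption as before: rows sharing an AutoNumber agree on every other column.

```python
import pandas as pd

# A few rows copied from the question's sample data.
df = pd.DataFrame({
    "AutoNumber": [50913, 50914, 50914, 50915, 50916, 50916],
    "KeyValue_String": ["Cheque", "Cheque", "DD", "Cheque", "Cheque", "Cheque"],
    "ReferralType": [1, 3, 3, 1, 3, 3],
    "Description": [
        "Financial Debt Recovery Ltd.",
        "Existing Customer - Refinancing",
        "Existing Customer - Refinancing",
        "Gatestone Collections",
        "Existing Customer - Refinancing",
        "Existing Customer - Refinancing",
    ],
})

# One list of unique values per AutoNumber, in order of first appearance.
unique_values = (
    df.groupby("AutoNumber")["KeyValue_String"]
      .agg(lambda s: list(s.unique()))
)

# Look up each row's list by AutoNumber, then keep the first row per group.
df["KeyValue_String"] = df["AutoNumber"].map(unique_values)
result = df.drop_duplicates(subset="AutoNumber").reset_index(drop=True)

print(result)
```

Because drop_duplicates keeps the first row of each group, any differences between rows in the other columns would be silently discarded, so it is worth checking that assumption on the real data first.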