Combining rows that have duplicate values in one column using python / pandas

Time: 2018-04-05 23:33:35

Tags: python pandas

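I want to use Python to determine whether the first instance of an ID value in the "Id" column is matched in a later row of the same column. If it is, I want to take the value from the "Avail" column of the later row and append it to the "Avail" value of the row with that initial "Id". Then I want to delete the rows with the duplicate Id.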

Here is my sample data. I have a CSV file containing the following data:

Id,First,Last,Avail  
abcdefg,John,Smith,4164667a-5dca-4ec6-a495-4be5b135d868=immediate  
dgasgas,Nancy,Adams,f98a8fbd-fb88-49b9-894e-631ba2a6f369=immediate  
gaytrjhu,John,Smith,e24ddf4c-c79f-4a84-a4ed-d92a10cc9e15=immediate  
abcdefg,John,Smith,3ec0c158-8782-41ff-8388-5a10b9261b60=immediate  
abcdefg,John,Smith,3ec0c158-8782-41ff-8388-c5dfe3b1276c=relative|7 

Desired output (v1) (note that I don't care about the "First" or "Last" columns in the duplicate rows; I only care about the "Avail" data from those rows):

Id,First,Last,Avail  
abcdefg,John,Smith,4164667a-5dca-4ec6-a495-4be5b135d868=immediate;3ec0c158-8782-41ff-8388-5a10b9261b60=immediate;3ec0c158-8782-41ff-8388-5a10b9261b60=immediate  
dgasgas,Nancy,Adams,f98a8fbd-fb88-49b9-894e-631ba2a6f369=immediate  
gaytrjhu,John,Smith,e24ddf4c-c79f-4a84-a4ed-d92a10cc9e15=immediate  
abcdefg,Nancy,Adams,3ec0c158-8782-41ff-8388-5a10b9261b60=immediate  
abcdefg,John,Smith,3ec0c158-8782-41ff-8388-c5dfe3b1276c=relative|7  

Then I want to delete the "duplicate" rows, leaving this:

Id,First,Last,Avail  
abcdefg,John,Smith,4164667a-5dca-4ec6-a495-4be5b135d868=immediate;3ec0c158-8782-41ff-8388-5a10b9261b60=immediate;3ec0c158-8782-41ff-8388-5a10b9261b60=immediate  
dgasgas,Nancy,Adams,f98a8fbd-fb88-49b9-894e-631ba2a6f369=immediate  
gaytrjhu,John,Smith,e24ddf4c-c79f-4a84-a4ed-d92a10cc9e15=immediate

1 answer:

Answer 0 (score: 0)

import pandas as pd

df = pd.DataFrame(data=[
        [1, 'John', 'Smith', 'a'],
        [1, 'John', 'Smith', 'b'],
        [2, 'Kate', 'Smith', 'c'],
    ],
    columns=['ID', 'First', 'Last', 'Avail']
)

# Group rows that share ID, First, and Last, then join their Avail values with ';'
output = (df
          .groupby(['ID', 'First', 'Last'], as_index=False)
          .agg({'Avail': lambda x: ';'.join(x)}))

You can use groupby, as @Sphinx suggested. An example that produces the output style you requested is shown above.
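As a rough sketch of how the same idea might apply to the CSV from the question: since the asker says the "First" and "Last" values of the duplicate rows do not matter, one option is to group on "Id" alone and keep the first "First"/"Last" seen for each Id. Reading the data from an inline string (`csv_text`) and the variable names are assumptions made for illustration; in practice the data would come from the asker's file.

```python
import io

import pandas as pd

# Sample rows from the question, inlined so the example is self-contained.
csv_text = """Id,First,Last,Avail
abcdefg,John,Smith,4164667a-5dca-4ec6-a495-4be5b135d868=immediate
dgasgas,Nancy,Adams,f98a8fbd-fb88-49b9-894e-631ba2a6f369=immediate
gaytrjhu,John,Smith,e24ddf4c-c79f-4a84-a4ed-d92a10cc9e15=immediate
abcdefg,John,Smith,3ec0c158-8782-41ff-8388-5a10b9261b60=immediate
abcdefg,John,Smith,3ec0c158-8782-41ff-8388-c5dfe3b1276c=relative|7
"""

df = pd.read_csv(io.StringIO(csv_text))

# Group on Id only; sort=False keeps Ids in order of first appearance.
# First/Last come from the first row of each group, Avail values are joined with ';'.
result = (
    df.groupby("Id", as_index=False, sort=False)
      .agg({
          "First": "first",
          "Last": "first",
          "Avail": lambda values: ";".join(values),
      })
)

print(result.to_csv(index=False))
```

This leaves one row per Id, so no separate step is needed to drop the duplicate rows. If "First" and "Last" should also distinguish rows, grouping on all three columns as in the answer above is the alternative.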