Question

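I would like to use Python to determine whether the first instance of an ID value in the "Id" column is matched in a later row of the same column. If it is, I'd like to take the value from the "Avail" column of the later row and append it to the "Avail" value of the row holding that initial "Id". Then I'd like to delete the rows with the duplicated Ids.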

Here is my sample data. I have a CSV file containing the following:

Id,First,Last,Avail  
abcdefg,John,Smith,4164667a-5dca-4ec6-a495-4be5b135d868=immediate  
dgasgas,Nancy,Adams,f98a8fbd-fb88-49b9-894e-631ba2a6f369=immediate  
gaytrjhu,John,Smith,e24ddf4c-c79f-4a84-a4ed-d92a10cc9e15=immediate  
abcdefg,John,Smith,3ec0c158-8782-41ff-8388-5a10b9261b60=immediate  
abcdefg,John,Smith,3ec0c158-8782-41ff-8388-c5dfe3b1276c=relative|7

Desired output (v1). Note that I don't care about the "First" or "Last" columns in the duplicate rows; I only care about the "Avail" data from those rows:

Id,First,Last,Avail  
abcdefg,John,Smith,4164667a-5dca-4ec6-a495-4be5b135d868=immediate;3ec0c158-8782-41ff-8388-5a10b9261b60=immediate;3ec0c158-8782-41ff-8388-5a10b9261b60=immediate  
dgasgas,Nancy,Adams,f98a8fbd-fb88-49b9-894e-631ba2a6f369=immediate  
gaytrjhu,John,Smith,e24ddf4c-c79f-4a84-a4ed-d92a10cc9e15=immediate  
abcdefg,Nancy,Adams,3ec0c158-8782-41ff-8388-5a10b9261b60=immediate  
abcdefg,John,Smith,3ec0c158-8782-41ff-8388-c5dfe3b1276c=relative|7

Then I'd like to delete the "duplicate" rows, leaving this:

Id,First,Last,Avail  
abcdefg,John,Smith,4164667a-5dca-4ec6-a495-4be5b135d868=immediate;3ec0c158-8782-41ff-8388-5a10b9261b60=immediate;3ec0c158-8782-41ff-8388-5a10b9261b60=immediate  
dgasgas,Nancy,Adams,f98a8fbd-fb88-49b9-894e-631ba2a6f369=immediate  
gaytrjhu,John,Smith,e24ddf4c-c79f-4a84-a4ed-d92a10cc9e15=immediate

Answer 1

import pandas as pd

df = pd.DataFrame(data=[
        [1, 'John', 'Smith', 'a'],
        [1, 'John', 'Smith', 'b'],
        [2, 'Kate', 'Smith', 'c'],
    ],
    columns=['ID', 'First', 'Last', 'Avail']
)

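# Group rows that share the same ID, First and Last, then join each
# group's Avail values into one semicolon-separated string.
output = (df
          .groupby(['ID', 'First', 'Last'], as_index=False)
          .agg({'Avail': lambda x: ';'.join(x)}))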

You can use groupby, as @Sphinx suggested. An example that produces the output style you requested is shown above.
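
For the CSV in the question, the same idea might look like the sketch below. It rests on a few assumptions that are not in the original answer: rows are grouped on "Id" alone (since the asker does not care about "First" or "Last" in the duplicate rows), the "First" and "Last" values of the first occurrence are kept, and `sort=False` is used so that Ids stay in order of first appearance. It joins the actual "Avail" value of every row, so the merged string ends with the `relative|7` entry rather than the repeated value shown in the question's desired output. The data is read from an in-memory string here only to keep the example self-contained.

```python
import io

import pandas as pd

# Sample rows copied from the question; a real script would pass a file path
# to pd.read_csv instead of an in-memory buffer.
csv_text = """Id,First,Last,Avail
abcdefg,John,Smith,4164667a-5dca-4ec6-a495-4be5b135d868=immediate
dgasgas,Nancy,Adams,f98a8fbd-fb88-49b9-894e-631ba2a6f369=immediate
gaytrjhu,John,Smith,e24ddf4c-c79f-4a84-a4ed-d92a10cc9e15=immediate
abcdefg,John,Smith,3ec0c158-8782-41ff-8388-5a10b9261b60=immediate
abcdefg,John,Smith,3ec0c158-8782-41ff-8388-c5dfe3b1276c=relative|7
"""

df = pd.read_csv(io.StringIO(csv_text))

merged = (
    df
    # Assumption: group on Id only; sort=False keeps first-appearance order.
    .groupby("Id", as_index=False, sort=False)
    .agg({
        "First": "first",                    # keep the first occurrence's name
        "Last": "first",
        "Avail": lambda s: ";".join(s),      # combine every Avail in the group
    })
)

# Write the de-duplicated rows back out as CSV text.
print(merged.to_csv(index=False))
```

Because the aggregation already collapses each Id into a single row, no separate step is needed to delete the duplicate rows.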

Combine rows where a duplicate value exists in one column using python / pandas
