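I want to use Python to determine whether the first instance of an ID value in the "Id" column is matched in a later row of the same column. If it is, I want to take the value from the "Avail" column of the later row and add it to the "Avail" value of the row with that initial "Id". Then I want to delete the rows with the duplicate IDs.
Here is my sample data. I have a CSV file containing the following: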
Id,First,Last,Avail
abcdefg,John,Smith,4164667a-5dca-4ec6-a495-4be5b135d868=immediate
dgasgas,Nancy,Adams,f98a8fbd-fb88-49b9-894e-631ba2a6f369=immediate
gaytrjhu,John,Smith,e24ddf4c-c79f-4a84-a4ed-d92a10cc9e15=immediate
abcdefg,John,Smith,3ec0c158-8782-41ff-8388-5a10b9261b60=immediate
abcdefg,John,Smith,3ec0c158-8782-41ff-8388-c5dfe3b1276c=relative|7
Desired output (v1). Note that I don't care about the "First" or "Last" columns in the duplicate rows; I only care about the "Avail" data from them:
Id,First,Last,Avail
abcdefg,John,Smith,4164667a-5dca-4ec6-a495-4be5b135d868=immediate;3ec0c158-8782-41ff-8388-5a10b9261b60=immediate;3ec0c158-8782-41ff-8388-5a10b9261b60=immediate
dgasgas,Nancy,Adams,f98a8fbd-fb88-49b9-894e-631ba2a6f369=immediate
gaytrjhu,John,Smith,e24ddf4c-c79f-4a84-a4ed-d92a10cc9e15=immediate
abcdefg,Nancy,Adams,3ec0c158-8782-41ff-8388-5a10b9261b60=immediate
abcdefg,John,Smith,3ec0c158-8782-41ff-8388-c5dfe3b1276c=relative|7
Then I want to delete the "duplicate" rows, leaving this:
Id,First,Last,Avail
abcdefg,John,Smith,4164667a-5dca-4ec6-a495-4be5b135d868=immediate;3ec0c158-8782-41ff-8388-5a10b9261b60=immediate;3ec0c158-8782-41ff-8388-5a10b9261b60=immediate
dgasgas,Nancy,Adams,f98a8fbd-fb88-49b9-894e-631ba2a6f369=immediate
gaytrjhu,John,Smith,e24ddf4c-c79f-4a84-a4ed-d92a10cc9e15=immediate
Answer 0 (score: 0)
import pandas as pd

# Small example frame in which ID 1 appears twice
df = pd.DataFrame(
    data=[
        [1, 'John', 'Smith', 'a'],
        [1, 'John', 'Smith', 'b'],
        [2, 'Kate', 'Smith', 'c'],
    ],
    columns=['ID', 'First', 'Last', 'Avail'],
)

# Group rows sharing ID/First/Last and join their Avail values with ';'
output = (
    df
    .groupby(['ID', 'First', 'Last'], as_index=False)
    .agg({'Avail': ';'.join})
)
You can use groupby, as @Sphinx suggested. An example that produces the output style you asked for is shown above.
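As a rough sketch of how the same groupby idea might be applied to the CSV data from the question: the sample rows are read from an in-memory string with `io.StringIO` so the snippet runs on its own (with a real file you would pass its path to `pd.read_csv`). It groups by "Id" only and keeps the first "First"/"Last" seen for each Id, since the question says those columns don't matter in the duplicate rows. That choice, and the name `merged`, are assumptions for illustration and not part of the original answer.

```python
import io

import pandas as pd

# Sample rows copied from the question; a string is used instead of a file
# so the sketch is self-contained.
csv_text = """Id,First,Last,Avail
abcdefg,John,Smith,4164667a-5dca-4ec6-a495-4be5b135d868=immediate
dgasgas,Nancy,Adams,f98a8fbd-fb88-49b9-894e-631ba2a6f369=immediate
gaytrjhu,John,Smith,e24ddf4c-c79f-4a84-a4ed-d92a10cc9e15=immediate
abcdefg,John,Smith,3ec0c158-8782-41ff-8388-5a10b9261b60=immediate
abcdefg,John,Smith,3ec0c158-8782-41ff-8388-c5dfe3b1276c=relative|7
"""
df = pd.read_csv(io.StringIO(csv_text))

# Assumption: group on Id alone, keep First/Last from the first occurrence,
# and join every Avail value for that Id with ';'.
# sort=False keeps the Ids in order of first appearance.
merged = (
    df
    .groupby("Id", as_index=False, sort=False)
    .agg({"First": "first", "Last": "first", "Avail": ";".join})
)

print(merged.to_csv(index=False))
```

Because the groupby collapses each Id into a single row, the duplicate rows are dropped in the same step, so no separate delete pass is needed. Writing the result back out would be a matter of calling `merged.to_csv` with a file path.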