Question

I have a df like this:

df = pd.DataFrame( {"Client" : ["Alice", "Nan", "Nan", "Mallory", "Nan" , "Bob"] , 
    "Product" : ["A", "B", "C", "B", "C", "B"] } )

I would like to get this result:

Alice   A, B, C
Mallory B, C
Bob     B

Does anyone know how to do this with Python 3?

Answer 1

After grouping, you can use agg to apply a join to each group's items.

pandas 0.25+

import numpy as np

df = df.replace("Nan", np.nan).ffill()
# Named aggregation takes (column, function) tuples on the DataFrame groupby
df.groupby('Client', sort=False).agg(Product=('Product', ','.join)).reset_index()

pandas below 0.25

df = df.replace("Nan", np.nan).ffill()
df.groupby('Client', sort=False)['Product'].agg([('Product', ','.join)]).reset_index()

Output

    Client  Product
0   Alice   A,B,C
1   Mallory B,C
2   Bob     B
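
For reference, here is a minimal self-contained sketch of the same idea that runs from scratch. It assumes the placeholder really is the literal string "Nan", as in the question, and it joins with ", " rather than "," to match the spacing the question asks for. The names filled and result are just illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Client": ["Alice", "Nan", "Nan", "Mallory", "Nan", "Bob"],
    "Product": ["A", "B", "C", "B", "C", "B"],
})

# Turn the "Nan" placeholder strings into real missing values, then
# carry the last seen client name forward into the gaps.
filled = df.replace("Nan", np.nan).ffill()

# Join each client's products into one string.
# sort=False keeps clients in order of first appearance.
result = (
    filled.groupby("Client", sort=False)["Product"]
    .agg(", ".join)
    .reset_index()
)
print(result)
```

Note that replace here touches every column, so a product literally named "Nan" would be blanked too. If that matters, apply the replacement to the Client column only.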

Answer 2

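You seem to have the output of a groupby operation (the "Nan" entries sit where repeated values used to be). You will need to get it back into a groupable state before you can do anything useful with it.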

First, convert the string "Nan" to an actual NaN.

import numpy as np
df.replace("Nan", np.nan, inplace=True)

Then forward fill and you are good to go.

df.ffill(axis=0, inplace=True)

Then get the output into the format you want (this is where the magic happens):

for group, data in df.groupby(df.Client): 
    print(group, data.Product.tolist())

Alice ['A', 'B', 'C']
Bob ['B']
Mallory ['B', 'C']

I'll leave the f-string formatting as homework.
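
One possible way to do that formatting step, as a rough sketch rather than the answerer's own solution. It assumes a column width of 8 characters for the client name (picked to fit "Mallory" plus one space) and passes sort=False so the clients come out in the order the question shows. The lines list is only there to collect the output.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Client": ["Alice", "Nan", "Nan", "Mallory", "Nan", "Bob"],
    "Product": ["A", "B", "C", "B", "C", "B"],
})

# Same preparation as above: real NaN, then forward fill.
df = df.replace("Nan", np.nan).ffill()

lines = []
for client, data in df.groupby("Client", sort=False):
    # "<8" left-aligns the client name in an 8-character column (assumed width).
    lines.append(f"{client:<8}{', '.join(data['Product'])}")

print("\n".join(lines))
```

With longer client names the fixed width of 8 would need to grow, for example to the length of the longest name plus one.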

Answer 3

How about something like this?

import pandas as pd
from collections import defaultdict

df = pd.DataFrame( {"Client" : ["Alice", "Nan", "Nan", "Mallory", "Nan" , "Bob"] , 
    "Product" : ["A", "B", "C", "B", "C", "B"] } )

last_client = None
data = defaultdict(list)
for _, row in df.iterrows():
    # I'd hazard a guess that you want to check for np.nan here, not compare against the string
    if row.Client != last_client and row.Client != "Nan":
        last_client = row.Client
    data[last_client].append(row.Product)

print(data)

defaultdict(<class 'list'>, {'Alice': ['A', 'B', 'C'], 'Mallory': ['B', 'C'], 'Bob': ['B']})
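
If the column actually holds real NaN values rather than the string "Nan" (an assumption, since the question only shows strings), the comparison in the loop would need pd.isna, because NaN never compares equal to anything. A minimal sketch of that variant, which also prints the result in the requested layout:

```python
from collections import defaultdict

import numpy as np
import pandas as pd

# Assumed variant of the question's data: real NaN instead of the string "Nan".
df = pd.DataFrame({
    "Client": ["Alice", np.nan, np.nan, "Mallory", np.nan, "Bob"],
    "Product": ["A", "B", "C", "B", "C", "B"],
})

last_client = None
data = defaultdict(list)
for client, product in zip(df["Client"], df["Product"]):
    # A non-missing value starts a new client; missing values reuse the last one.
    if not pd.isna(client):
        last_client = client
    data[last_client].append(product)

for client, products in data.items():
    print(client, ", ".join(products))
```

Iterating over the two columns with zip avoids iterrows, which builds a Series for every row.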

Pandas: set column values as rows
