Question

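I need to build a new table from the one below based on certain conditions. All rows that share the same email should be grouped together into a single row. I'm not sure where to start.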

email              node_id  title
test@gmail.com     123      Some, text 1 
test@gmail.com     456      Some, text 2
test@gmail.com     789      Some, text 3
example@gmail.com  123      Some, text 1
example@gmail.com  767      Some, text 4
example@gmail.com  122      Some, text 5

Into:

email              n1   t1            n2   t2            n3   t3
test@gmail.com     123  Some, text 1  456  Some, text 2  789  Some, text 3
example@gmail.com  123  Some, text 1  767  Some, text 4  122  Some, text 5

Answer 1

Use cumcount to assign a counter column within each email group, pivot on that column, and then rename the resulting columns:

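# Number the rows within each email group (1, 2, 3, ...), then pivot on that counter
res = (df.assign(no=df.groupby("email")["node_id"].cumcount()+1)
         .pivot(index="email", columns="no", values=["node_id", "title"]))

# Flatten to n1..n3, t1..t3 (assumes the largest group has exactly three rows, as in the sample data)
res.columns = [x+str(y) for x in ("n", "t") for y in range(1, 4)]

print(res)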

                    n1   n2   n3            t1            t2            t3
email                                                                     
example@gmail.com  123  767  122  Some, text 1  Some, text 4  Some, text 5
test@gmail.com     123  456  789  Some, text 1  Some, text 2  Some, text 3
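
If the number of rows per email isn't known in advance, the column names can be derived from the pivot result instead of the hard-coded range(1, 4). Below is a minimal self-contained sketch, assuming the sample data from the question; the names wide, prefix, n_max and ordered are only illustrative. It also interleaves the columns (n1, t1, n2, t2, ...) to match the layout asked for:

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame(
    {
        "email": ["test@gmail.com"] * 3 + ["example@gmail.com"] * 3,
        "node_id": [123, 456, 789, 123, 767, 122],
        "title": ["Some, text 1", "Some, text 2", "Some, text 3",
                  "Some, text 1", "Some, text 4", "Some, text 5"],
    }
)

# Number the rows within each email group (1, 2, 3, ...) and pivot on that counter
wide = (
    df.assign(no=df.groupby("email").cumcount() + 1)
      .pivot(index="email", columns="no", values=["node_id", "title"])
)

# Flatten the (column name, counter) pairs into n1, n2, ..., t1, t2, ...
prefix = {"node_id": "n", "title": "t"}  # illustrative mapping
wide.columns = [f"{prefix[name]}{i}" for name, i in wide.columns]

# Interleave the columns: n1, t1, n2, t2, ...
n_max = df.groupby("email").size().max()
ordered = [f"{p}{i}" for i in range(1, n_max + 1) for p in ("n", "t")]
wide = wide[ordered].reset_index()

print(wide)
```

Emails with fewer rows than the largest group end up with NaN in the unused columns.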

Answer 2

Not sure whether this helps:

import pandas as pd

df = pd.DataFrame(
    [
        ['test@gmail.com', 123, "Foo"],
        ['test@gmail.com', 456, "Bar"],
        ['example@gmail.com', 789, "Baz"],
        ['example@gmail.com', 123, "Foo"]],
    columns=['email', 'node_id', 'title'])

df.groupby('email').agg({'node_id': list, 'title': list})

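The result still has a single node_id column and a single title column, but the node_id and title values for each email are collected into lists.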

                     node_id     title
email       
example@gmail.com   [789, 123]  [Baz, Foo]
test@gmail.com      [123, 456]  [Foo, Bar]
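
To get from these list columns to the one-column-per-value layout the question asks for, the lists can be expanded into separate columns. This is only a sketch on top of the groupby result above, not part of the original answer; the names grouped, nodes, titles and wide are illustrative:

```python
import pandas as pd

df = pd.DataFrame(
    [
        ["test@gmail.com", 123, "Foo"],
        ["test@gmail.com", 456, "Bar"],
        ["example@gmail.com", 789, "Baz"],
        ["example@gmail.com", 123, "Foo"],
    ],
    columns=["email", "node_id", "title"],
)

# Collect node_id and title into one list per email, as in the answer
grouped = df.groupby("email").agg({"node_id": list, "title": list})

# Expand each list column into its own numbered columns
nodes = pd.DataFrame(grouped["node_id"].tolist(), index=grouped.index)
titles = pd.DataFrame(grouped["title"].tolist(), index=grouped.index)
nodes.columns = [f"n{i + 1}" for i in nodes.columns]
titles.columns = [f"t{i + 1}" for i in titles.columns]

# Put the two blocks side by side and turn email back into a regular column
wide = pd.concat([nodes, titles], axis=1).reset_index()

print(wide)
```

Here the columns come out grouped as n1, n2, t1, t2; select them in a different order if the interleaved layout is needed.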

Format a table into multiple columns based on certain rows/conditions
