I have a CSV whose data looks like this:
Check1,Check2,Check3,Subscription Number,Customer
X,"","",100,Target
"",X,X,101,Walmart
X,"","",102,Walgreens
X,"",X,103,RiteAid
"",X,"",104,Seven Eleven
"","",X,105,Walgreens
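I want to report the Subscription Number and Customer fields for every CheckN column that is marked with an X.
So my expected output is a DataFrame or a dictionary that lists, for each of the first three columns, the [Subscription Number] and [Customer] values of all rows marked with an X.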
Check1: (100,"Target"),(102,"Walgreens"),(103,"RiteAid")
Check2: (101,"Walmart"),(104,"Seven Eleven")
Check3: (101,"Walmart"),(103,"RiteAid"),(105,"Walgreens")
Answer 0 (score: 0)
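Set the index to the last two columns and call stack. This relies on read_csv parsing the empty fields as NaN and on stack dropping NaN entries, which was its default behavior when this was written, so only the X entries remain.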
v = df.set_index(['Subscription Number', 'Customer']).stack()
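For anyone who wants to run this step on its own, here is a minimal sketch that loads the sample data from the question and stacks it. The csv_text variable and the explicit v[v == 'X'] filter are my additions, not part of the original answer. The filter is there on the assumption that your pandas version may keep NaN entries after stack, in which case the later groupby would otherwise pick up unmarked rows.

```python
import io

import pandas as pd

# Sample data copied from the question (csv_text is an illustrative name).
csv_text = '''Check1,Check2,Check3,Subscription Number,Customer
X,"","",100,Target
"",X,X,101,Walmart
X,"","",102,Walgreens
X,"",X,103,RiteAid
"",X,"",104,Seven Eleven
"","",X,105,Walgreens
'''

df = pd.read_csv(io.StringIO(csv_text))

# Move the two reporting columns into the index, then stack the CheckN
# columns so each (row, CheckN) pair becomes one entry of a Series.
v = df.set_index(['Subscription Number', 'Customer']).stack()

# Keep only the marked entries. This is a no-op if stack already dropped
# the empty cells, and a safeguard if it did not.
v = v[v == 'X']

print(v)
```

The result is a Series with a three-level index: Subscription Number, Customer, and the CheckN column name as the last level.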
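Next, group by the last level of the index, call apply to collect the remaining index levels as a list of tuples, and convert the result to a dictionary with to_dict.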
v.groupby(level=-1).apply(lambda x: x.index.droplevel(-1).tolist()).to_dict()
{
'Check1': [
(100, 'Target'),
(102, 'Walgreens'),
(103, 'RiteAid')
],
'Check2': [
(101, 'Walmart'),
(104, 'Seven Eleven')
],
'Check3': [
(101, 'Walmart'),
(103, 'RiteAid'),
(105, 'Walgreens')
]
}
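As an alternative sketch (not the approach above, just a different route to the same dictionary), you can build the result with one boolean mask per CheckN column. This does not depend on how empty cells are parsed or on whether stack drops them. The names csv_text, check_cols, and result are illustrative, and selecting the check columns by the 'Check' prefix is an assumption based on the sample header.

```python
import io

import pandas as pd

# Sample data copied from the question (csv_text is an illustrative name).
csv_text = '''Check1,Check2,Check3,Subscription Number,Customer
X,"","",100,Target
"",X,X,101,Walmart
X,"","",102,Walgreens
X,"",X,103,RiteAid
"",X,"",104,Seven Eleven
"","",X,105,Walgreens
'''

df = pd.read_csv(io.StringIO(csv_text))

# Assumption: every check column starts with the prefix 'Check'.
check_cols = [c for c in df.columns if c.startswith('Check')]

# For each check column, select the rows marked 'X' and turn the two
# reporting columns into plain (Subscription Number, Customer) tuples.
result = {
    col: list(
        df.loc[df[col] == 'X', ['Subscription Number', 'Customer']]
        .itertuples(index=False, name=None)
    )
    for col in check_cols
}

print(result)
```

This is easier to read if you only have a handful of check columns. The stack and groupby version does the same work in a single pass over the reshaped data.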