I have a Pandas DataFrame like this:
id Apple Apricot Banana Climentine Orange Pear Pineapple
01 1 1 0 0 0 0 0
02 0 0 1 1 1 1 0
03 0 0 0 0 1 0 1
How can I generate a new DataFrame like this, listing for each id the names of the columns whose value is 1?
id fruits
01 Apple, Apricot
02 Banana, Climentine, Orange, Pear
03 Orange, Pineapple
Answer 0 (score: 4)
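Use melt to reshape the data, filter the rows where value is 1 with query, and finally join the values of each group with groupby and ', '.join: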
import pandas as pd

df = pd.DataFrame({
    'id': ['01', '02', '03'],
    'Apple': [1, 0, 0],
    'Apricot': [1, 0, 0],
    'Banana': [0, 1, 0],
    'Climentine': [0, 1, 0],
    'Orange': [0, 1, 1],
    'Pear': [0, 1, 0],
    'Pineapple': [0, 0, 1]
})

# Reshape to long format, keep the rows flagged with 1, then join the names per id.
# The result goes into a new variable so the original df stays available.
out = (df.melt('id', var_name='fruits')
         .query('value == 1')
         .groupby('id')['fruits']
         .apply(', '.join)
         .reset_index())
print(out)
#    id                            fruits
# 0  01                    Apple, Apricot
# 1  02  Banana, Climentine, Orange, Pear
# 2  03                 Orange, Pineapple
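To make the chain above easier to follow, here is a minimal sketch that runs the same steps one at a time and keeps each intermediate result. The variable names (wide, long_form, hits, result) are illustrative and not part of the original answer.

```python
import pandas as pd

# Same one-hot layout as in the question, rebuilt so the sketch is self-contained.
wide = pd.DataFrame({
    'id': ['01', '02', '03'],
    'Apple': [1, 0, 0],
    'Apricot': [1, 0, 0],
    'Banana': [0, 1, 0],
    'Climentine': [0, 1, 0],
    'Orange': [0, 1, 1],
    'Pear': [0, 1, 0],
    'Pineapple': [0, 0, 1],
})

# Step 1: melt turns every (id, column) pair into its own row,
# with the column name in 'fruits' and the 0/1 flag in 'value'.
long_form = wide.melt('id', var_name='fruits')

# Step 2: keep only the rows whose flag is 1.
hits = long_form.query('value == 1')

# Step 3: join the remaining fruit names for each id.
result = hits.groupby('id')['fruits'].apply(', '.join).reset_index()

print(long_form.head())
print(result)
```

Inspecting long_form and hits separately is a quick way to check that the filter keeps exactly the rows you expect before they are joined.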
For better performance, use dot for matrix multiplication:
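# Start from the original one-hot df (not the melted result above).
out = df.set_index('id')
# Multiplying 0/1 by a string repeats it 0 or 1 times,
# so dot concatenates the names of the columns flagged with 1.
out = out.dot(out.columns + ', ').str.rstrip(', ').reset_index(name='fruit')
print(out)
#    id                             fruit
# 0  01                    Apple, Apricot
# 1  02  Banana, Climentine, Orange, Pear
# 2  03                 Orange, Pineapple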
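It may not be obvious why a matrix product yields strings. The following plain-Python sketch mimics what the dot product does for a single row, under the assumption that the flags are only 0 or 1: an integer times a string repeats the string, and summing the products concatenates them. The names flags and labels are illustrative.

```python
# One row of 0/1 flags and the column labels with the separator already appended.
flags = [0, 1, 1]
labels = ['Apple, ', 'Banana, ', 'Orange, ']

# int * str repeats the string: 0 gives '', 1 gives the label itself.
pieces = [flag * label for flag, label in zip(flags, labels)]

# The "sum" of a dot product becomes string concatenation.
joined = ''.join(pieces)

# rstrip(', ') removes trailing commas and spaces (it strips a set of
# characters, not a literal suffix), which drops the last separator.
cleaned = joined.rstrip(', ')
print(cleaned)
```

Because rstrip works on a character set, this cleanup is only safe when no label itself ends with a comma or a space.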
Answer 1 (score: 2)
OK, I searched around here a bit and found this: https://stackoverflow.com/a/24045425/7386332
The equivalent here simply relies on 0 evaluating to False in the comprehension.
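import pandas as pd

df = pd.DataFrame({
    'id': ['01', '02', '03'],
    'Apple': [1, 0, 0],
    'Apricot': [1, 0, 0],
    'Banana': [0, 1, 0],
    'Climentine': [0, 1, 0],
    'Orange': [0, 1, 1],
    'Pear': [0, 1, 0],
    'Pineapple': [0, 0, 1]
})

# Iterate over each row's own (column, value) pairs so labels and values stay aligned;
# zipping against df.columns would also include 'id' and shift every label by one.
out = (df.set_index('id')
         .apply(lambda row: ', '.join([col for col, b in row.items() if b]),
                axis=1)
         .reset_index(name='fruits'))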
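As a small illustration of the truthiness filter, this plain-Python sketch applies the same comprehension to a single hand-written row. The columns and row lists are made up for the example. Labels and values must be in the same order for the pairing to be correct.

```python
# Column labels and one row of 0/1 flags, in matching order.
columns = ['Apple', 'Apricot', 'Banana']
row = [1, 0, 1]

# `if flag` keeps a label only when its flag is truthy; 0 is falsy, 1 is truthy.
selected = [col for col, flag in zip(columns, row) if flag]
label = ', '.join(selected)
print(label)
```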
Small set (as above):
1.37 ms ± 4.1 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each) # list comprehension
1.41 ms ± 2.9 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each) # df.dot
3.28 ms ± 81.1 µs per loop (mean ± std. dev. of 7 runs, 100 loops each) # df.melt
Using df = pd.concat([df]*1000) to simulate a larger set:
36.9 ms ± 137 µs per loop (mean ± std. dev. of 7 runs, 10 loops each) # df.dot
39.8 ms ± 369 µs per loop (mean ± std. dev. of 7 runs, 10 loops each) # df.melt
84.5 ms ± 215 µs per loop (mean ± std. dev. of 7 runs, 10 loops each) # list comprehension
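For anyone who wants to rerun a comparison like this outside IPython, here is a rough sketch using the standard timeit module. It covers the melt and list-comprehension variants; other approaches can be timed the same way. Two things are assumptions of this sketch rather than part of the original benchmark: the helper names (with_melt, with_comprehension) and the step that assigns a unique id to every row. Repeating the frame with pd.concat also repeats the ids, so groupby would otherwise collapse the data back into three groups. Timings depend heavily on machine and pandas version, so none are quoted here.

```python
import timeit

import pandas as pd

base = pd.DataFrame({
    'id': ['01', '02', '03'],
    'Apple': [1, 0, 0],
    'Apricot': [1, 0, 0],
    'Banana': [0, 1, 0],
    'Climentine': [0, 1, 0],
    'Orange': [0, 1, 1],
    'Pear': [0, 1, 0],
    'Pineapple': [0, 0, 1],
})

# Repeat the rows 1000 times, then give every row its own id (an assumption
# of this sketch) so both approaches produce one output row per input row.
big = pd.concat([base] * 1000, ignore_index=True)
big['id'] = [f'{i:04d}' for i in range(len(big))]


def with_melt(frame):
    """Long format, filter on 1, join names per id."""
    return (frame.melt('id', var_name='fruits')
                 .query('value == 1')
                 .groupby('id')['fruits']
                 .apply(', '.join)
                 .reset_index())


def with_comprehension(frame):
    """Row-wise comprehension that keeps labels with a truthy flag."""
    return (frame.set_index('id')
                 .apply(lambda row: ', '.join([c for c, b in row.items() if b]),
                        axis=1)
                 .reset_index(name='fruits'))


timings = {}
for func in (with_melt, with_comprehension):
    # Average wall-clock seconds per call over a few repetitions.
    timings[func.__name__] = timeit.timeit(lambda: func(big), number=3) / 3

for name, seconds in timings.items():
    print(f'{name}: {seconds * 1000:.1f} ms per call')
```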