Below is my pandas DataFrame:
Id  IsDef    Data
1   Y        1a
2   N,N,N,Y  2a,2b,2c,2d
3   N,Y      3a,3b
How can I split it with pandas as shown below? Only the first two entries for each of 'Y' and 'N' should be kept.
Id  DataY_1  DataY_2  DataN_1  DataN_2
1   1a       NULL     NULL     NULL
2   2d       NULL     2a       2b
3   3b       NULL     3a       NULL
Answer 0 (score: 2)
You can flatten the comma-separated columns into a new DataFrame with one row per entry:
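from itertools import chain

import numpy as np
import pandas as pd

# Split the comma-separated strings into lists
d = df['Data'].str.split(',')
isdef = df['IsDef'].str.split(',')
# One row per entry; Id is repeated once for each entry of its row
df = pd.DataFrame({
    'Data' : list(chain.from_iterable(d)),
    'IsDef' : list(chain.from_iterable(isdef)),
    'Id' : df['Id'].repeat(d.str.len())
})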
Then use cumcount to build a counter within each (Id, IsDef) group, and use boolean indexing to keep only the first 2 rows of each group:
N = 2
df['g'] = df.groupby(['Id','IsDef']).cumcount()
df = df[df['g'] < N]
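To see what the counter looks like, here is a minimal sketch on the already flattened sample data from the question. The names `flat` and `kept` are illustrative only and do not appear in the answer's code.

```python
import pandas as pd

# Flattened sample data: one row per (Id, IsDef, Data) entry from the question.
flat = pd.DataFrame({
    'Id':    [1, 2, 2, 2, 2, 3, 3],
    'IsDef': ['Y', 'N', 'N', 'N', 'Y', 'N', 'Y'],
    'Data':  ['1a', '2a', '2b', '2c', '2d', '3a', '3b'],
})

N = 2
# cumcount numbers the rows inside each (Id, IsDef) group, starting at 0.
flat['g'] = flat.groupby(['Id', 'IsDef']).cumcount()
# Boolean mask: keep only rows whose position within the group is below N.
kept = flat[flat['g'] < N]
print(kept)
```

Only the third 'N' entry of Id 2 (`2c`) gets a counter value of 2, so it is the only row the mask drops.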
Then reshape with set_index and unstack, and add the missing categories with reindex. Finally, flatten the MultiIndex columns with an f-string:
mux = pd.MultiIndex.from_product([['Y','N'], np.arange(N)])
df = df.set_index(['Id','IsDef', 'g'])['Data'].unstack([1,2]).reindex(columns=mux)
df.columns = [f'Data{i}_{j+1}' for i, j in df.columns]
print(df)
   DataY_1 DataY_2 DataN_1 DataN_2
Id
1       1a     NaN     NaN     NaN
2       2d     NaN      2a      2b
3       3b     NaN      3a     NaN
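For convenience, the steps can be combined into one runnable sketch. The function name `split_first_n`, its parameter `n`, and the `sample` frame built from the question's data are assumptions made for illustration, not part of the original answer.

```python
from itertools import chain

import numpy as np
import pandas as pd


def split_first_n(df, n=2):
    """Hypothetical helper: spread the first n 'Y' and 'N' entries per Id into columns."""
    data = df['Data'].str.split(',')
    isdef = df['IsDef'].str.split(',')
    # Flatten: one row per entry, repeating Id once for each entry of its row.
    flat = pd.DataFrame({
        'Id': df['Id'].repeat(data.str.len()).to_numpy(),
        'IsDef': list(chain.from_iterable(isdef)),
        'Data': list(chain.from_iterable(data)),
    })
    # Position of each entry within its (Id, IsDef) group; keep the first n.
    flat['g'] = flat.groupby(['Id', 'IsDef']).cumcount()
    flat = flat[flat['g'] < n]
    # Reshape to wide form and make sure every (flag, position) column exists.
    mux = pd.MultiIndex.from_product([['Y', 'N'], np.arange(n)])
    wide = (flat.set_index(['Id', 'IsDef', 'g'])['Data']
                .unstack([1, 2])
                .reindex(columns=mux))
    # Flatten the MultiIndex columns into names such as DataY_1.
    wide.columns = [f'Data{flag}_{pos + 1}' for flag, pos in wide.columns]
    return wide


# Sample input rebuilt from the question.
sample = pd.DataFrame({
    'Id': [1, 2, 3],
    'IsDef': ['Y', 'N,N,N,Y', 'N,Y'],
    'Data': ['1a', '2a,2b,2c,2d', '3a,3b'],
})
result = split_first_n(sample)
print(result)
```

Calling `split_first_n(sample, n=3)` should additionally produce `DataY_3` and `DataN_3` columns, with `2c` appearing under `DataN_3` for Id 2.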