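I'm new to Python pandas.
I currently have some Excel data, part of which is shown below. As you can see, each row may have several Supplier and Supplier PN columns. I need to keep the P/N and Description columns and split the other columns into rows.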
Supplier PN Supplier.1 Supplier PN.1 Supplier.2 \
0 GRM1555C1H101JA01D YAGEO CC0402JRNPO9BN101 GRM1555C1H101JA01J
1 04025A6R8CAT2A KEMET C0402C689C5GACTU NaN
2 04025A3R9CAT2A NaN NaN NaN
Supplier PN.2
0 Murata Electronics North America
1 NaN
2 NaN
What I expect is:
P/N Description Supplier \
0 302-462-326 CAP CER 0402 100pF 5% 50V MURATA
1 302-462-326 CAP CER 0402 100pF 5% 50V YAGEO
2 302-462-326 CAP CER 0402 100pF 5% 50V GRM1555C1H101JA01J
3 302-462-012 CAP CER 0402 6.8pF 0.25pF 50V AVX Corporation
4 302-462-012 CAP CER 0402 6.8pF 0.25pF 50V KEMET
5 302-462-009 CAP CER 0402 3.9pF 0.25pF 50V AVX Corporation
Supplier PN
0 GRM1555C1H101JA01D
1 CC0402JRNPO9BN101
2 Murata Electronics North America
3 04025A6R8CAT2A
4 C0402C689C5GACTU
5 04025A3R9CAT2A
How can I do this with Python pandas? Thanks.
Answer 0 (score: 2)
This is somewhat similar to what pd.wide_to_long does, so you can try the following code:
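import numpy as np
import pandas as pd

# sample data
# replace with df = pd.read_excel(...)
df = pd.DataFrame({'P/N': [1, 2, 3],
                   'Description': ['a', 'b', 'c'],
                   'Supplier': ['x', 'y', 'z'],
                   'Supplier PN': ['xx', 'yy', 'zz'],
                   'Supplier.1': ['X', 'Y', np.nan],
                   'Supplier PN.1': ['XX', 'YY', np.nan]})

(df.melt(['P/N', 'Description'])
   .dropna()
   # stub: the column name without its ".1", ".2", ... suffix
   # idx: running counter within each stub, used to pair the n-th Supplier
   #      with the n-th Supplier PN (this relies on both being missing in the same rows)
   .assign(stub=lambda x: x.variable.str.extract(r'([^.]*)\.?', expand=False),
           idx=lambda x: x.groupby('stub').cumcount())
   .pivot_table(index=['P/N', 'Description', 'idx'],
                columns='stub',
                values='value',
                aggfunc='first')
   .reset_index()
   .drop('idx', axis=1)
)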
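To make the chain above easier to follow, here is a minimal step-by-step sketch of the same melt-then-pivot idea on the sample data. It is a variant, not the exact code above: the names `long_df` and `result` are made up for illustration, the suffix is stripped with `str.replace` instead of `str.extract`, and the counter is computed per part (`P/N`, `Description`) rather than per stub. It still assumes that Supplier and Supplier PN are missing in the same slots.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'P/N': [1, 2, 3],
                   'Description': ['a', 'b', 'c'],
                   'Supplier': ['x', 'y', 'z'],
                   'Supplier PN': ['xx', 'yy', 'zz'],
                   'Supplier.1': ['X', 'Y', np.nan],
                   'Supplier PN.1': ['XX', 'YY', np.nan]})

# Step 1: one row per (part, original column) pair, dropping empty slots.
long_df = df.melt(['P/N', 'Description']).dropna()

# Step 2: remove a trailing ".1", ".2", ... so both "Supplier" and
# "Supplier.1" map to the stub "Supplier".
long_df['stub'] = long_df['variable'].str.replace(r'\.\d+$', '', regex=True)

# Step 3: number the values of each stub within each part, so the n-th
# Supplier lines up with the n-th Supplier PN of the same part.
long_df['idx'] = long_df.groupby(['P/N', 'Description', 'stub']).cumcount()

# Step 4: spread the stubs back into columns, one row per (part, idx).
result = (long_df.pivot_table(index=['P/N', 'Description', 'idx'],
                              columns='stub',
                              values='value',
                              aggfunc='first')
                 .reset_index()
                 .drop(columns='idx'))
print(result)
```

Printing `long_df` after step 3 is a quick way to check that the stub and counter columns look the way you expect before pivoting.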
Or use this code with wide_to_long:
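# append ".0" to the Supplier columns that have no numeric suffix,
# so that every stub column looks like "<stub>.<number>"
df.columns = np.where(df.columns.str.match(r'^Supp.*\D+$'),
                      df.columns + '.0',
                      df.columns)

(pd.wide_to_long(df, ['Supplier', 'Supplier PN'],
                 ['P/N', 'Description'],
                 'num', sep='.')
   .dropna()
   .reset_index()
   .drop('num', axis=1)
)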
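Output (shown for the first snippet; the wide_to_long version gives the same rows, though the row order may differ):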
stub P/N Description Supplier Supplier PN
0 1 a x xx
1 1 a X XX
2 2 b y yy
3 2 b Y YY
4 3 c z zz
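For reuse on a real spreadsheet, the wide_to_long approach can be wrapped in a small function. The following is only a sketch under a few assumptions: `split_suppliers`, `id_cols`, and `stubs` are hypothetical names, the (P/N, Description) pair must be unique per input row (wide_to_long requires this), and the extra columns are assumed to follow the `Supplier.1`, `Supplier PN.1`, ... naming that pandas produces for duplicate headers. Unlike the snippets above, it leaves the input frame's column names untouched, drops a slot only when both values are missing, and sorts so that each part's rows stay together.

```python
import numpy as np
import pandas as pd


def split_suppliers(df, id_cols=('P/N', 'Description'),
                    stubs=('Supplier', 'Supplier PN')):
    """Hypothetical helper: one output row per (part, supplier slot)."""
    # Give the unsuffixed columns an explicit ".0" on a renamed copy,
    # so the caller's DataFrame is not modified.
    renamed = df.rename(columns={s: f'{s}.0' for s in stubs})
    long_df = pd.wide_to_long(renamed, stubnames=list(stubs),
                              i=list(id_cols), j='num', sep='.')
    return (long_df.dropna(how='all')   # drop slots where both values are missing
                   .sort_index()        # keep the rows of each part together
                   .reset_index()
                   .drop(columns='num'))


sample = pd.DataFrame({'P/N': [1, 2, 3],
                       'Description': ['a', 'b', 'c'],
                       'Supplier': ['x', 'y', 'z'],
                       'Supplier PN': ['xx', 'yy', 'zz'],
                       'Supplier.1': ['X', 'Y', np.nan],
                       'Supplier PN.1': ['XX', 'YY', np.nan]})
out = split_suppliers(sample)
print(out)
```

With real data you would pass the result of `pd.read_excel(...)` instead of `sample`.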