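I want to use pandas.DataFrame functionality to reshape the DataFrame into a long table format. All phone numbers should be listed under an MSISD column, and the play_id column should hold the name of the column each number came from (phone1, phone2, etc.).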
The DataFrame df is:
import pandas as pd

df = pd.DataFrame({
'id': ['1', '2', '3'],
'play_id': ['20002075', '601731', '601731'],
'phone1': ['0900031349', '', ''],
'phone2': ['090891349', '', ''],
'phone3': ['', '', ''],
'phone4': ['', '', ''],
'phone5': ['', '088235311', ''],
'phone6': ['', '', ''],
'phone7': ['', '', '088235311']
})
The expected output should be:
id play_id msisd
1: 1 phone1 0900031349
2: 2 phone2 090891349
Answer (score: 2)
Use DataFrame.melt, then use boolean indexing to remove the rows whose value is an empty string:
df1 = df.melt(['id','play_id'], value_name='val', var_name='phone')
df1 = df1[df1['val'] != '']
# if the blank cells are NaN instead of empty strings, filter with notna():
#df1 = df1[df1['val'].notna()]
print(df1)
id play_id phone val
0 1 20002075 phone1 0900031349
3 1 20002075 phone2 090891349
13 2 601731 phone5 088235311
20 3 601731 phone7 088235311
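The sketch below shows the melt approach with the value column named msisd directly through value_name, as the question asks. It also runs the NaN variant from the commented-out line. It assumes the same sample data as the question. The names long_df, df_nan and long_nan are illustrative only.

```python
import numpy as np
import pandas as pd

# Same sample data as in the question
df = pd.DataFrame({
    'id': ['1', '2', '3'],
    'play_id': ['20002075', '601731', '601731'],
    'phone1': ['0900031349', '', ''],
    'phone2': ['090891349', '', ''],
    'phone3': ['', '', ''],
    'phone4': ['', '', ''],
    'phone5': ['', '088235311', ''],
    'phone6': ['', '', ''],
    'phone7': ['', '', '088235311'],
})

# Name the value column 'msisd' directly instead of renaming it afterwards
long_df = df.melt(id_vars=['id', 'play_id'], var_name='phone', value_name='msisd')

# Keep only the rows that actually hold a phone number
long_df = long_df[long_df['msisd'] != ''].reset_index(drop=True)
print(long_df)

# Variant: if the blanks were NaN instead of '', filter with notna()
df_nan = df.replace('', np.nan)
long_nan = df_nan.melt(id_vars=['id', 'play_id'], var_name='phone', value_name='msisd')
long_nan = long_nan[long_nan['msisd'].notna()].reset_index(drop=True)
print(long_nan)
```

The reset_index(drop=True) calls only renumber the remaining rows 0..n-1. Drop them if you want to keep melt's original row labels.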
Or use DataFrame.stack, first replacing empty strings with missing values so they can be dropped:
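import numpy as np

df1 = (df.set_index(['id','play_id'])
         .replace('', np.nan)
         .stack()
         # explicit dropna(): newer pandas versions may keep NaN rows after stack()
         .dropna()
         .reset_index(name='val')
         .rename(columns={'level_2':'phone'})
      )
print(df1)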
id play_id phone val
0 1 20002075 phone1 0900031349
1 1 20002075 phone2 090891349
2 2 601731 phone5 088235311
3 3 601731 phone7 088235311
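The sketch below checks that the two approaches return the same rows by running both on the sample data and comparing the results. The explicit .dropna() after stack() is a safeguard. As far as I know, pandas 2.1 added a new stack implementation (opt-in via future_stack=True) that does not drop missing values. Older versions may also emit a FutureWarning here. The names via_melt, via_stack and same are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'id': ['1', '2', '3'],
    'play_id': ['20002075', '601731', '601731'],
    'phone1': ['0900031349', '', ''],
    'phone2': ['090891349', '', ''],
    'phone3': ['', '', ''],
    'phone4': ['', '', ''],
    'phone5': ['', '088235311', ''],
    'phone6': ['', '', ''],
    'phone7': ['', '', '088235311'],
})

# Approach 1: melt to long format, then drop empty strings with boolean indexing
via_melt = df.melt(['id', 'play_id'], var_name='phone', value_name='val')
via_melt = via_melt[via_melt['val'] != ''].reset_index(drop=True)

# Approach 2: stack, with an explicit dropna() so the result does not
# depend on whether this pandas version's stack() drops NaN by itself
via_stack = (df.set_index(['id', 'play_id'])
               .replace('', np.nan)
               .stack()
               .dropna()
               .reset_index(name='val')
               .rename(columns={'level_2': 'phone'}))

# melt orders rows by source column, stack by original row, so sort before
# comparing; compare plain values to sidestep dtype differences between versions
key = ['id', 'phone']
a = via_melt.sort_values(key).reset_index(drop=True)
b = via_stack.sort_values(key).reset_index(drop=True)
same = a.values.tolist() == b.values.tolist()
print(same)
```

melt is usually the more direct choice when the identifier columns are known. stack fits better when the data already has a meaningful index.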