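Convert a pandas DataFrame to a transposed (long) table format

Time: 2019-05-10 05:30:21

Tags: python pandas

I want to use pandas.DataFrame functionality to reshape a DataFrame into a long table format, so that all phone numbers are listed under an msisd column and the play_id column holds the name of the source column (phone1, phone2, and so on).

The df is: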

import pandas as pd

df = pd.DataFrame({
    'id': ['1', '2', '3'],
    'play_id': ['20002075', '601731', '601731'],
    'phone1': ['0900031349', '', ''],
    'phone2': ['090891349', '', ''],
    'phone3': ['', '', ''],
    'phone4': ['', '', ''],
    'phone5': ['', '088235311', ''],
    'phone6': ['', '', ''],
    'phone7': ['', '', '088235311']
})

The expected output should be:

   id           play_id  msisd
1:  1            phone1  0900031349
2:  2            phone2  090891349

1 answer:

Answer 0 (score: 2)

Use DataFrame.melt to reshape the phone columns into rows, then drop the rows holding empty strings with boolean indexing:

df1 = df.melt(['id','play_id'], value_name='val', var_name='phone')
df1 = df1[df1['val'] != '']
# if the missing values are NaN instead of empty strings:
# df1 = df1[df1['val'].notna()]
print (df1)
   id   play_id   phone         val
0   1  20002075  phone1  0900031349
3   1  20002075  phone2   090891349
13  2    601731  phone5   088235311
20  3    601731  phone7   088235311
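
To get closer to the column names asked for in the question, the melt approach can be sketched as a self-contained script. This is a minimal sketch: the variable name long_df, the choice of msisd as value_name, and the final sort and index reset are assumptions added for illustration, not part of the original answer.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    'id': ['1', '2', '3'],
    'play_id': ['20002075', '601731', '601731'],
    'phone1': ['0900031349', '', ''],
    'phone2': ['090891349', '', ''],
    'phone3': ['', '', ''],
    'phone4': ['', '', ''],
    'phone5': ['', '088235311', ''],
    'phone6': ['', '', ''],
    'phone7': ['', '', '088235311'],
})

# Turn every phoneN column into (phone, msisd) rows, keeping id and play_id.
long_df = df.melt(id_vars=['id', 'play_id'], var_name='phone', value_name='msisd')

# Keep only rows that actually carry a phone number.
long_df = long_df[long_df['msisd'] != '']

# Assumption: group each id's rows together and use a clean 0..n-1 index.
long_df = long_df.sort_values(['id', 'phone']).reset_index(drop=True)
print(long_df)
```

Naming the value column directly in melt avoids a separate rename step afterwards.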

Or replace the empty strings with missing values and use DataFrame.stack:

import numpy as np

df1 = (df.set_index(['id','play_id'])
        .replace('', np.nan)
        .stack()
        .reset_index(name='val')
        .rename(columns={'level_2':'phone'})
        )

print (df1)
  id   play_id   phone         val
0  1  20002075  phone1  0900031349
1  1  20002075  phone2   090891349
2  2    601731  phone5   088235311
3  3    601731  phone7   088235311
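
The stack variant relies on stack() discarding the NaN entries. As far as I know, newer pandas releases change how stack treats missing values, so a version-tolerant sketch can drop them explicitly. The explicit dropna() call, the rename_axis step, and the variable name stacked are assumptions added here and are not part of the original answer.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    'id': ['1', '2', '3'],
    'play_id': ['20002075', '601731', '601731'],
    'phone1': ['0900031349', '', ''],
    'phone2': ['090891349', '', ''],
    'phone3': ['', '', ''],
    'phone4': ['', '', ''],
    'phone5': ['', '088235311', ''],
    'phone6': ['', '', ''],
    'phone7': ['', '', '088235311'],
})

stacked = (
    df.set_index(['id', 'play_id'])
      .replace('', np.nan)      # mark empty strings as missing
      .stack()                  # phone columns become an inner index level
      .dropna()                 # explicit: do not rely on stack() dropping NaN
      .rename_axis(['id', 'play_id', 'phone'])  # name the new level up front
      .reset_index(name='val')
)
print(stacked)
```

Naming the index levels before reset_index removes the need to rename the auto-generated level_2 column.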