How to use pd.melt in Python pandas

Time: 2018-11-17 00:41:48

Tags: python pandas dataframe

I have this DataFrame, read from a CSV:

   id    name     A    B    C    gpa
0  1111  Phineas  NaN  B    NaN  3.0
1  1113  Tilly    NaN  NaN  C    2.5
2  1110  Andres   A    NaN  NaN  3.8
3  1112  Jax      NaN  B    NaN  3.2
4  1114  Ray      NaN  B    NaN  3.1
5  1115  Koda     NaN  NaN  C    2.4
6  1120  Bruno    A    NaN  NaN  3.7
7  1134  Davis    NaN  NaN  C    2.6
8  1102  Cassie   A    NaN  NaN  4.0

I want this output:

   id    name     grade  gpa
0  1111  Phineas  B      3.0
1  1113  Tilly    C      2.5
2  1110  Andres   A      3.8
3  1112  Jax      B      3.2
4  1114  Ray      B      3.1
5  1115  Koda     C      2.4
6  1120  Bruno    A      3.7
7  1134  Davis    C      2.6
8  1102  Cassie   A      4.0

What code would do this?

2 Answers:

Answer 0: (score: 2)

Use combine_first together with drop; in this case you do not need melt:

df['grade'] = df['A'].combine_first(df['B']).combine_first(df['C'])
df.drop(['A','B','C'], axis=1, inplace=True)

Or:

# Boolean-mask the underlying array; this assumes exactly one non-null grade per row
df['grade'] = df[['A','B','C']].values[df[['A','B','C']].notnull().values]
df.drop(['A','B','C'], axis=1, inplace=True)
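
Since the question asks about pd.melt specifically, here is a minimal sketch of how a melt-based version might look, for comparison. It is not taken from either answer. The three-row frame is a subset of the question's data, and the ignore_index=False argument assumes pandas 1.1 or newer.

```python
import numpy as np
import pandas as pd

# Three rows copied from the question's data
df = pd.DataFrame({
    'id': [1111, 1113, 1110],
    'name': ['Phineas', 'Tilly', 'Andres'],
    'A': [np.nan, np.nan, 'A'],
    'B': ['B', np.nan, np.nan],
    'C': [np.nan, 'C', np.nan],
    'gpa': [3.0, 2.5, 3.8],
})

# Unpivot A/B/C into one 'grade' column; ignore_index=False keeps the
# original row labels so the original order can be restored afterwards
long_df = df.melt(id_vars=['id', 'name', 'gpa'],
                  value_vars=['A', 'B', 'C'],
                  value_name='grade',
                  ignore_index=False)

# Drop the NaN placeholders, restore row order, and pick the wanted columns
result = long_df.dropna(subset=['grade']).sort_index()[['id', 'name', 'grade', 'gpa']]
print(result)
```

melt produces one row per student per letter column, so most rows hold NaN and have to be dropped again. That extra step is probably why the answer above prefers combine_first here.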

Answer 1: (score: 1)

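If you would rather not use melt, this solution may work for you. Since each student has exactly one of A, B, or C, you can first convert all NaN values to empty strings and then concatenate columns A, B, and C with the + operator.

Import statements and DataFrame setup: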

import pandas as pd
import numpy as np

# np.nan is used because the np.NaN alias was removed in NumPy 2.0
df = pd.DataFrame({'id':[1111,1113],
                   'name':['Phineas','Tilly'],
                   'A':[np.nan,np.nan],
                   'B':['B',np.nan],
                   'C':[np.nan,'C'],
                   'gpa':[3.0,2.5]
                   })
#     id      name    A   B   C   gpa
# 0   1111    Phineas NaN B   NaN 3.0
# 1   1113    Tilly   NaN NaN C   2.5

Column-wise string concatenation and output:

df.fillna('', inplace=True)  # replace every NaN with an empty string
df['letter_grades'] = df['A'] + df['B'] + df['C']  # concatenate the grade columns
df = df[['id','name','letter_grades','gpa']]  # keep only the columns of interest
print(df)

#     id     name letter_grades  gpa
#0  1111  Phineas             B  3.0
#1  1113    Tilly             C  2.5
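
The concatenation trick depends on every row having exactly one non-null value among A, B, and C. Below is a minimal sketch of how that assumption could be checked before calling fillna. The third row (id 9999, name 'Zed') is hypothetical and is added only to show what a violation looks like. The names grade_cols, filled_per_row, and bad_rows are likewise made up for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'id': [1111, 1113, 9999],            # 9999 is a hypothetical row
    'name': ['Phineas', 'Tilly', 'Zed'],
    'A': [np.nan, np.nan, 'A'],
    'B': ['B', np.nan, 'B'],             # 'Zed' has both A and B set
    'C': [np.nan, 'C', np.nan],
    'gpa': [3.0, 2.5, 3.3],
})

grade_cols = ['A', 'B', 'C']

# Count the non-null grade cells in each row
filled_per_row = df[grade_cols].notna().sum(axis=1)

# Rows where the "exactly one grade" assumption does not hold
bad_rows = df[filled_per_row != 1]
print(bad_rows)
```

If bad_rows is not empty, concatenating would give values such as 'AB' for those rows, so they are worth inspecting first.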