I have this DataFrame, loaded from a CSV:
id name A B C gpa
0 1111 Phineas NaN B NaN 3.0
1 1113 Tilly NaN NaN C 2.5
2 1110 Andres A NaN NaN 3.8
3 1112 Jax NaN B NaN 3.2
4 1114 Ray NaN B NaN 3.1
5 1115 Koda NaN NaN C 2.4
6 1120 Bruno A NaN NaN 3.7
7 1134 Davis NaN NaN C 2.6
8 1102 Cassie A NaN NaN 4.0
I want this output:
id name grade gpa
0 1111 Phineas B 3.0
1 1113 Tilly C 2.5
2 1110 Andres A 3.8
3 1112 Jax B 3.2
4 1114 Ray B 3.1
5 1115 Koda C 2.4
6 1120 Bruno A 3.7
7 1134 Davis C 2.6
8 1102 Cassie A 4.0
What code would do this?
Answer 0 (score: 2)
Use combine_first together with drop; in this case you do not need melt:
df['grade'] = df['A'].combine_first(df['B']).combine_first(df['C'])
df.drop(['A','B','C'], axis=1, inplace=True)
Or, using boolean indexing on the underlying array (this relies on every row having exactly one non-null grade):
df['grade'] = df[['A','B','C']].values[df[['A','B','C']].notnull().values]
df.drop(['A','B','C'], axis=1, inplace=True)
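For reference, here is a minimal self-contained sketch of the combine_first approach on the first three rows of the question's data. The variable name `result` and the final column reordering are my own additions, not part of the answer above; the reordering is there only because a newly assigned column is appended at the end, whereas the desired output has grade before gpa.

```python
import numpy as np
import pandas as pd

# First three rows of the question's data.
df = pd.DataFrame({
    "id": [1111, 1113, 1110],
    "name": ["Phineas", "Tilly", "Andres"],
    "A": [np.nan, np.nan, "A"],
    "B": ["B", np.nan, np.nan],
    "C": [np.nan, "C", np.nan],
    "gpa": [3.0, 2.5, 3.8],
})

# Take A where present, otherwise B, otherwise C.
df["grade"] = df["A"].combine_first(df["B"]).combine_first(df["C"])

# Drop the source columns, then reorder so grade sits before gpa.
result = df.drop(columns=["A", "B", "C"])[["id", "name", "grade", "gpa"]]
print(result)
```

Unlike the boolean-indexing variant, combine_first does not fail if a row has no grade at all; that row simply keeps NaN.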
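Answer 1 (score: 1)
If you would rather not use melt, this solution may work for you: since each student has exactly one of A, B, or C, you can first convert all NaN values to empty strings, then use the + operator to concatenate columns A, B, and C.
Import statements and DataFrame setup: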
import pandas as pd
import numpy as np
df = pd.DataFrame({'id':[1111,1113],
'name':['Phineas','Tilly'],
'A':[np.nan,np.nan],
'B':['B',np.nan],
'C':[np.nan,'C'],
'gpa':[3.0,2.5]
})
# id name A B C gpa
# 0 1111 Phineas NaN B NaN 3.0
# 1 1113 Tilly NaN NaN C 2.5
Column-wise string concatenation and output:
df.fillna('',inplace=True) # replace every NaN in the frame with an empty string
df['letter_grades'] = df['A'] + df['B'] + df['C'] #concatenate
df = df[['id','name','letter_grades','gpa']] #reassign dataframe identifier
print(df)
# id name letter_grades gpa
#0 1111 Phineas B 3.0
#1 1113 Tilly C 2.5
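One caveat with the snippet above: fillna('') on the whole frame also replaces a missing gpa with an empty string. A possible variant, sketched below, fills only the grade columns and joins them row by row. The missing gpa value, the `grade_cols` list, and the `out` name are hypothetical, added for illustration only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "id": [1111, 1113, 1110],
    "name": ["Phineas", "Tilly", "Andres"],
    "A": [np.nan, np.nan, "A"],
    "B": ["B", np.nan, np.nan],
    "C": [np.nan, "C", np.nan],
    "gpa": [3.0, np.nan, 3.8],  # hypothetical missing gpa, for illustration
})

grade_cols = ["A", "B", "C"]

# Fill NaN only in the grade columns, then join each row's strings.
df["grade"] = df[grade_cols].fillna("").agg("".join, axis=1)

# Select the final columns; the missing gpa is left as NaN.
out = df[["id", "name", "grade", "gpa"]]
print(out)
```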