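I am looking for a way to collapse multiple columns into a single new column. Given the data below, I want to create a new column named "Day" and fill it with each row's day value when one is available. If a row has no day, I want it to contain None. Can you help me do this?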
import pandas as pd

df = pd.DataFrame({'Monday': {0: 'Monday', 1: 'None', 2: 'None'},
                   'Tuesday': {0: 'None', 1: 'None', 2: 'Tuesday'},
                   'Wednesday': {0: 'None', 1: 'None', 2: 'None'}})
DataFrame
Monday Tuesday Wednesday
0 Monday None None
1 None None None
2 None Tuesday None
New column with the desired output:
Day
0 Monday
1 None
2 Tuesday
I tried using melt, but it did not quite do what I wanted: it created extra rows for every column being collapsed.
My attempt
df = pd.melt(df, var_name='Day')
Day value
0 Monday Monday
1 Monday None
2 Monday None
3 Monday None
4 Tuesday None
5 Tuesday None
6 Tuesday Tuesday
7 Tuesday None
8 Wednesday None
9 Wednesday None
10 Wednesday None
11 Wednesday None
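The extra rows appear because melt stacks every column on top of the others, so the result has (number of rows) x (number of columns) entries. The sketch below shows one way melt could still be used, assuming each row contains at most one day; the names `long`, `found`, and `day` are illustrative only and not part of the original attempt.

```python
import pandas as pd

df = pd.DataFrame({
    'Monday': ['Monday', 'None', 'None'],
    'Tuesday': ['None', 'None', 'Tuesday'],
    'Wednesday': ['None', 'None', 'None'],
})

# ignore_index=False keeps the original row labels, so each melted row
# still remembers which source row it came from.
long = df.melt(var_name='Day', ignore_index=False)

# Keep only the cells that hold a real day name.
found = long.loc[long['value'] != 'None', 'value']

# Realign to the original index; rows without a day become NaN.
# Assumption: at most one day per row, otherwise the index is not unique
# and reindex raises an error.
day = found.reindex(df.index)
print(day)
```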
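Answer 0 (score: 3)
If you only need the first non-missing value in each row, first replace the string None with the missing value NaN, then back-fill the missing values along each row and select the first column by position: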
import numpy as np

s = df.replace('None', np.nan).bfill(axis=1).iloc[:, 0]
print (s)
0 Monday
1 NaN
2 Tuesday
3 NaN
Name: Monday, dtype: object
Details:
print (df.replace('None', np.nan))
Monday Tuesday Wednesday
0 Monday NaN NaN
1 NaN NaN NaN
2 NaN Tuesday NaN
3 NaN NaN NaN
print (df.replace('None', np.nan).bfill(axis=1))
Monday Tuesday Wednesday
0 Monday NaN NaN
1 NaN NaN NaN
2 Tuesday Tuesday NaN
3 NaN NaN NaN
print (df.replace('None', np.nan).bfill(axis=1).iloc[:, 0])
0 Monday
1 NaN
2 Tuesday
3 NaN
Name: Monday, dtype: object
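The result above holds NaN for rows without a day, while the question asks for None and a column named Day. A minimal sketch of one way to finish the job follows, assuming the three-row frame from the question; the names `first_valid`, `days`, and `result` are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Monday': ['Monday', 'None', 'None'],
    'Tuesday': ['None', 'None', 'Tuesday'],
    'Wednesday': ['None', 'None', 'None'],
})

# Treat the 'None' strings as missing, pull the first non-missing value
# of each row into the first column, then keep only that column.
first_valid = df.replace('None', np.nan).bfill(axis=1).iloc[:, 0]

# Swap NaN for a real Python None, as requested in the question.
days = [None if pd.isna(v) else v for v in first_valid]

# dtype=object keeps None as None instead of converting it back to NaN.
result = pd.DataFrame({'Day': pd.Series(days, index=df.index, dtype=object)})
print(result)
```

If NaN is acceptable as the missing marker, the list comprehension can be dropped and `first_valid.to_frame(name='Day')` used directly.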
Answer 1 (score: 2)
The max function can help here, but you need to temporarily replace the "None" text with "0", like this:
df['newcolumn'] = df.replace('None', '0').max(axis=1).replace('0', 'None')
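The trick relies on string comparison: "0" sorts before every capitalized day name, so the maximum of a row is the day whenever one is present. The pure-Python sketch below mirrors that logic on a single row to show both the idea and its limit; `pick_day` is a hypothetical helper for illustration, not a pandas function.

```python
def pick_day(row):
    """Mirror the max trick on one row of strings (illustrative helper)."""
    # Temporarily swap the 'None' placeholder for '0', which compares
    # lower than any day name starting with an uppercase letter.
    swapped = ['0' if value == 'None' else value for value in row]
    best = max(swapped)
    # A row of only placeholders has '0' as its maximum: restore 'None'.
    return 'None' if best == '0' else best


print(pick_day(['Monday', 'None', 'None']))
print(pick_day(['None', 'None', 'None']))
# Caveat: with several days in one row, the alphabetically last one wins,
# not the one in the leftmost column.
print(pick_day(['Monday', 'Tuesday', 'None']))
```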
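Answer 2 (score: 0)
A less elegant way to handle it: iterate over the rows and use a comparison to find the day in each row. Append each result to a list, then add the list to the DataFrame.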
# Initialize empty list
Days = []
for idx, row in df.iterrows():
    # assume there is no day
    day = None
    for col in ['Monday','Tuesday','Wednesday']:
        # if there is a value, set value to day
        if str(row[col])!='None':
            day = row[col]
    # append to list
    Days.append(day)
# Add list to df
df['Day'] = Days
# Drop unused cols
df.drop(columns = ['Monday','Tuesday','Wednesday'], inplace = True)
print(df)
Day
0 Monday
1 None
2 Tuesday
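The same row-by-row idea can be written more compactly. This is only a sketch under the assumption that "None" is always stored as a string; unlike the loop above, which keeps the last day found in a row, it keeps the first one. The name `days` is illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Monday': ['Monday', 'None', 'None'],
    'Tuesday': ['None', 'None', 'Tuesday'],
    'Wednesday': ['None', 'None', 'None'],
})

# For each row, take the first value that is not the 'None' placeholder;
# next() falls back to a real None when the row has no day at all.
days = [
    next((value for value in row if value != 'None'), None)
    for row in df.to_numpy()
]
print(days)
```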