Pandas: collapse multiple columns into a single column

Time: 2019-05-19 13:21:55

Tags: python python-3.x pandas

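I am looking for a way to collapse multiple columns into a single new column. Given the data below, I would like to create a new column named "Day" and populate it with the day value wherever one is available. If a row has no day, I would like the value to be None. Can you help me do this?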

import pandas as pd

df = pd.DataFrame({'Monday': {0: 'Monday', 1: 'None', 2: 'None'},
                   'Tuesday': {0: 'None', 1: 'None', 2: 'Tuesday'},
                   'Wednesday': {0: 'None', 1: 'None', 2: 'None'}})

DataFrame

   Monday  Tuesday Wednesday
0  Monday     None      None
1    None     None      None
2    None  Tuesday      None

New column with the desired output:

        Day
0    Monday
1      None
2   Tuesday

I tried using melt, but it does not quite do what I need: it creates extra rows for every column I want to collapse.

My attempt

df = pd.melt(df, var_name='Day')

         Day    value
0     Monday   Monday
1     Monday     None
2     Monday     None
3    Tuesday     None
4    Tuesday     None
5    Tuesday  Tuesday
6  Wednesday     None
7  Wednesday     None
8  Wednesday     None

3 Answers:

Answer 0: (score: 3)

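If you only need the first non-missing value in each row, first replace the string 'None' with the missing value NaN, then back-fill missing values along each row (axis=1) and select the first column by position: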

import numpy as np

# Assign to a new name so the original df stays available
s = df.replace('None', np.nan).bfill(axis=1).iloc[:, 0]
print(s)
0     Monday
1        NaN
2    Tuesday
Name: Monday, dtype: object

Details

print(df.replace('None', np.nan))
   Monday  Tuesday  Wednesday
0  Monday      NaN        NaN
1     NaN      NaN        NaN
2     NaN  Tuesday        NaN

print(df.replace('None', np.nan).bfill(axis=1))
    Monday  Tuesday Wednesday
0   Monday      NaN       NaN
1      NaN      NaN       NaN
2  Tuesday  Tuesday       NaN

print(df.replace('None', np.nan).bfill(axis=1).iloc[:, 0])
0     Monday
1        NaN
2    Tuesday
Name: Monday, dtype: object
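
To end up with a new "Day" column holding None for empty rows, as the question asks, the same chain can be stored as a column rather than printed on its own. This is only a minimal sketch, assuming the placeholder is the literal string 'None' as in the question's data; the names first_day and days are illustrative and not part of the original answer:

```python
import numpy as np
import pandas as pd

# Same data as in the question. dtype=object keeps plain Python strings,
# matching the "dtype: object" outputs shown above.
df = pd.DataFrame({'Monday': ['Monday', 'None', 'None'],
                   'Tuesday': ['None', 'None', 'Tuesday'],
                   'Wednesday': ['None', 'None', 'None']}, dtype=object)

# Assumption: missing entries are the literal string 'None'.
# Replace them with NaN, back-fill along each row so the first
# non-missing value lands in the first column, then take that column.
first_day = df.replace('None', np.nan).bfill(axis=1).iloc[:, 0]

# Convert NaN back to Python None, then attach the result as a new column.
days = [None if pd.isna(v) else v for v in first_day]
df['Day'] = days

print(days)
```

Note that bfill keeps the first (leftmost) non-missing value, so a row that contains several days would report only the leftmost one.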

Answer 1: (score: 2)

The max function can help here, but you need to temporarily replace the 'None' text with '0', like this:

df['newcolumn'] = df.replace('None', '0').max(axis=1).replace('0', 'None')
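
The trick above relies on string comparison: '0' sorts before any capital letter, so the row-wise max picks the day name whenever a row has one. Here is a small sketch of that, assuming the question's data and using "Day" as the target column; the second frame, both, is a made-up example added only to show a caveat:

```python
import pandas as pd

# Same data as in the question, stored as plain Python strings.
df = pd.DataFrame({'Monday': ['Monday', 'None', 'None'],
                   'Tuesday': ['None', 'None', 'Tuesday'],
                   'Wednesday': ['None', 'None', 'None']}, dtype=object)

day_cols = ['Monday', 'Tuesday', 'Wednesday']

# '0' compares lower than 'M', 'T' and 'W', so max returns the day name
# when one exists; rows with no day give '0', which is mapped back to 'None'.
df['Day'] = df[day_cols].replace('None', '0').max(axis=1).replace('0', 'None')
print(df['Day'].tolist())

# Caveat (hypothetical data): with two day names in one row, max returns
# the alphabetically greatest value, not the value of the leftmost column.
both = pd.DataFrame({'Friday': ['Friday'], 'Monday': ['Monday']}, dtype=object)
largest = both.max(axis=1).iloc[0]
print(largest)
```

The result holds the string 'None' rather than a real missing value, so this variant fits best when each row contains at most one day and a text placeholder is acceptable.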

Answer 2: (score: 0)

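A less elegant approach: loop over the rows and use a comparison to find the day in each row. Append each result to a list, then add the list to the DataFrame.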

# Initialize empty list
Days = []
for idx, row in df.iterrows():
    # assume there is no day
    day = None
    for col in ['Monday', 'Tuesday', 'Wednesday']:
        # if there is a value, set value to day
        if str(row[col]) != 'None':
            day = row[col]
    # append to list
    Days.append(day)

# Add list to df
df['Day'] = Days

# Drop unused cols
df.drop(columns=['Monday', 'Tuesday', 'Wednesday'], inplace=True)
print(df)
print(df)
       Day
0   Monday
1     None
2  Tuesday
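
The inner loop above never stops early, so if a row contained several days it would keep the last one. A more compact variant of the same row-by-row idea is sketched below; it is not the answer's exact code, it returns the first non-'None' value instead, and the helper name first_day is made up for illustration:

```python
import pandas as pd

# Same data as in the question.
df = pd.DataFrame({'Monday': ['Monday', 'None', 'None'],
                   'Tuesday': ['None', 'None', 'Tuesday'],
                   'Wednesday': ['None', 'None', 'None']}, dtype=object)

day_cols = ['Monday', 'Tuesday', 'Wednesday']


def first_day(values):
    """Return the first value that is not the 'None' placeholder, else None."""
    # Assumption: missing entries are the literal string 'None'.
    return next((v for v in values if v != 'None'), None)


# itertuples(index=False) yields one tuple of cell values per row.
days = [first_day(row) for row in df[day_cols].itertuples(index=False)]
df['Day'] = days

print(days)
```

On the question's data both versions give the same result, because no row holds more than one day.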