How to leave NaN behind after shifting a value

Time: 2018-11-16 22:22:36

Tags: python pandas

I have a function that moves the value of one column (Col_5) into another column (Col_6) whenever that column (Col_6) is blank, like this:

def shift(row):
    return row['Col_6'] if not pd.isnull(row['Col_6']) else row['Col_5']

Then I apply this function to my columns like this:

df[['Col_6', 'Col_5']].apply(shift, axis=1)

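This works fine, but I don't want the original value to stay in Col_5. I need it moved to Col_6 with np.nan left in its place (so that I can then apply the same function to the previous column). Any ideas?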

3 answers:

Answer 0: (score: 2)

fillna + mask: vectorised, not row-wise

With Pandas, you should try to avoid row-wise operations via apply, since these are processed through a Python-level loop. In this case, you can use:

null_mask = df['Col_6'].isnull()
df['Col_6'] = df['Col_6'].fillna(df['Col_5'])
df['Col_5'] = df['Col_5'].mask(null_mask)

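Note that we compute and store a Boolean series marking where Col_6 is null first. Only afterwards, once those positions have been filled via fillna, do we use it to set the corresponding Col_5 values to null.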
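As a quick check of the ordering described above, here is a minimal, self-contained sketch that runs the three lines on a small frame. The sample values are borrowed from the setup used elsewhere in this thread and are only for illustration.

```python
import numpy as np
import pandas as pd

# Small illustrative frame (values chosen for the example)
df = pd.DataFrame({'Col_5': [1, np.nan, 3, 4, np.nan],
                   'Col_6': [np.nan, 8, np.nan, 6, np.nan]})

# Record where Col_6 is missing *before* it gets filled
null_mask = df['Col_6'].isnull()

# Fill the gaps in Col_6 from Col_5 ...
df['Col_6'] = df['Col_6'].fillna(df['Col_5'])
# ... then blank out Col_5 in exactly those rows
df['Col_5'] = df['Col_5'].mask(null_mask)

print(df)
```

If the mask were computed after the fillna call, the rows that had just been filled would no longer count as null, and their Col_5 values would be left in place.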

Answer 1: (score: 1)

import pandas as pd
import numpy as np
df = pd.DataFrame({'Col_5':[1, np.nan, 3, 4, np.nan],
                   'Col_6':[np.nan, 8, np.nan, 6, np.nan]})
col_5 = df['Col_5'].copy()
df.loc[pd.isnull(df['Col_6']), 'Col_5'] = np.nan
df.loc[pd.isnull(df['Col_6']), 'Col_6'] = col_5

Output:

# Original Dataframe:
   Col_5  Col_6
0    1.0    NaN
1    NaN    8.0
2    3.0    NaN
3    4.0    6.0
4    NaN    NaN
# Fill Col_5 with NaN where Col_6 is NaN:
   Col_5  Col_6
0    NaN    NaN
1    NaN    8.0
2    NaN    NaN
3    4.0    6.0
4    NaN    NaN
# Assign the original col_5 values to Col_6:
   Col_5  Col_6
0    NaN    1.0
1    NaN    8.0
2    NaN    3.0
3    4.0    6.0
4    NaN    NaN
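The question mentions repeating the shift for the previous column. One way this could be sketched is to wrap the loc-based steps in a helper and call it for each adjacent pair, working from right to left. The helper name `shift_into_blank` and the extra column `Col_4` are hypothetical, introduced only for this illustration.

```python
import numpy as np
import pandas as pd

def shift_into_blank(df, src, dst):
    """Move values from `src` into `dst` wherever `dst` is NaN,
    leaving NaN behind in `src`. Modifies `df` in place."""
    blank = df[dst].isnull()
    moved = df.loc[blank, src].copy()   # values about to be moved
    df.loc[blank, src] = np.nan         # leave NaN behind
    df.loc[blank, dst] = moved          # aligned on the row index
    return df

# Hypothetical three-column frame
df = pd.DataFrame({'Col_4': [10, 20, np.nan, 40],
                   'Col_5': [1, np.nan, 3, 4],
                   'Col_6': [np.nan, 8, np.nan, 6]})

# Rightmost pair first, then the pair to its left
shift_into_blank(df, 'Col_5', 'Col_6')
shift_into_blank(df, 'Col_4', 'Col_5')

print(df)
```

Processing the rightmost pair first means any gap opened up in Col_5 by the first call can be filled from Col_4 by the second.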

Answer 2: (score: 0)

Setup (using the setup from @cosmic_inquiry)

df = pd.DataFrame({'Col_5':[1, np.nan, 3, 4, np.nan],
                   'Col_6':[np.nan, 8, np.nan, 6, np.nan]})

You can treat this as a basic swap operation using a mask.

numpy.flip + numpy.isnan

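# Copy the values so the array is writable regardless of pandas version
a = df[['Col_5', 'Col_6']].to_numpy(copy=True)
m = np.isnan(a[:, 1])         # rows where Col_6 is NaN
a[m] = np.flip(a[m], axis=1)  # swap the two columns in those rows
df[['Col_5', 'Col_6']] = a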

np.isnan + loc

m = np.isnan(df['Col_6'])
df.loc[m, ['Col_5', 'Col_6']] = df.loc[m, ['Col_6', 'Col_5']].values

   Col_5  Col_6
0    NaN    1.0
1    NaN    8.0
2    NaN    3.0
3    4.0    6.0
4    NaN    NaN
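The `.values` on the right-hand side of the loc version matters. When a DataFrame is assigned through `.loc`, pandas aligns it on column labels, so the reordered columns are matched back to their own names and nothing is swapped. A minimal sketch of the difference, reusing the sample frame from the setup (`.to_numpy()` is used here as the equivalent of `.values`):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Col_5': [1, np.nan, 3, 4, np.nan],
                   'Col_6': [np.nan, 8, np.nan, 6, np.nan]})
m = df['Col_6'].isna()

# Assigning a DataFrame: columns are aligned by label, so nothing changes
aligned = df.copy()
aligned.loc[m, ['Col_5', 'Col_6']] = aligned.loc[m, ['Col_6', 'Col_5']]

# Assigning a raw array: values are placed by position, so the swap happens
swapped = df.copy()
swapped.loc[m, ['Col_5', 'Col_6']] = swapped.loc[m, ['Col_6', 'Col_5']].to_numpy()

print(aligned.equals(df))
print(swapped)
```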

Performance

test_df = \
    pd.DataFrame(np.random.choice([1, np.nan], (1_000_000, 2)), columns=['Col_5', 'Col_6'])

In [167]: %timeit chris(test_df)
68.3 ms ± 291 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [191]: %timeit chris2(test_df)
43.9 ms ± 296 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [168]: %timeit jpp(test_df)
86.7 ms ± 394 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [169]: %timeit cosmic(test_df)
130 ms ± 1.4 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
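The bodies of chris, chris2, jpp and cosmic are not shown above. What follows is a rough sketch of how a comparison like this could be set up with the standard timeit module, assuming each function simply wraps one of the approaches from the answers. The function names, the smaller frame and the seeded generator are illustrative choices, not the original benchmark code.

```python
import timeit

import numpy as np
import pandas as pd

def fillna_mask(df):
    # fillna + mask approach
    null_mask = df['Col_6'].isnull()
    df['Col_6'] = df['Col_6'].fillna(df['Col_5'])
    df['Col_5'] = df['Col_5'].mask(null_mask)
    return df

def loc_assign(df):
    # copy Col_5, blank it where Col_6 is null, then fill Col_6
    col_5 = df['Col_5'].copy()
    null_mask = df['Col_6'].isnull()
    df.loc[null_mask, 'Col_5'] = np.nan
    df.loc[null_mask, 'Col_6'] = col_5
    return df

def flip_swap(df):
    # numpy.flip on the masked rows of a writable copy
    a = df[['Col_5', 'Col_6']].to_numpy(copy=True)
    m = np.isnan(a[:, 1])
    a[m] = np.flip(a[m], axis=1)
    df[['Col_5', 'Col_6']] = a
    return df

def loc_swap(df):
    # positional swap through .loc
    m = df['Col_6'].isna()
    df.loc[m, ['Col_5', 'Col_6']] = df.loc[m, ['Col_6', 'Col_5']].to_numpy()
    return df

candidates = (fillna_mask, loc_assign, flip_swap, loc_swap)

# Smaller than the 1_000_000-row frame above, to keep the sketch quick
rng = np.random.default_rng(0)
test_df = pd.DataFrame(rng.choice([1, np.nan], (100_000, 2)),
                       columns=['Col_5', 'Col_6'])

runs = 5
for func in candidates:
    # Each call gets a fresh copy, so the copy cost is included in the timing
    seconds = timeit.timeit(lambda: func(test_df.copy()), number=runs) / runs
    print(f'{func.__name__}: {seconds * 1000:.1f} ms per call')
```

Because every function modifies its argument, timing them on a fresh copy keeps later runs from seeing an already-shifted frame. Absolute numbers will differ from those quoted above depending on hardware and pandas version.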