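I have a function that moves the value of one column (Col_5) into another column (Col_6) when that column (Col_6) is blank, like so:
def shift(row):
    return row['Col_6'] if not pd.isnull(row['Col_6']) else row['Col_5']
I then apply this function to my columns like so: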
df[['Col_6', 'Col_5']].apply(shift, axis=1)
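This works fine, but rather than keeping the original value in Col_5, I need it moved to Col_6 with np.nan left in its place (so that I can then apply the same function to the preceding column). Any ideas?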
Answer 0 (score: 2)
fillna + mask: vectorised, not row-wise
With Pandas, you should try to avoid row-wise operations via apply, as these are processed via a Python-level loop. In this case, you can use:
null_mask = df['Col_6'].isnull()
df['Col_6'] = df['Col_6'].fillna(df['Col_5'])
df['Col_5'] = df['Col_5'].mask(null_mask)
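Note that we first calculate and store a Boolean series representing where Col_6 is null, then use it later to set Col_5 to null in the rows whose values have been moved via fillna.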
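As a minimal sketch of why the order matters, the snippet below runs the three steps on sample data (the same values used in the setup of the other answers) and then recomputes the mask afterwards for comparison. The name late_mask is introduced here purely for illustration.

```python
import numpy as np
import pandas as pd

# Sample data (same values as the setup used in the other answers).
df = pd.DataFrame({'Col_5': [1, np.nan, 3, 4, np.nan],
                   'Col_6': [np.nan, 8, np.nan, 6, np.nan]})

# Record where Col_6 is null BEFORE filling it.
null_mask = df['Col_6'].isnull()

# Fill the gaps in Col_6 from Col_5, then blank out the moved values in Col_5.
df['Col_6'] = df['Col_6'].fillna(df['Col_5'])
df['Col_5'] = df['Col_5'].mask(null_mask)

# For comparison only: a mask computed after the fill sees just the rows
# where both columns were null to begin with.
late_mask = df['Col_6'].isnull()

print(df)
```

Here null_mask flags rows 0, 2 and 4, whereas late_mask flags only row 4, so masking with late_mask would leave the moved values behind in Col_5.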
Answer 1 (score: 1)
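import pandas as pd
import numpy as np
df = pd.DataFrame({'Col_5': [1, np.nan, 3, 4, np.nan],
                   'Col_6': [np.nan, 8, np.nan, 6, np.nan]})
# Keep the original Col_5 values before they are overwritten.
col_5 = df['Col_5'].copy()
# Blank out Col_5 wherever Col_6 is empty ...
df.loc[pd.isnull(df['Col_6']), 'Col_5'] = np.nan
# ... then fill those empty Col_6 cells with the saved Col_5 values.
df.loc[pd.isnull(df['Col_6']), 'Col_6'] = col_5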
Output:
# Original Dataframe:
Col_5 Col_6
0 1.0 NaN
1 NaN 8.0
2 3.0 NaN
3 4.0 6.0
4 NaN NaN
# Fill Col_5 with NaN where Col_6 is NaN:
Col_5 Col_6
0 NaN NaN
1 NaN 8.0
2 NaN NaN
3 4.0 6.0
4 NaN NaN
# Assign the original col_5 values to Col_6:
Col_5 Col_6
0 NaN 1.0
1 NaN 8.0
2 NaN 3.0
3 4.0 6.0
4 NaN NaN
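The question mentions repeating the shift on the preceding column. A rough sketch of how the loc-based steps could be wrapped for reuse is shown below. The helper shift_into and the column Col_4 are hypothetical names added for illustration and do not appear in the question.

```python
import numpy as np
import pandas as pd

def shift_into(df, src, dst):
    """Move src values into dst wherever dst is null, leaving NaN in src.

    Hypothetical helper; modifies df in place and returns it.
    """
    empty = df[dst].isnull()            # computed before anything is overwritten
    moved = df.loc[empty, src].copy()   # values that are about to be moved
    df.loc[empty, src] = np.nan
    df.loc[empty, dst] = moved
    return df

# Col_4 is a hypothetical extra column added for illustration.
df = pd.DataFrame({'Col_4': [10, 20, np.nan, 40, 50],
                   'Col_5': [1, np.nan, 3, 4, np.nan],
                   'Col_6': [np.nan, 8, np.nan, 6, np.nan]})

# One pass from right to left: fill Col_6 from Col_5, then Col_5 from Col_4.
shift_into(df, 'Col_5', 'Col_6')
shift_into(df, 'Col_4', 'Col_5')

print(df)
```

Each call moves values by one column only, so a single right-to-left pass can still leave a gap in Col_6 (row 4 here). Whether a further pass is wanted depends on the intended result.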
Answer 2 (score: 0)
Setup (using the setup from @cosmic_inquiry)
df = pd.DataFrame({'Col_5':[1, np.nan, 3, 4, np.nan],
'Col_6':[np.nan, 8, np.nan, 6, np.nan]})
You can use numpy.flip together with a numpy.isnan mask:
a = df[['Col_5', 'Col_6']].values
m = np.isnan(a[:, 1])
a[m] = np.flip(a[m], axis=1)
df[['Col_5', 'Col_6']] = a
Alternatively, np.isnan + loc:
m = np.isnan(df['Col_6'])
df.loc[m, ['Col_5', 'Col_6']] = df.loc[m, ['Col_6', 'Col_5']].values
Col_5 Col_6
0 NaN 1.0
1 NaN 8.0
2 NaN 3.0
3 4.0 6.0
4 NaN NaN
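Both variants rely on the same observation: in the masked rows Col_6 is NaN, so swapping the two columns puts the Col_5 value into Col_6 and NaN into Col_5 in one step. A minimal sketch of the flip variant as a reusable function follows. The name swap_where_null is hypothetical, and the sketch assumes both columns hold floats, since np.isnan does not accept object arrays. It takes an explicit copy with to_numpy(copy=True) so that the array is writable and independent of the frame.

```python
import numpy as np
import pandas as pd

def swap_where_null(df, src='Col_5', dst='Col_6'):
    """Return a copy of df with src and dst swapped in rows where dst is NaN.

    Hypothetical helper; assumes both columns are float-typed.
    """
    a = df[[src, dst]].to_numpy(copy=True)  # writable array, independent of df
    m = np.isnan(a[:, 1])                   # rows where dst is NaN
    a[m] = np.flip(a[m], axis=1)            # swap the two columns in those rows
    out = df.copy()
    out[[src, dst]] = a
    return out

df = pd.DataFrame({'Col_5': [1, np.nan, 3, 4, np.nan],
                   'Col_6': [np.nan, 8, np.nan, 6, np.nan]})
result = swap_where_null(df)
print(result)
```

Returning a new frame rather than modifying df in place is a design choice of this sketch, not something the answer requires.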
Performance
test_df = \
pd.DataFrame(np.random.choice([1, np.nan], (1_000_000, 2)), columns=['Col_5', 'Col_6'])
In [167]: %timeit chris(test_df)
68.3 ms ± 291 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
In [191]: %timeit chris2(test_df)
43.9 ms ± 296 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
In [168]: %timeit jpp(test_df)
86.7 ms ± 394 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
In [169]: %timeit cosmic(test_df)
130 ms ± 1.4 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
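The functions timed above (chris, chris2, jpp, cosmic) are not defined in the post. The sketch below is one way to reproduce such a comparison, under the assumption that each timed function wraps the code of one answer. The function names used here are hypothetical, and which of them corresponds to which timed name is a guess. It uses the standard timeit module instead of IPython's %timeit and a smaller frame (100,000 rows, an arbitrary choice) so that it runs quickly.

```python
import timeit

import numpy as np
import pandas as pd

# Each wrapper works on a copy so that repeated runs see the same input.
def fillna_mask(df):
    df = df.copy()
    null_mask = df['Col_6'].isnull()
    df['Col_6'] = df['Col_6'].fillna(df['Col_5'])
    df['Col_5'] = df['Col_5'].mask(null_mask)
    return df

def loc_copy(df):
    df = df.copy()
    col_5 = df['Col_5'].copy()
    empty = pd.isnull(df['Col_6'])
    df.loc[empty, 'Col_5'] = np.nan
    df.loc[empty, 'Col_6'] = col_5
    return df

def flip_swap(df):
    df = df.copy()
    a = df[['Col_5', 'Col_6']].to_numpy(copy=True)
    m = np.isnan(a[:, 1])
    a[m] = np.flip(a[m], axis=1)
    df[['Col_5', 'Col_6']] = a
    return df

def loc_swap(df):
    df = df.copy()
    m = np.isnan(df['Col_6'])
    df.loc[m, ['Col_5', 'Col_6']] = df.loc[m, ['Col_6', 'Col_5']].to_numpy()
    return df

# Random test data in the same style as the post, with a fixed seed.
rng = np.random.default_rng(0)
test_df = pd.DataFrame(rng.choice([1, np.nan], (100_000, 2)),
                       columns=['Col_5', 'Col_6'])

candidates = [fillna_mask, loc_copy, flip_swap, loc_swap]
for func in candidates:
    seconds = timeit.timeit(lambda: func(test_df), number=5) / 5
    print(f"{func.__name__}: {seconds * 1e3:.1f} ms per call")
```

Because every wrapper copies its input, the reported times include the cost of df.copy(). Absolute numbers will differ from those in the post and depend on the machine and the pandas version.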