I have two columns, for example:
row1 row2
0 500
1400 -1
1330 -1
0 900
500 -1
Here, whenever row1 is 0, row2 is not -1, and whenever row2 is -1, row1 is not 0.
I want to create a new column like this:
row3
500
1400
1330
900
500
In this column, wherever row1 is 0, the value is taken from row2 instead. How can I do this?
Answer 0 (score: 3)
You can use numpy.where (I wish it were named numpy.if_then_else).
>>> df['row3'] = np.where(df['row2'] == -1, df['row1'], df['row2'])
>>> df
row1 row2 row3
0 0 500 500
1 1400 -1 1400
2 1330 -1 1330
3 0 900 900
4 500 -1 500
Or, a bit more concisely, but very specific to the setup in your question:
>>> df['row3'] = np.where(df['row1'], df['row1'], df['row2'])
>>> df
row1 row2 row3
0 0 500 500
1 1400 -1 1400
2 1330 -1 1330
3 0 900 900
4 500 -1 500
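The second form works because np.where treats any nonzero value in the condition as True. Here is a minimal self-contained sketch of both forms. The DataFrame construction is assumed from the data in the question, and the names `explicit` and `shortcut` are only illustrative:

```python
import numpy as np
import pandas as pd

# Data rebuilt from the question (assumed construction).
df = pd.DataFrame({"row1": [0, 1400, 1330, 0, 500],
                   "row2": [500, -1, -1, 900, -1]})

# Explicit condition: take row1 where row2 is the -1 sentinel, else row2.
explicit = np.where(df["row2"] == -1, df["row1"], df["row2"])

# Truthiness shortcut: a nonzero row1 counts as True, so row1 is kept
# wherever it is not 0 and row2 fills in the remaining rows.
shortcut = np.where(df["row1"], df["row1"], df["row2"])

df["row3"] = explicit
print(df)
```

The shortcut only agrees with the explicit form as long as row1 is 0 exactly where row2 holds a real value, which is the situation described in the question.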
Timings:
>>> df = pd.concat([df]*1000)
>>> df_c = df.copy()
>>> %timeit df.clip_lower(0).sum(1) # coldspeed 1
537 µs ± 5.17 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
>>> %timeit df.row2.mask(df.row2.eq(-1)).combine_first(df.row1) # coldspeed 2
964 µs ± 15.6 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
>>> %timeit df_c.loc[df_c.row2 == -1, 'row2'] = np.nan; df_c.row2.add(df_c.row1, fill_value=0) # coldspeed 3
2.66 ms ± 24.3 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
>>> %timeit [r1 if r2 == -1 else r2 for r1, r2 in zip(df.row1, df.row2)] # Daniel Mesejo
466 µs ± 1.79 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
>>> %timeit df.replace(-1,0).sum(1) # W-B
783 µs ± 45.6 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
>>> %timeit np.where(df['row2'] == -1, df['row1'], df['row2']) # timgeb 1
173 µs ± 4.29 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
>>> %timeit np.where(df['row1'], df['row1'], df['row2']) # timgeb 2
38.1 µs ± 3.69 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
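The %timeit lines above only run inside IPython. As a rough sketch, a similar comparison can be made in a plain script with the standard-library timeit module. The helper names `with_where` and `with_comprehension` are hypothetical, `ignore_index=True` is an addition not used in the timings above, and absolute numbers will differ by machine and library version:

```python
import timeit

import numpy as np
import pandas as pd

# Data rebuilt from the question (assumed construction), repeated 1000 times.
df = pd.DataFrame({"row1": [0, 1400, 1330, 0, 500],
                   "row2": [500, -1, -1, 900, -1]})
big = pd.concat([df] * 1000, ignore_index=True)  # 5000 rows

def with_where():
    # Vectorised selection with numpy.where.
    return np.where(big["row2"] == -1, big["row1"], big["row2"])

def with_comprehension():
    # Pure-Python selection, one pair of values at a time.
    return [r1 if r2 == -1 else r2 for r1, r2 in zip(big["row1"], big["row2"])]

for fn in (with_where, with_comprehension):
    # Average wall-clock time per call over 100 calls.
    seconds = timeit.timeit(fn, number=100) / 100
    print(f"{fn.__name__}: {seconds * 1e6:.1f} us per call")
```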
Answer 1 (score: 2)
clip_lower + sum
Assuming -1 is the only negative value in your DataFrame, ...
df['row3'] = df.clip_lower(0).sum(1)
df
row1 row2 row3
0 0 500.0 500.0
1 1400 NaN 1400.0
2 1330 NaN 1330.0
3 0 900.0 900.0
4 500 NaN 500.0
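Note that DataFrame.clip_lower was deprecated and then removed in pandas 1.0; on current versions the equivalent call is clip(lower=0). A minimal sketch of the same idea, assuming the integer data from the question and selecting the two input columns explicitly so an existing row3 is not summed in:

```python
import pandas as pd

# Data rebuilt from the question (assumed construction).
df = pd.DataFrame({"row1": [0, 1400, 1330, 0, 500],
                   "row2": [500, -1, -1, 900, -1]})

# clip(lower=0) turns the -1 sentinel into 0, so the row-wise sum
# is just the one meaningful value in each row.
df["row3"] = df[["row1", "row2"]].clip(lower=0).sum(axis=1)
print(df)
```

This relies on each row having exactly one meaningful value: if row1 were nonzero while row2 also held a real value, the two would be added together.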
mask + combine_first
df.row2.mask(df.row2.eq(-1)).combine_first(df.row1)
0 500.0
1 1400.0
2 1330.0
3 900.0
4 500.0
Name: row2, dtype: float64
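The float64 result comes from the NaN values that mask introduces. As a sketch, the two steps can be separated and the result cast back to integers once no NaN remains. The intermediate name `masked` and the final astype(int) are illustrative additions, not part of the answer:

```python
import pandas as pd

# Data rebuilt from the question (assumed construction).
df = pd.DataFrame({"row1": [0, 1400, 1330, 0, 500],
                   "row2": [500, -1, -1, 900, -1]})

# mask() replaces the -1 sentinel with NaN, which upcasts row2 to float64.
masked = df["row2"].mask(df["row2"].eq(-1))

# combine_first() fills the NaN slots from row1; every gap is filled here,
# so the result can safely be cast back to int.
df["row3"] = masked.combine_first(df["row1"]).astype(int)
print(df)
```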
Series.add
df.loc[df.row2 == -1, 'row2'] = np.nan
df.row2.add(df.row1, fill_value=0)
# Or,
# df.row2.mask(df.row2.eq(-1)).add(df.row1, fill_value=0)
0 500.0
1 1400.0
2 1330.0
3 900.0
4 500.0
dtype: float64
Answer 2 (score: 1)
A simple list comprehension will do:
import pandas as pd
data = [[0, 500],
[1400, -1],
[1330, -1],
[0, 900],
[500, -1]]
df = pd.DataFrame(data=data, columns=["row1", "row2"])
df["row3"] = [r1 if r2 == -1 else r2 for r1, r2 in zip(df.row1, df.row2)]
print(df)
Output
row1 row2 row3
0 0 500 500
1 1400 -1 1400
2 1330 -1 1330
3 0 900 900
4 500 -1 500
Answer 3 (score: 1)
My five cents:
df.replace(-1,0).sum(1)
Out[338]:
0 500
1 1400
2 1330
3 900
4 500
dtype: int64
Answer 4 (score: 0)
You can use pandas' loc indexer:
df['row3'] = df.row1
df.loc[df.row3 == 0, 'row3'] = df.row2
This gives:
| | row1 | row2 | row3 |
|---|------|------|------|
| 0 | 0 | 500 | 500 |
| 1 | 1400 | -1 | 1400 |
| 2 | 1330 | -1 | 1330 |
| 3 | 0 | 900 | 900 |
| 4 | 500 | -1 | 500 |
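The same replace-where-zero logic can also be written in one step with Series.mask, which swaps in values from another Series wherever a condition holds. This is an alternative sketch rather than the approach from the answer, again assuming the data from the question:

```python
import pandas as pd

# Data rebuilt from the question (assumed construction).
df = pd.DataFrame({"row1": [0, 1400, 1330, 0, 500],
                   "row2": [500, -1, -1, 900, -1]})

# Keep row1, but wherever it equals 0 take the aligned value from row2.
df["row3"] = df["row1"].mask(df["row1"] == 0, df["row2"])
print(df)
```

Unlike the loc version, this does not need a temporary copy of row1 in row3 before the assignment.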