How to use another column's value when a cell is -1 in Pandas (Python)

Time: 2018-12-17 17:59:27

Tags: python pandas dataframe

I have two columns, for example:

row1   row2
0      500
1400   -1
1330   -1
0      900
500    -1

Here, if the value in row1 is 0, then the value in row2 is not -1. If the value in row2 is -1, then the value in row1 is not 0.
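The rule above means the two placeholders (0 in row1, -1 in row2) never appear in the same row. A minimal sketch that checks this on the sample data, assuming the columns are loaded exactly as shown:

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({"row1": [0, 1400, 1330, 0, 500],
                   "row2": [500, -1, -1, 900, -1]})

# The stated rule: row1 == 0 and row2 == -1 never occur in the same row
both_missing = (df["row1"] == 0) & (df["row2"] == -1)
assert not both_missing.any()

# In this sample, exactly one of the two placeholders is present per row
exactly_one = (df["row1"] == 0) ^ (df["row2"] == -1)
print(exactly_one.all())  # True
```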

I want to create a new column like this:

row3
500
1400
1330
900
500

In this column, if the value in row1 is 0, it should be replaced with the value from row2. How can I do this?

5 answers:

Answer 0 (score: 3)

You can use numpy.where (I wish it were named numpy.if_then_else).

>>> df['row3'] = np.where(df['row2'] == -1, df['row1'], df['row2'])
>>> df
   row1  row2  row3
0     0   500   500
1  1400    -1  1400
2  1330    -1  1330
3     0   900   900
4   500    -1   500
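A self-contained version of the np.where call above, assuming the sample data from the question (the np/pd import aliases are the usual conventions and are not shown in the answer):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"row1": [0, 1400, 1330, 0, 500],
                   "row2": [500, -1, -1, 900, -1]})

# np.where(condition, x, y) picks x where condition is True, otherwise y
df["row3"] = np.where(df["row2"] == -1, df["row1"], df["row2"])
print(df["row3"].tolist())  # [500, 1400, 1330, 900, 500]
```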

Or, a bit more concisely, but very specific to the setup in your question:

>>> df['row3'] = np.where(df['row1'], df['row1'], df['row2'])
>>> df
   row1  row2  row3
0     0   500   500
1  1400    -1  1400
2  1330    -1  1330
3     0   900   900
4   500    -1   500
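The two variants agree only because of the question's rule. A sketch with a hypothetical third row in which both columns hold real values shows where they diverge: the second variant trusts row1 whenever it is non-zero.

```python
import numpy as np
import pandas as pd

# Hypothetical data: the last row breaks the question's rule (no placeholder at all)
df = pd.DataFrame({"row1": [0, 1400, 700],
                   "row2": [500, -1, 800]})

# Variant 1: test row2 against the -1 sentinel explicitly
explicit = np.where(df["row2"] == -1, df["row1"], df["row2"])
# Variant 2: rely on 0 being falsy in row1
truthy = np.where(df["row1"], df["row1"], df["row2"])

print(explicit.tolist())  # [500, 1400, 800]
print(truthy.tolist())    # [500, 1400, 700]
```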

Timings:

>>> df = pd.concat([df]*1000)
>>> df_c = df.copy()
>>> %timeit df.clip_lower(0).sum(1) # coldspeed 1
537 µs ± 5.17 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
>>> %timeit df.row2.mask(df.row2.eq(-1)).combine_first(df.row1) # coldspeed 2
964 µs ± 15.6 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
>>> %timeit df_c.loc[df_c.row2 == -1, 'row2'] = np.nan; df_c.row2.add(df_c.row1, fill_value=0) # coldspeed 3
2.66 ms ± 24.3 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
>>> %timeit [r1 if r2 == -1 else r2 for r1, r2 in zip(df.row1, df.row2)] # Daniel Mesejo
466 µs ± 1.79 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
>>> %timeit df.replace(-1,0).sum(1) # W-B
783 µs ± 45.6 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
>>> %timeit np.where(df['row2'] == -1, df['row1'], df['row2']) # timgeb 1
173 µs ± 4.29 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
>>> %timeit np.where(df['row1'], df['row1'], df['row2']) # timgeb 2
38.1 µs ± 3.69 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
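%timeit is an IPython magic. Outside IPython, a rough equivalent can be sketched with the standard timeit module. The choice of candidates below is an assumption, and absolute numbers depend on the machine and pandas version, so none are shown here:

```python
import timeit

import numpy as np
import pandas as pd

base = pd.DataFrame({"row1": [0, 1400, 1330, 0, 500],
                     "row2": [500, -1, -1, 900, -1]})
df = pd.concat([base] * 1000)  # 5000 rows, as in the timings above

candidates = {
    "np.where (sentinel)": lambda: np.where(df["row2"] == -1, df["row1"], df["row2"]),
    "np.where (truthy)": lambda: np.where(df["row1"], df["row1"], df["row2"]),
    "replace + sum": lambda: df.replace(-1, 0).sum(axis=1),
    "list comprehension": lambda: [r1 if r2 == -1 else r2
                                   for r1, r2 in zip(df.row1, df.row2)],
}

results = {}
for name, fn in candidates.items():
    # Best of 3 repeats, 50 calls each; store the per-call time in microseconds
    best = min(timeit.repeat(fn, number=50, repeat=3)) / 50
    results[name] = best * 1e6
    print(f"{name:>20}: {results[name]:8.1f} µs")
```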

Answer 1 (score: 2)

clip_lower + sum

Assuming your DataFrame has no negative values, ...

df['row3'] = df.clip(lower=0).sum(axis=1)  # df.clip_lower(0).sum(1) in pandas < 1.0
df
   row1  row2  row3
0     0   500   500
1  1400    -1  1400
2  1330    -1  1330
3     0   900   900
4   500    -1   500
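clip_lower was deprecated in pandas 0.24 and removed in 1.0; DataFrame.clip(lower=0) does the same thing. A sketch of why the "no negative values" assumption matters, using a hypothetical row with a genuine negative value in row1:

```python
import numpy as np
import pandas as pd

# Hypothetical data: the last row has a real negative value in row1
df = pd.DataFrame({"row1": [0, 1400, -5],
                   "row2": [500, -1, -1]})

# clip(lower=0) turns every negative into 0, not just the -1 placeholders
via_clip = df.clip(lower=0).sum(axis=1)
via_where = np.where(df["row2"] == -1, df["row1"], df["row2"])

print(via_clip.tolist())   # [500, 1400, 0]   (-5 was clipped away)
print(via_where.tolist())  # [500, 1400, -5]
```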

mask + combine_first

df.row2.mask(df.row2.eq(-1)).combine_first(df.row1)

0     500.0
1    1400.0
2    1330.0
3     900.0
4     500.0
Name: row2, dtype: float64
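Because mask() introduces NaN, the result above is float64. A sketch that restores the integer dtype once every NaN has been filled, assuming the sample data:

```python
import pandas as pd

df = pd.DataFrame({"row1": [0, 1400, 1330, 0, 500],
                   "row2": [500, -1, -1, 900, -1]})

# mask() turns the -1 sentinels into NaN, which forces a float64 dtype;
# combine_first() then fills those NaN slots from row1
filled = df.row2.mask(df.row2.eq(-1)).combine_first(df.row1)
print(filled.dtype)  # float64

# No NaN remains, so the integer dtype can be restored
df["row3"] = filled.astype("int64")
print(df["row3"].tolist())  # [500, 1400, 1330, 900, 500]
```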

Masking + Series.add

df.loc[df.row2 == -1, 'row2'] = np.nan
df.row2.add(df.row1, fill_value=0)
# Or,
# df.row2.mask(df.row2.eq(-1)).add(df.row1, fill_value=0)

0     500.0
1    1400.0
2    1330.0
3     900.0
4     500.0
dtype: float64
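Note that the first form assigns NaN into df itself, so row2 loses its -1 values and becomes float. A sketch of the non-mutating variant from the comment, writing the result to a new row3 column:

```python
import pandas as pd

df = pd.DataFrame({"row1": [0, 1400, 1330, 0, 500],
                   "row2": [500, -1, -1, 900, -1]})

# Work on a masked copy so df.row2 keeps its original -1 values
masked = df.row2.mask(df.row2.eq(-1))  # -1 -> NaN
# fill_value=0 stands in for the NaN side; row1 is 0 wherever row2 is valid
df["row3"] = masked.add(df.row1, fill_value=0).astype("int64")

print(df["row2"].tolist())  # [500, -1, -1, 900, -1]
print(df["row3"].tolist())  # [500, 1400, 1330, 900, 500]
```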

Answer 2 (score: 1)

A simple list comprehension will do it:

import pandas as pd

data = [[0, 500],
        [1400, -1],
        [1330, -1],
        [0, 900],
        [500, -1]]


df = pd.DataFrame(data=data, columns=["row1", "row2"])
df["row3"] = [r1 if r2 == -1 else r2 for r1, r2 in zip(df.row1, df.row2)]

print(df)

Output

   row1  row2  row3
0     0   500   500
1  1400    -1  1400
2  1330    -1  1330
3     0   900   900
4   500    -1   500
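For comparison (this is not part of the answer above), the same per-row rule can be written without a Python-level loop using pandas' Series.where, which keeps values where the condition holds and takes them from another Series elsewhere. A sketch on the same data:

```python
import pandas as pd

df = pd.DataFrame({"row1": [0, 1400, 1330, 0, 500],
                   "row2": [500, -1, -1, 900, -1]})

# Series.where(cond, other) keeps values where cond is True
# and takes them from `other` where cond is False
df["row3"] = df["row2"].where(df["row2"] != -1, df["row1"])
print(df["row3"].tolist())  # [500, 1400, 1330, 900, 500]
```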

Answer 3 (score: 1)

My two cents:

df.replace(-1,0).sum(1)
Out[338]: 
0     500
1    1400
2    1330
3     900
4     500
dtype: int64
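This works because, in the sample, each row holds exactly one non-zero value once -1 becomes 0. Note that df.replace(-1, 0) rewrites -1 in every column, row1 included. A sketch that checks the result against np.where and restricts the replacement to row2:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"row1": [0, 1400, 1330, 0, 500],
                   "row2": [500, -1, -1, 900, -1]})

# After -1 -> 0, summing the row picks out its single non-zero value
via_replace = df.replace(-1, 0).sum(axis=1)
via_where = np.where(df["row2"] == -1, df["row1"], df["row2"])
assert (via_replace.to_numpy() == via_where).all()

# Restricting the replacement to row2 avoids touching a legitimate -1 in row1
via_replace_row2 = df["row2"].replace(-1, 0) + df["row1"]
print(via_replace_row2.tolist())  # [500, 1400, 1330, 900, 500]
```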

Answer 4 (score: 0)

You can use pandas' loc indexer:

df['row3'] = df.row1
df.loc[df.row3 == 0, 'row3'] = df.row2

Result:

|   | row1 | row2 | row3 |
|---|------|------|------|
| 0 | 0    | 500  | 500  |
| 1 | 1400 | -1   | 1400 |
| 2 | 1330 | -1   | 1330 |
| 3 | 0    | 900  | 900  |
| 4 | 500  | -1   | 500  |
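A self-contained sketch of the same loc approach, with the right-hand side restricted to the masked rows so the intent is explicit; the mask variable name is illustrative only:

```python
import pandas as pd

df = pd.DataFrame({"row1": [0, 1400, 1330, 0, 500],
                   "row2": [500, -1, -1, 900, -1]})

df["row3"] = df["row1"].copy()               # start from row1
mask = df["row3"] == 0                       # rows where row1 is the 0 placeholder
df.loc[mask, "row3"] = df.loc[mask, "row2"]  # take row2 only in those rows
print(df["row3"].tolist())  # [500, 1400, 1330, 900, 500]
```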