Using np.where with multiple conditions on a DataFrame

Posted: 2018-01-10 15:37:36

Tags: python python-3.x pandas numpy dataframe

I have a DataFrame named volumes that contains dates and volumes for thousands of wells.

| WellName | Date     | Oil | Water | Inject |BeforeDate| Before | After | AfterDate
|----------|----------|-----|-------|--------|--------- |--------|-------|----------
| Well_1   | 1/1/2000 | 10  | 10    |        | 1/1/2001 | Prod   |  Inj  | 1/1/2002
| Well_1   | 1/1/2001 | 10  | 20    |        | 1/1/2001 | Prod   |  Inj  | 1/1/2002
| Well_1   | 1/1/2002 | 50  | 60    |        | 1/1/2001 | Prod   |  Inj  | 1/1/2002
| Well_2   | 1/1/2000 |     |       | 700    | 1/1/2001 | Inj    |  Prod | 1/1/2002
| Well_2   | 1/1/2001 |     |       | 720    | 1/1/2001 | Inj    |  Prod | 1/1/2002
| Well_2   | 1/1/2002 |     |       | 800    | 1/1/2001 | Inj    |  Prod | 1/1/2002
| Well_3   | 1/1/2000 |     |       | 1000   | 1/1/2001 | Inj    |  Inj  | 1/1/2002
| Well_3   | 1/1/2001 |     |       | 1500   | 1/1/2001 | Inj    |  Inj  | 1/1/2002
| Well_3   | 1/1/2002 |     |       | 2000   | 1/1/2001 | Inj    |  Inj  | 1/1/2002
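To make the snippets in this thread reproducible, here is a minimal sketch that rebuilds the sample table as a DataFrame. It assumes blank cells are missing values (NaN) and that the dates are month/day/year; the table alone does not confirm either.

```python
import numpy as np
import pandas as pd

nan = np.nan
# Sample rows from the table above; blank cells become NaN (assumption)
volumes = pd.DataFrame({
    "WellName": ["Well_1"] * 3 + ["Well_2"] * 3 + ["Well_3"] * 3,
    "Date": ["1/1/2000", "1/1/2001", "1/1/2002"] * 3,
    "Oil": [10, 10, 50] + [nan] * 6,
    "Water": [10, 20, 60] + [nan] * 6,
    "Inject": [nan] * 3 + [700, 720, 800, 1000, 1500, 2000],
    "BeforeDate": ["1/1/2001"] * 9,
    "Before": ["Prod"] * 3 + ["Inj"] * 6,
    "After": ["Inj"] * 3 + ["Prod"] * 3 + ["Inj"] * 3,
    "AfterDate": ["1/1/2002"] * 9,
})

# Parse the date columns so that <= and >= compare dates, not strings
for col in ["Date", "BeforeDate", "AfterDate"]:
    volumes[col] = pd.to_datetime(volumes[col], format="%m/%d/%Y")
```

Parsing matters because string comparison is lexicographic: "10/1/2000" < "2/1/2000" as strings, even though October comes after February.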

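For wells where Before == 'Prod', I need to sum the Oil + Water columns by year for rows where Date <= BeforeDate. For wells where Before == 'Inj', I want to take the Inject column for rows where Date <= BeforeDate.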

How do I add the else branch, Date <= BeforeDate & Before == 'Inj'?

Here is what I have so far; I realize it is not correct.

volumes['totals_before'] = np.where((volumes['Before'] == 'Prod') & (volumes['Date'] <= volumes['BeforeDate']), volumes['Oil'] + volumes['Water'], volumes['Inject'])
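When there is more than one condition/choice pair, np.select is the usual companion to np.where: it takes a list of conditions and a matching list of choices, and rows that match no condition get the default. A minimal sketch, assuming the date columns are already datetime and using a cut-down copy of the sample data (Well_1 and Well_2 only):

```python
import numpy as np
import pandas as pd

# Cut-down copy of the sample data (Well_1 and Well_2); dates already parsed
volumes = pd.DataFrame({
    "WellName": ["Well_1"] * 3 + ["Well_2"] * 3,
    "Date": pd.to_datetime(["2000-01-01", "2001-01-01", "2002-01-01"] * 2),
    "Oil": [10, 10, 50, np.nan, np.nan, np.nan],
    "Water": [10, 20, 60, np.nan, np.nan, np.nan],
    "Inject": [np.nan, np.nan, np.nan, 700, 720, 800],
    "BeforeDate": pd.Timestamp("2001-01-01"),
    "Before": ["Prod"] * 3 + ["Inj"] * 3,
})

in_before_window = volumes["Date"] <= volumes["BeforeDate"]
conditions = [
    in_before_window & (volumes["Before"] == "Prod"),  # producer, up to BeforeDate
    in_before_window & (volumes["Before"] == "Inj"),   # injector, up to BeforeDate
]
choices = [
    volumes["Oil"] + volumes["Water"],
    volumes["Inject"],
]
# Rows after BeforeDate match neither condition and stay NaN
volumes["totals_before"] = np.select(conditions, choices, default=np.nan)
```

The single np.where in the attempt above sends every row that fails the Prod test to Inject, including rows after BeforeDate; with np.select those rows fall through to the default instead.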

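Once volumes['totals_before'] is calculated correctly, I need to forward fill (ffill) the most recent sum (in this case the one for 1/1/2001) and add it to another column, volumes['totals_after'], which covers the rows where Date >= AfterDate.

The final column would then be: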

volumes['new_Tots'] = volumes['totals_before'] + volumes['totals_after'] 

Expected output:

| WellName |   Date   | totals_before | totals_after | new_Tots |
|----------|----------|---------------|--------------|----------|
| Well_1   | 1/1/2000 |      20       |              |   20     |
| Well_1   | 1/1/2001 |      30       |              |   30     |
| Well_1   | 1/1/2002 |  30(ffill)    |     110      |   140    |
| Well_2   | 1/1/2000 |      700      |              |   700    |
| Well_2   | 1/1/2001 |      720      |              |   720    |
| Well_2   | 1/1/2002 |  720(ffill)   |     800      |   1520   |
| Well_3   | 1/1/2000 |      1000     |              |   1000   |
| Well_3   | 1/1/2001 |      1500     |              |   1500   |
| Well_3   | 1/1/2002 |  1500(ffill)  |     2000     |   3500   |
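One way to reproduce this expected output end to end, sketched under two assumptions: the per-row volume is Oil + Water when Before == 'Prod' and Inject otherwise (the expected numbers imply this, e.g. 110 = 50 + 60 for Well_1 in 2002), and the forward fill should stay within each well.

```python
import numpy as np
import pandas as pd

# Sample data from the question, with dates already parsed (assumption)
volumes = pd.DataFrame({
    "WellName": np.repeat(["Well_1", "Well_2", "Well_3"], 3),
    "Date": pd.to_datetime(["2000-01-01", "2001-01-01", "2002-01-01"] * 3),
    "Oil": [10, 10, 50] + [np.nan] * 6,
    "Water": [10, 20, 60] + [np.nan] * 6,
    "Inject": [np.nan] * 3 + [700, 720, 800, 1000, 1500, 2000],
    "BeforeDate": pd.Timestamp("2001-01-01"),
    "Before": np.repeat(["Prod", "Inj", "Inj"], 3),
    "AfterDate": pd.Timestamp("2002-01-01"),
})

# ffill must run in date order inside each well
volumes = volumes.sort_values(["WellName", "Date"]).reset_index(drop=True)

# Per-row volume: Oil + Water for producers, Inject for injectors
volume = pd.Series(
    np.where(volumes["Before"] == "Prod",
             volumes["Oil"] + volumes["Water"],
             volumes["Inject"]),
    index=volumes.index,
)

# Keep the volume only inside each window; other rows become NaN
volumes["totals_before"] = volume.where(volumes["Date"] <= volumes["BeforeDate"])
volumes["totals_after"] = volume.where(volumes["Date"] >= volumes["AfterDate"])

# Carry the last "before" total forward without crossing into the next well
volumes["totals_before"] = volumes.groupby("WellName")["totals_before"].ffill()

# Rows with no "after" value contribute 0 to the new total
volumes["new_Tots"] = volumes["totals_before"] + volumes["totals_after"].fillna(0)
```

Using groupby(...).ffill() rather than a plain ffill() prevents a well whose first rows fall after BeforeDate from inheriting the previous well's total.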

2 Answers:

Answer 0 (score: 0)

I think this should work.

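# Oil + Water for producers, only on rows up to BeforeDate (NaN elsewhere)
Prod_part = volumes.where(volumes.Date <= volumes.BeforeDate)\
                   .where(volumes.Before == "Prod")[["Water", "Oil"]].sum(
                          axis=1, min_count=1)
# Inject for injectors, only on rows up to BeforeDate
Inj_part = volumes.where(volumes.Date <= volumes.BeforeDate).where(volumes.Before == "Inj")["Inject"]

volumes["totals_before"] = Inj_part.combine_first(Prod_part)
# Assign the result back (inplace on an attribute may not update volumes) and ffill per well
volumes["totals_before"] = volumes.groupby("WellName")["totals_before"].ffill()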


0      20.0
1      30.0
2      30.0
3     700.0
4     720.0
5     720.0
6    1000.0
7    1500.0
8    1500.0

Again, posting your DataFrame via to_dict() would be a lifesaver.
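For reference, DataFrame.to_dict turns a frame into plain Python data that others can paste straight back into pd.DataFrame. A small sketch using two rows of Well_1 (columns chosen only for illustration):

```python
import pandas as pd

small = pd.DataFrame({
    "WellName": ["Well_1", "Well_1"],
    "Oil": [10, 10],
    "Water": [10, 20],
})

# orient="list" gives {column: [values, ...]}, which is easy to paste into a question
snippet = small.to_dict(orient="list")
print(snippet)

# Whoever answers can rebuild the same frame from the pasted dict
rebuilt = pd.DataFrame(snippet)
```

Datetime columns come out as Timestamp(...) objects, so whoever pastes the dict also needs Timestamp in scope (e.g. from pandas import Timestamp).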

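Answer 1 (score: 0)

This is a bit verbose, but it should serve as a good first draft of what you are trying to achieve. It assumes the dates can be compared (i.e. they are stored as datetime, not strings).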

condition = volumes['Date'] <= volumes['BeforeDate']

# Before
volumes.loc[(condition) & (volumes['Before'] == 'Prod'),
            'totals_before'] = volumes['Oil'] + volumes['Water']
volumes.loc[(condition) & (volumes['Before'] == 'Inj'),
            'totals_before'] = volumes['Inject']

# After
volumes.loc[(~condition) & (volumes['Before'] == 'Prod'),
            'totals_after'] = volumes['Oil'] + volumes['Water']
volumes.loc[(~condition) & (volumes['Before'] == 'Inj'),
            'totals_after'] = volumes['Inject']

volumes = volumes.sort_values(by=['WellName', 'Date'])
# fillna(method='ffill') is deprecated; forward fill within each well instead
volumes['totals_before'] = volumes.groupby('WellName')['totals_before'].ffill()

volumes['new_Tots'] = volumes['totals_before'] + volumes['totals_after'].fillna(0)

Which outputs:

In[3]: volumes[['WellName', 'Date', 'totals_before', 'totals_after', 'new_Tots']]
Out[3]: 
     WellName       Date  totals_before  totals_after  new_Tots
0   Well_1    2000-01-01           20.0           NaN      20.0
1   Well_1    2001-01-01           30.0           NaN      30.0
2   Well_1    2002-01-01           30.0         110.0     140.0
3   Well_2    2000-01-01          700.0           NaN     700.0
4   Well_2    2001-01-01          720.0           NaN     720.0
5   Well_2    2002-01-01          720.0         800.0    1520.0
6   Well_3    2000-01-01         1000.0           NaN    1000.0
7   Well_3    2001-01-01         1500.0           NaN    1500.0
8   Well_3    2002-01-01         1500.0        2000.0    3500.0

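This can be simplified considerably if the following assumption holds: whenever Inject is filled, Oil and Water are always empty, and vice versa.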
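Under that assumption the per-row volume no longer needs the Before column: a NaN-skipping row sum of Oil, Water and Inject yields Oil + Water on production rows and Inject on injection rows. A minimal sketch on a few of the sample rows, alongside the explicit Prod/Inj branch for comparison:

```python
import numpy as np
import pandas as pd

# A few sample rows (assumption: each row has either Oil/Water or Inject, never both)
volumes = pd.DataFrame({
    "Oil": [10, 10, 50, np.nan, np.nan],
    "Water": [10, 20, 60, np.nan, np.nan],
    "Inject": [np.nan, np.nan, np.nan, 700, 720],
    "Before": ["Prod"] * 3 + ["Inj"] * 2,
})

# sum skips NaN; min_count=1 keeps a row NaN if all three columns are empty
volumes["volume"] = volumes[["Oil", "Water", "Inject"]].sum(axis=1, min_count=1)

# The explicit branch the simplification replaces; same values under the assumption
explicit = np.where(volumes["Before"] == "Prod",
                    volumes["Oil"] + volumes["Water"],
                    volumes["Inject"])
```

With a single volume column, the four .loc assignments above collapse to volume.where(condition) for totals_before and volume.where(~condition) for totals_after.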