I have a DataFrame named volumes that contains dates and numbers for several thousand wells.
| WellName | Date | Oil | Water | Inject |BeforeDate| Before | After | AfterDate
|----------|----------|-----|-------|--------|--------- |--------|-------|----------
| Well_1 | 1/1/2000 | 10 | 10 | | 1/1/2001 | Prod | Inj | 1/1/2002
| Well_1 | 1/1/2001 | 10 | 20 | | 1/1/2001 | Prod | Inj | 1/1/2002
| Well_1 | 1/1/2002 | 50 | 60 | | 1/1/2001 | Prod | Inj | 1/1/2002
| Well_2 | 1/1/2000 | | | 700 | 1/1/2001 | Inj | Prod | 1/1/2002
| Well_2 | 1/1/2001 | | | 720 | 1/1/2001 | Inj | Prod | 1/1/2002
| Well_2 | 1/1/2002 | | | 800 | 1/1/2001 | Inj | Prod | 1/1/2002
| Well_3 | 1/1/2000 | | | 1000 | 1/1/2001 | Inj | Inj | 1/1/2002
| Well_3 | 1/1/2001 | | | 1500 | 1/1/2001 | Inj | Inj | 1/1/2002
| Well_3 | 1/1/2002 | | | 2000 | 1/1/2001 | Inj | Inj | 1/1/2002
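For wells where Date <= BeforeDate and Before == 'Prod', I need to sum the Oil and Water columns for each year. Where Date <= BeforeDate and Before == 'Inj', I want to sum the Inject column instead.
How do I include the else case, Date <= BeforeDate and Before == 'Inj'?
This is what I have so far, and I realize it is not correct.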
volumes['totals_before'] = np.where((volumes['Before'] == 'Prod') & (volumes['Date'] <= volumes['BeforeDate']), volumes['Oil'] + volumes['Water'], volumes['Inject'])
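For reference, one way to spell out both branches is `np.select`, which takes a list of conditions and a matching list of values. This is only a sketch: the three-row sample and the `on_or_before` name are made up for illustration, and it assumes the date columns are already stored as datetimes.

```python
import numpy as np
import pandas as pd

# Hypothetical three-row sample mirroring the question's columns.
volumes = pd.DataFrame({
    "WellName": ["Well_1", "Well_1", "Well_2"],
    "Date": pd.to_datetime(["2000-01-01", "2002-01-01", "2000-01-01"]),
    "Oil": [10.0, 50.0, np.nan],
    "Water": [10.0, 60.0, np.nan],
    "Inject": [np.nan, np.nan, 700.0],
    "BeforeDate": pd.to_datetime(["2001-01-01"] * 3),
    "Before": ["Prod", "Prod", "Inj"],
})

on_or_before = volumes["Date"] <= volumes["BeforeDate"]

# Conditions are checked in order; rows matching neither one get NaN.
volumes["totals_before"] = np.select(
    [on_or_before & (volumes["Before"] == "Prod"),
     on_or_before & (volumes["Before"] == "Inj")],
    [volumes["Oil"] + volumes["Water"],
     volumes["Inject"]],
    default=np.nan,
)
```

Unlike the single `np.where` above, rows after BeforeDate are left as NaN here instead of falling into the Inject branch.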
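Once volumes['totals_before'] is computed correctly, I need to forward fill (ffill) the most recent sum (1/1/2001 in this case) and add it to another column, volumes['totals_after'], which covers Date >= AfterDate.
The final result looks like this: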
volumes['new_Tots'] = volumes['totals_before'] + volumes['totals_after']
Expected output:
| WellName | Date | totals_before | totals_after | new_Tots |
|----------|----------|---------------|--------------|----------|
| Well_1 | 1/1/2000 | 20 | | 20 |
| Well_1 | 1/1/2001 | 30 | | 30 |
| Well_1 | 1/1/2002 | 30(ffill) | 110 | 140 |
| Well_2 | 1/1/2000 | 700 | | 700 |
| Well_2 | 1/1/2001 | 720 | | 720 |
| Well_2 | 1/1/2002 | 720(ffill) | 800 | 1520 |
| Well_3 | 1/1/2000 | 1000 | | 1000 |
| Well_3 | 1/1/2001 | 1500 | | 1500 |
| Well_3 | 1/1/2002 | 1500(ffill) | 2000 | 3500 |
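The forward-fill step described above could be sketched per well, so that the last total of one well can never spill into the next one. This is an illustration only: the sample assumes totals_before and totals_after are already filled in as in the expected output (two wells shown), and that Date is a datetime column.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: totals_before is NaN after BeforeDate,
# totals_after is NaN before AfterDate.
volumes = pd.DataFrame({
    "WellName": ["Well_1", "Well_1", "Well_1", "Well_2", "Well_2", "Well_2"],
    "Date": pd.to_datetime(["2000-01-01", "2001-01-01", "2002-01-01"] * 2),
    "totals_before": [20.0, 30.0, np.nan, 700.0, 720.0, np.nan],
    "totals_after": [np.nan, np.nan, 110.0, np.nan, np.nan, 800.0],
})

# Sort so that "most recent" means the previous date of the same well.
volumes = volumes.sort_values(["WellName", "Date"])

# Forward fill inside each well only.
volumes["totals_before"] = volumes.groupby("WellName")["totals_before"].ffill()

# Treat a missing totals_after as 0 so the sum is not NaN.
volumes["new_Tots"] = volumes["totals_before"] + volumes["totals_after"].fillna(0)
```

On this sample the new_Tots column comes out as 20, 30, 140, 700, 720, 1520, matching the table.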
Answer 0 (score: 0)
I think this should work.
Prod_part = volumes.where(volumes.Date <= volumes.BeforeDate)\
.where(volumes.Before == "Prod")[["Water", "Oil"]].sum(
axis=1, min_count=1)
Inj_part = volumes.where(volumes.Date <= volumes.BeforeDate).where(volumes.Before == "Inj")["Inject"]
volumes["totals_before"] = Inj_part.combine_first(Prod_part)
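# Assign the result back; an in-place ffill on a selected column may not update the frame
volumes["totals_before"] = volumes["totals_before"].ffill()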
0 20.0
1 30.0
2 30.0
3 700.0
4 720.0
5 720.0
6 1000.0
7 1500.0
8 1500.0
Again, providing your DataFrame with the to_dict function is a lifesaver.
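As a rough illustration of that tip, a small frame can be dumped with `DataFrame.to_dict` and rebuilt by whoever reads the question. The `sample` frame below is a made-up stand-in, not the real data; date columns and missing values would need a little more care when pasted as text.

```python
import pandas as pd

# Hypothetical two-row sample standing in for the real data.
sample = pd.DataFrame({
    "WellName": ["Well_1", "Well_1"],
    "Oil": [10, 10],
    "Water": [10, 20],
})

# Print the dictionary and paste it into the question...
as_dict = sample.to_dict()
print(as_dict)

# ...so a reader can rebuild the same frame from it.
rebuilt = pd.DataFrame(as_dict)
```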
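Answer 1 (score: 0)
This is a bit verbose, but it can serve as a good draft of what you are trying to achieve. It assumes the dates can be compared (so they are stored as datetime rather than strings).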
condition = volumes['Date'] <= volumes['BeforeDate']
# Before
volumes.loc[(condition) & (volumes['Before'] == 'Prod'),
'totals_before'] = volumes['Oil'] + volumes['Water']
volumes.loc[(condition) & (volumes['Before'] == 'Inj'),
'totals_before'] = volumes['Inject']
# After
volumes.loc[(~condition) & (volumes['Before'] == 'Prod'),
'totals_after'] = volumes['Oil'] + volumes['Water']
volumes.loc[(~condition) & (volumes['Before'] == 'Inj'),
'totals_after'] = volumes['Inject']
volumes = volumes.sort_values(by=['WellName', 'Date'])
volumes['totals_before'] = volumes['totals_before'].ffill()
volumes['new_Tots'] = volumes['totals_before'] + volumes['totals_after'].fillna(0)
Which outputs:
In[3]: volumes[['WellName', 'Date', 'totals_before', 'totals_after', 'new_Tots']]
Out[3]:
WellName Date totals_before totals_after new_Tots
0 Well_1 2000-01-01 20.0 NaN 20.0
1 Well_1 2001-01-01 30.0 NaN 30.0
2 Well_1 2002-01-01 30.0 110.0 140.0
3 Well_2 2000-01-01 700.0 NaN 700.0
4 Well_2 2001-01-01 720.0 NaN 720.0
5 Well_2 2002-01-01 720.0 800.0 1520.0
6 Well_3 2000-01-01 1000.0 NaN 1000.0
7 Well_3 2001-01-01 1500.0 NaN 1500.0
8 Well_3 2002-01-01 1500.0 2000.0 3500.0
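This could be simplified considerably if the following assumption holds: whenever Inject is populated, Oil and Water are always empty, and vice versa.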
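One possible reading of that simplification, as a sketch only: if each row carries either Oil/Water or Inject but never both, a row-wise sum gives the right total without looking at the Before column at all. The sample data and the `row_total` name are made up for illustration, and the rows are assumed to be sorted by WellName and Date already.

```python
import numpy as np
import pandas as pd

# Hypothetical sample following the question's layout (two wells only).
volumes = pd.DataFrame({
    "WellName": ["Well_1"] * 3 + ["Well_2"] * 3,
    "Date": pd.to_datetime(["2000-01-01", "2001-01-01", "2002-01-01"] * 2),
    "Oil": [10, 10, 50, np.nan, np.nan, np.nan],
    "Water": [10, 20, 60, np.nan, np.nan, np.nan],
    "Inject": [np.nan, np.nan, np.nan, 700, 720, 800],
    "BeforeDate": pd.to_datetime(["2001-01-01"] * 6),
})

# Assumption: Oil/Water and Inject are never populated on the same row.
# min_count=1 keeps a row as NaN when all three columns are empty.
row_total = volumes[["Oil", "Water", "Inject"]].sum(axis=1, min_count=1)

condition = volumes["Date"] <= volumes["BeforeDate"]
volumes["totals_before"] = row_total.where(condition)
volumes["totals_after"] = row_total.where(~condition)

volumes["totals_before"] = volumes["totals_before"].ffill()
volumes["new_Tots"] = volumes["totals_before"] + volumes["totals_after"].fillna(0)
```

If the assumption does not hold for some wells, the explicit Prod/Inj branches above remain the safer choice.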