Question

I have a spatio-temporal DataFrame:

'date'        'spatial_pixel'   'column_A'   ...
 ----             -----          ---          
 2012-04-01   |   1000     |      5
 2012-04-01   |   1001     |      1
 ...              ...            ...

I want a column (grouped by 'spatial_pixel' and 'date') that counts the number of consecutive rows satisfying a boolean condition, say 'column_A' < 2:

'date'        'spatial_pixel'   'column_A'   'days-in-a-row'   ...
 ----             -----          ---           ----
 2012-03-30   |   1001     |      5    |         0
 2012-04-01   |   1001     |      1    |         1
 2012-04-02   |   1001     |      1    |         2
 2012-04-03   |   1001     |      3    |         0
 ...              ...            ...            ...

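My attempt:

First, I created a new DataFrame that records the day of the month (e.g. 1, 2, 3, ..., 28, 29, 30) wherever the boolean is True ('column_A' < 2). (However, I need it to range from 1 to 365, so that the end of one month and the start of the next are easily recognized as consecutive.)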
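A minimal sketch of the 1-365 numbering described above, assuming 'date' is (or can be converted to) a datetime column. The sample rows are hypothetical and only mirror the question's column names. `Series.dt.dayofyear` numbers days continuously across month boundaries, although it still resets at the turn of the year.

```python
import pandas as pd

# Hypothetical sample data using the question's column names.
df = pd.DataFrame({
    "date": pd.to_datetime(["2012-03-30", "2012-03-31", "2012-04-01", "2012-04-02"]),
    "spatial_pixel": [1001, 1001, 1001, 1001],
    "column_A": [5, 1, 1, 3],
})

# Day of the year where the condition holds, NaN elsewhere.
# 2012-03-31 and 2012-04-01 become 91 and 92, so they read as consecutive.
df["day"] = df["date"].dt.dayofyear.where(df["column_A"] < 2)
```

If the data spans several years, comparing the dates themselves (for example with `diff()`) is likely more robust than any day-number column.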

'date'        'spatial_pixel'   'column_A'   'day'   ...
 ----             -----          ---           ----
 2012-03-30   |   1001     |      5    |         NaN
 2012-04-01   |   1001     |      1    |         1
 2012-04-02   |   1001     |      1    |         2
 2012-04-03   |   1001     |      3    |         NaN
 2012-04-30   |   1001     |      1    |         30
 2012-04-31   |   1001     |      1    |         31     
 ...              ...            ...            ...

Second, I tried to create a new column counting the consecutive days, using modified code from @ZJS in "Pandas: conditional rolling count", but without success.

def rolling_count(val):
    if val == rolling_count.previous + 1 :
        rolling_count.count +=1
    else:
        rolling_count.previous = val
        rolling_count.count = 1
    return rolling_count.count
rolling_count.count = 0 #static variable
rolling_count.previous = None #static variable

df['count'] == df.groupby(['spatial_pixel','date'])['day'].apply(rolling_count)                             


KeyError: 'count'
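A note on the error itself: the line above uses `==` (comparison) rather than `=` (assignment), so pandas first tries to read `df['count']`, which does not exist yet. The sketch below reproduces this on a hypothetical one-column frame. Separately, grouping by both 'spatial_pixel' and 'date' probably leaves one row per group, so a run could not be counted even after the assignment is fixed.

```python
import pandas as pd

# Hypothetical frame with no 'count' column yet.
df = pd.DataFrame({"day": [1.0, 2.0, None]})

caught = None
try:
    # Comparison: the left-hand side df["count"] is looked up first and fails.
    df["count"] == df["day"]
except KeyError as err:
    caught = repr(err)

# Assignment creates the column instead of reading it.
df["count"] = df["day"]
```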

Any help is much appreciated!

Answer 1

IIUYC, here is my take on the problem:

import pandas as pd
from datetime import datetime

df = pd.DataFrame(
    [
        [datetime(2016, 1, 1), 1000, 5],
        [datetime(2016, 1, 1), 1001, 1],
        [datetime(2016, 1, 2), 1000, 1],
        [datetime(2016, 1, 2), 1001, 1],
        [datetime(2016, 1, 3), 1000, 1],
        [datetime(2016, 1, 3), 1001, 5],
        [datetime(2016, 1, 4), 1000, 1],
        [datetime(2016, 1, 4), 1001, 1],
    ],
    columns=['date', 'spatial_pixel', 'column_A']
)
df
#         date  spatial_pixel  column_A
# 0 2016-01-01           1000         5
# 1 2016-01-01           1001         1
# 2 2016-01-02           1000         1
# 3 2016-01-02           1001         1
# 4 2016-01-03           1000         1
# 5 2016-01-03           1001         5
# 6 2016-01-04           1000         1
# 7 2016-01-04           1001         1

def sum_days_in_row_with_condition(g):
    sorted_g = g.sort_values(by='date', ascending=True)
    condition = sorted_g['column_A'] < 2
    # Running count of matching rows, minus the running count frozen at the
    # most recent non-matching row, gives the length of the current streak.
    # fillna(0) covers groups whose first row already matches (no earlier reset).
    running = condition.cumsum()
    last_reset = running.where(~condition).ffill().fillna(0).astype(int)
    sorted_g['days-in-a-row'] = running - last_reset
    return sorted_g

(df.groupby('spatial_pixel')
   .apply(sum_days_in_row_with_condition)
   .reset_index(drop=True))
#         date  spatial_pixel  column_A  days-in-a-row
# 0 2016-01-01           1000         5              0
# 1 2016-01-02           1000         1              1
# 2 2016-01-03           1000         1              2
# 3 2016-01-04           1000         1              3
# 4 2016-01-01           1001         1              1
# 5 2016-01-02           1001         1              2
# 6 2016-01-03           1001         5              0
# 7 2016-01-04           1001         1              1
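As a possible variation, the same streak count can be computed without `apply`, and with a check that neighbouring rows are exactly one day apart. The question's sample skips from 2012-03-30 to 2012-04-01, so this may matter. This is only a sketch under assumptions: the function name `days_in_a_row` and the `threshold` parameter are made up for illustration, and it assumes at most one row per pixel per day.

```python
import pandas as pd

def days_in_a_row(df, threshold=2):
    """Hypothetical helper: streak length of column_A < threshold per pixel."""
    out = df.sort_values(["spatial_pixel", "date"]).reset_index(drop=True)
    cond = out["column_A"] < threshold
    # Time since the previous row of the same pixel (NaT on each pixel's first row).
    gap = out.groupby("spatial_pixel")["date"].diff()
    # A run restarts when the condition fails or the previous row is not
    # exactly one day earlier (NaT compares unequal, so first rows restart too).
    new_run = ~cond | (gap != pd.Timedelta(days=1))
    run_id = new_run.cumsum()
    # Count matching rows inside each run; non-matching rows contribute 0.
    out["days-in-a-row"] = cond.astype(int).groupby(run_id).cumsum()
    return out

# Sample rows taken from the question's desired-output table.
sample = pd.DataFrame({
    "date": pd.to_datetime(["2012-03-30", "2012-04-01", "2012-04-02", "2012-04-03"]),
    "spatial_pixel": [1001, 1001, 1001, 1001],
    "column_A": [5, 1, 1, 3],
})
result = days_in_a_row(sample)
```

On the 2016 example frame above this should give the same 'days-in-a-row' values as the `apply` version, since those dates have no gaps.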

Counting time-series rows that meet a specific condition
