熊猫:如何在条件时间范围内选择的窗口中计算值

时间:2019-05-23 08:53:13

标签: python-3.x pandas datetime pandas-groupby

对于每个值(“ X”列),我们都有一个时间范围窗口,定义为['T1','T2']:

df = pd.DataFrame({'X': [1, 1, 1, 3, 3, 3],
                   'T1': ['2015-12-30 22:30:00',
                          '2015-12-30 22:30:15',
                          '2015-12-30 22:30:20',
                          '2015-11-30 23:40:00',
                          '2015-11-30 23:42:00',
                          '2015-11-30 23:40:00'],
                   'T2': ['2015-12-30 22:31:00',
                          '2015-12-30 22:30:45',
                          '2015-12-30 22:30:20',
                          '2015-11-30 23:43:00',
                          '2015-11-30 23:42:50',
                          '2015-11-30 23:44:00'], 
                   })
    X   T1                      T2
0   1   2015-12-30 22:30:00     2015-12-30 22:31:00
1   1   2015-12-30 22:30:15     2015-12-30 22:30:45
2   1   2015-12-30 22:30:20     2015-12-30 22:30:20
3   3   2015-11-30 23:40:00     2015-11-30 23:43:00
4   3   2015-11-30 23:42:00     2015-11-30 23:42:50
5   3   2015-11-30 23:40:00     2015-11-30 23:44:00

我想计算其时间范围在(或等于)为此值定义的时间范围内的相同值(“ X”),所以我想获得下一个结果:

df = pd.DataFrame({'X': [1, 1, 1, 3, 3, 3],
                   'T1': ['2015-12-30 22:30:00',
                          '2015-12-30 22:30:15',
                          '2015-12-30 22:30:20',
                          '2015-11-30 23:40:00',
                          '2015-11-30 23:42:00',
                          '2015-11-30 23:40:00'],
                   'T2': ['2015-12-30 22:31:00',
                          '2015-12-30 22:30:45',
                          '2015-12-30 22:30:20',
                          '2015-11-30 23:43:00',
                          '2015-11-30 23:42:50',
                          '2015-11-30 23:44:00'],
                   'count': [3, 2, 1, 2, 1, 3],
                   })
    X   T1                      T2                      count
0   1   2015-12-30 22:30:00     2015-12-30 22:31:00     3
1   1   2015-12-30 22:30:15     2015-12-30 22:30:45     2
2   1   2015-12-30 22:30:20     2015-12-30 22:30:20     1
3   3   2015-11-30 23:40:00     2015-11-30 23:43:00     2
4   3   2015-11-30 23:42:00     2015-11-30 23:42:50     1
    3   2015-11-30 23:40:00     2015-11-30 23:44:00     3

1 个答案:

答案 0 :(得分:1)

您可以使用apply进行操作。

这里是一种方法:

# Import module
import pandas as pd

# You dataframe
df = pd.DataFrame({ 'X': [1, 1, 1, 3, 3, 3],
                    'T1': ['2015-12-30 22:30:00',
                         '2015-12-30 22:30:15',
                         '2015-12-30 22:30:20',
                         '2015-11-30 23:40:00',
                         '2015-11-30 23:42:00',
                         '2015-11-30 23:40:00'],
                    'T2': ['2015-12-30 22:31:00',
                         '2015-12-30 22:30:45',
                         '2015-12-30 22:30:20',
                         '2015-11-30 23:43:00',
                         '2015-11-30 23:42:50',
                         '2015-11-30 23:44:00'], 
                   })

# Convert columns to datetime object
df["T1"] = pd.to_datetime(df.T1, format="%Y-%m-%d %H:%M:%S")
df["T2"] = pd.to_datetime(df.T2, format="%Y-%m-%d %H:%M:%S")

# Function to comptute the number of element in the interval
def countIntervals(row):
    return len(df[(df.X == row.X) & (df.T1 >= row.T1) & (df.T2 < row.T2)]) + 1

# Add new column
df["count"] = df.apply(countIntervals, axis=1)
print(df)
#    X                  T1                  T2  count
# 0  1 2015-12-30 22:30:00 2015-12-30 22:31:00      3
# 1  1 2015-12-30 22:30:15 2015-12-30 22:30:45      2
# 2  1 2015-12-30 22:30:20 2015-12-30 22:30:20      1
# 3  3 2015-11-30 23:40:00 2015-11-30 23:43:00      2
# 4  3 2015-11-30 23:42:00 2015-11-30 23:42:50      1
# 5  3 2015-11-30 23:40:00 2015-11-30 23:44:00      3