对于每个值(“ X”列),我们都有一个时间范围窗口,定义为['T1','T2']:
df = pd.DataFrame({'X': [1, 1, 1, 3, 3, 3],
'T1': ['2015-12-30 22:30:00',
'2015-12-30 22:30:15',
'2015-12-30 22:30:20',
'2015-11-30 23:40:00',
'2015-11-30 23:42:00',
'2015-11-30 23:40:00'],
'T2': ['2015-12-30 22:31:00',
'2015-12-30 22:30:45',
'2015-12-30 22:30:20',
'2015-11-30 23:43:00',
'2015-11-30 23:42:50',
'2015-11-30 23:44:00'],
})
X T1 T2
0 1 2015-12-30 22:30:00 2015-12-30 22:31:00
1 1 2015-12-30 22:30:15 2015-12-30 22:30:45
2 1 2015-12-30 22:30:20 2015-12-30 22:30:20
3 3 2015-11-30 23:40:00 2015-11-30 23:43:00
4 3 2015-11-30 23:42:00 2015-11-30 23:42:50
5 3 2015-11-30 23:40:00 2015-11-30 23:44:00
我想计算其时间范围在(或等于)为此值定义的时间范围内的相同值(“ X”),所以我想获得下一个结果:
df = pd.DataFrame({'X': [1, 1, 1, 3, 3, 3],
'T1': ['2015-12-30 22:30:00',
'2015-12-30 22:30:15',
'2015-12-30 22:30:20',
'2015-11-30 23:40:00',
'2015-11-30 23:42:00',
'2015-11-30 23:40:00'],
'T2': ['2015-12-30 22:31:00',
'2015-12-30 22:30:45',
'2015-12-30 22:30:20',
'2015-11-30 23:43:00',
'2015-11-30 23:42:50',
'2015-11-30 23:44:00'],
'count': [3, 2, 1, 2, 1, 3],
})
X T1 T2 count
0 1 2015-12-30 22:30:00 2015-12-30 22:31:00 3
1 1 2015-12-30 22:30:15 2015-12-30 22:30:45 2
2 1 2015-12-30 22:30:20 2015-12-30 22:30:20 1
3 3 2015-11-30 23:40:00 2015-11-30 23:43:00 2
4 3 2015-11-30 23:42:00 2015-11-30 23:42:50 1
3 2015-11-30 23:40:00 2015-11-30 23:44:00 3
答案 0 :(得分:1)
您可以使用apply
进行操作。
这里是一种方法:
# Import module
import pandas as pd
# You dataframe
df = pd.DataFrame({ 'X': [1, 1, 1, 3, 3, 3],
'T1': ['2015-12-30 22:30:00',
'2015-12-30 22:30:15',
'2015-12-30 22:30:20',
'2015-11-30 23:40:00',
'2015-11-30 23:42:00',
'2015-11-30 23:40:00'],
'T2': ['2015-12-30 22:31:00',
'2015-12-30 22:30:45',
'2015-12-30 22:30:20',
'2015-11-30 23:43:00',
'2015-11-30 23:42:50',
'2015-11-30 23:44:00'],
})
# Convert columns to datetime object
df["T1"] = pd.to_datetime(df.T1, format="%Y-%m-%d %H:%M:%S")
df["T2"] = pd.to_datetime(df.T2, format="%Y-%m-%d %H:%M:%S")
# Function to comptute the number of element in the interval
def countIntervals(row):
return len(df[(df.X == row.X) & (df.T1 >= row.T1) & (df.T2 < row.T2)]) + 1
# Add new column
df["count"] = df.apply(countIntervals, axis=1)
print(df)
# X T1 T2 count
# 0 1 2015-12-30 22:30:00 2015-12-30 22:31:00 3
# 1 1 2015-12-30 22:30:15 2015-12-30 22:30:45 2
# 2 1 2015-12-30 22:30:20 2015-12-30 22:30:20 1
# 3 3 2015-11-30 23:40:00 2015-11-30 23:43:00 2
# 4 3 2015-11-30 23:42:00 2015-11-30 23:42:50 1
# 5 3 2015-11-30 23:40:00 2015-11-30 23:44:00 3