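I am analysing data from different sensors. A sensor becomes active (1) while it is in use. However, I only need the time (and date) of the first and last activation, not any of the times in between. Once I have found them, I need to create a new DataFrame containing the time and date of the first and last occurrence, along with the 'User' and 'Activity'.
I tried iterating over each row and building a series of if-then statements, but had no luck. Is there a pandas function that would let me do this efficiently? Here is a subset of my data.
I have only just started learning pandas, so any help is much appreciated.
Cheers!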
import pandas as pd
cols=['User', 'Activity', 'Coaster1', 'Coaster2', 'Coaster3',
'Coaster4', 'Coaster5', 'Coffee', 'Door', 'Fridge', u'coldWater',
'hotWater', 'SensorDate', 'SensorTime', 'RegisteredTime']
data=[['Chris', 'coffee + hot water', 0, 0.0, 0.0, 0, 0, 0.0, 1.0, 0.0,
0.0, 0.0, '2015-09-21', '13:05:54', '13:09:00'],
['Chris', 'coffee + hot water', 0, 0.0, 0.0, 0, 0, 0.0, 1.0, 0.0,
0.0, 0.0, '2015-09-21', '13:05:54', '13:09:00'],
['Chris', 'coffee + hot water', 0, 0.0, 0.0, 0, 0, 0.0, 1.0, 0.0,
0.0, 0.0, '2015-09-21', '13:05:55', '13:09:00'],
['Chris', 'coffee + hot water', 0, 0.0, 0.0, 0, 0, 0.0, 1.0, 0.0,
0.0, 0.0, '2015-09-21', '13:05:55', '13:09:00'],
['Chris', 'coffee + hot water', 0, 0.0, 0.0, 0, 0, 0.0, 1.0, 0.0,
0.0, 0.0, '2015-09-21', '13:05:56', '13:09:00'],
['Chris', 'coffee + hot water', 0, 0.0, 0.0, 0, 0, 0.0, 1.0, 0.0,
0.0, 0.0, '2015-09-21', '13:05:56', '13:09:00'],
['Chris', 'coffee + hot water', 0, 1.0, 0.0, 0, 0, 0.0, 0.0, 0.0,
0.0, 0.0, '2015-09-21', '13:05:58', '13:09:00'],
['Chris', 'coffee + hot water', 0, 1.0, 0.0, 0, 0, 0.0, 0.0, 0.0,
0.0, 0.0, '2015-09-21', '13:05:59', '13:09:00']]
df=pd.DataFrame(data,columns=cols)
The desired output is as follows:
data_out=[['Chris','coffee + hot water','0','0','0','0','0','0','1','0','0','0','2015-09-21','13:05:54','13:05:56','13:09:00'],['Chris','coffee + hot water','0','1','0','0','0','0','0','0','0','0','2015-09-21','13:05:58','13:05:59','13:09:00']]
cols_out=['User',
'Activity',
'Coaster1',
'Coaster2',
'Coaster3',
'Coaster4',
'Coaster5',
'Coffee',
'Door',
'Fridge',
u'coldWater',
'hotWater',
'SensorDate',
'SensorTimeFirst',
'SensorTimeLast',
'RegisteredTime']
df_out=pd.DataFrame(data_out, columns=cols_out)
Answer 0 (score: 0)
def f(x):
    # For each sensor, keep only the rows where it is active (== 1) and take
    # the column-wise min and max; on the time strings these give the first
    # and last activation. A sensor that never fires yields NaN values.
    Doormin = x[x['Door'] == 1].min()
    Doormax = x[x['Door'] == 1].max()
    Coaster2min = x[x['Coaster2'] == 1].min()
    Coaster2max = x[x['Coaster2'] == 1].max()
    Coaster1min = x[x['Coaster1'] == 1].min()
    Coaster1max = x[x['Coaster1'] == 1].max()
    # One Series (one output row) per sensor.
    Door = pd.Series([Doormin['Door'], Doormin['SensorDate'], Doormin['SensorTime'], Doormax['SensorTime'], Doormin['RegisteredTime']], index=['Door','SensorDate','SensorTimeFirst','SensorTimeLast','RegisteredTime'])
    Coaster1 = pd.Series([Coaster1min['Coaster1'], Coaster1min['SensorDate'], Coaster1min['SensorTime'], Coaster1max['SensorTime'], Coaster1min['RegisteredTime']], index=['Coaster1','SensorDate','SensorTimeFirst','SensorTimeLast','RegisteredTime'])
    Coaster2 = pd.Series([Coaster2min['Coaster2'], Coaster2min['SensorDate'], Coaster2min['SensorTime'], Coaster2max['SensorTime'], Coaster2min['RegisteredTime']], index=['Coaster2','SensorDate','SensorTimeFirst','SensorTimeLast','RegisteredTime'])
    return pd.DataFrame([Door, Coaster2, Coaster1])
print(df.groupby(['User','Activity']).apply(f))
                            Coaster1  Coaster2  Door RegisteredTime  \
User  Activity
Chris coffee + hot water 0       NaN       NaN     1       13:09:00
                         1       NaN         1   NaN       13:09:00
                         2       NaN       NaN   NaN            NaN

                            SensorDate SensorTimeFirst SensorTimeLast
User  Activity
Chris coffee + hot water 0  2015-09-21        13:05:54       13:05:56
                         1  2015-09-21        13:05:58       13:05:59
                         2         NaN             NaN            NaN
Maybe you can use fillna to put 0 in place of NaN:
df = df.groupby(['User','Activity']).apply(f)
df[['Coaster1','Coaster2','Door']] = df[['Coaster1','Coaster2','Door']].fillna(0)
print(df)
                            Coaster1  Coaster2  Door RegisteredTime  \
User  Activity
Chris coffee + hot water 0         0         0     1       13:09:00
                         1         0         1     0       13:09:00
                         2         0         0     0            NaN

                            SensorDate SensorTimeFirst SensorTimeLast
User  Activity
Chris coffee + hot water 0  2015-09-21        13:05:54       13:05:56
                         1  2015-09-21        13:05:58       13:05:59
                         2         NaN             NaN            NaN
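The function above names each sensor by hand and leaves an all-NaN row for any sensor that never fires. As a possible alternative, here is a minimal sketch that labels consecutive runs of rows with an unchanged sensor pattern and takes the first and last SensorTime of each run. It assumes the rows are already in time order, uses a cut-down copy of the question's data with only two sensor columns, and the names sensor_cols, changed, run_id and out are made up for illustration.

```python
import pandas as pd

# Cut-down version of the question's data (assumption: rows are in time order).
sensor_cols = ['Door', 'Coaster2']
df = pd.DataFrame({
    'User': ['Chris'] * 5,
    'Activity': ['coffee + hot water'] * 5,
    'Door': [1.0, 1.0, 1.0, 0.0, 0.0],
    'Coaster2': [0.0, 0.0, 0.0, 1.0, 1.0],
    'SensorDate': ['2015-09-21'] * 5,
    'SensorTime': ['13:05:54', '13:05:55', '13:05:56', '13:05:58', '13:05:59'],
    'RegisteredTime': ['13:09:00'] * 5,
})

# A new run starts whenever any sensor column differs from the previous row.
changed = df[sensor_cols].ne(df[sensor_cols].shift()).any(axis=1)
run_id = changed.cumsum().rename('run')

# One output row per (User, Activity, run), keeping first/last sensor time.
out = (
    df.groupby(['User', 'Activity', run_id], sort=False)
      .agg(
          Door=('Door', 'first'),
          Coaster2=('Coaster2', 'first'),
          SensorDate=('SensorDate', 'first'),
          SensorTimeFirst=('SensorTime', 'first'),
          SensorTimeLast=('SensorTime', 'last'),
          RegisteredTime=('RegisteredTime', 'first'),
      )
      .reset_index()
      .drop(columns='run')
)
print(out)
```

With the full data set, sensor_cols would list every sensor column and each one would need its own entry in agg. Unlike the per-sensor filter above, a sensor that switches off and on again produces a separate row for each run here.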
Answer 1 (score: 0)
You can use the following function. It gives you the frequency of every item.
data.value_counts()
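In the question, data is a plain list of lists, which has no value_counts method. A minimal sketch, assuming the intent was to call it on a single column of the DataFrame (the choice of SensorTime and the name counts are only for illustration):

```python
import pandas as pd

# Small stand-in for the question's df, with one column only.
df = pd.DataFrame({'SensorTime': ['13:05:54', '13:05:54', '13:05:55', '13:05:58']})

# Series.value_counts returns how often each distinct value occurs.
counts = df['SensorTime'].value_counts()
print(counts)
```

This reports how many times each timestamp appears; it does not by itself pick out the first and last activation.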