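I want to use a for-loop to find contiguous time periods in df, that is, periods (defined by a start and an end timestamp) during which another column, data, is > 20.
In df, timestamp is used as the index. I think the problem is that, inside the loop, I am not correctly specifying how to select rows from the dataframe's index column.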
The for-loop:
    for i in range(len(df3)):
        if i >0:
            activities = []
            start_time = None
            if (df.loc[i, 'data'] >= 20):
                if start_time == None:
                    start_time = df.loc[i, 'timestamp']
            else:
                if start_time != None:
                    end_time = df.loc[i-1, 'timestamp']
                    duration = (end_time - start_time).seconds
                    activities.append((duration, start_time, end_time))
                    start_time = None
        return activities
df:
                            id                timestamp  data        Date  sig  events
timestamp
2020-01-15 06:12:49.213  40250  2020-01-15 06:12:49.213  20.0  2020-01-15 -1.0     1.0
2020-01-15 06:12:49.313  40251  2020-01-15 06:12:49.313  19.5  2020-01-15  1.0     0.0
2020-01-15 08:05:10.083  40256  2020-01-15 08:05:10.083  20.0  2020-01-15  1.0     0.0
This returns:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-11-9853026603d5> in <module>()
9
10
---> 11 if (df.loc[i, 'data'] >= 20):
12
13 if start_time == None:
7 frames
/usr/local/lib/python3.6/dist-packages/pandas/core/indexes/base.py in _invalid_indexer(self, form, key)
3074 """
3075 raise TypeError(
-> 3076 f"cannot do {form} indexing on {type(self)} with these "
3077 f"indexers [{key}] of {type(key)}"
3078 )
TypeError: cannot do index indexing on <class 'pandas.core.indexes.datetimes.DatetimeIndex'> with these indexers [1] of <class 'int'>
Update:
Following @jcaliz's suggestion, I tried the code below, and also tried several different indentation levels for return:
    for i in range(len(df)):
        if i >0:
            activities = []
            start_time = None
            if (df.iloc[i].data >= 20):
                if start_time == None:
                    start_time = df.iloc[i].timestamp
            else:
                if start_time != None:
                    end_time = df.iloc[i-1].timestamp
                    duration = (end_time - start_time).seconds
                    activities.append((duration, start_time, end_time))
                    start_time = None
        return activities
But I get the same error:
File "<ipython-input-24-d78e4605aebe>", line 31
return activities
^
SyntaxError: 'return' outside function
Answer 0 (score: 1)
loc is for label-based indexing, not integer-based indexing; use iloc instead. Change:
    if (df.loc[i, 'data'] >= 20):
to:
    if (df.iloc[i].data >= 20):
and do the same for the other loc calls, such as df.loc[i, 'timestamp'].
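As a rough sketch of how the loop from the question might look with positional indexing, here is one possible version. It goes beyond the one-line change above and rests on some assumptions: the function name find_activities and its threshold parameter are hypothetical, the loop is wrapped in a function so that return is legal, activities and start_time are initialised once before the loop instead of being reset on every iteration, and total_seconds() replaces .seconds so that fractional seconds and periods longer than a day are not lost. The toy data only imitates the frame in the question.

```python
import pandas as pd


def find_activities(df, threshold=20):
    """Return (duration_seconds, start_time, end_time) for each run of rows with data >= threshold."""
    activities = []      # initialised once, before the loop
    start_time = None
    for i in range(len(df)):
        if df['data'].iloc[i] >= threshold:
            if start_time is None:
                start_time = df.index[i]      # positional access to the DatetimeIndex
        elif start_time is not None:
            end_time = df.index[i - 1]
            activities.append(((end_time - start_time).total_seconds(), start_time, end_time))
            start_time = None
    # Close a run that is still open when the data ends
    if start_time is not None:
        end_time = df.index[-1]
        activities.append(((end_time - start_time).total_seconds(), start_time, end_time))
    return activities


# Toy data modelled on the question's frame (values are illustrative)
idx = pd.to_datetime([
    '2020-01-15 06:12:49.213',
    '2020-01-15 06:12:49.313',
    '2020-01-15 08:05:10.083',
    '2020-01-15 08:05:10.183',
])
df = pd.DataFrame({'data': [20.0, 19.5, 20.0, 21.0]}, index=idx)
activities = find_activities(df)
```

With this toy frame, the first row forms a one-row period of its own and the last two rows form a second period.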
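Edit:
A better approach is to avoid the for-loop altogether:
start_time is the row's own timestamp
end_time is the previous row's timestamp
duration is the difference between the two, in seconds
The process would be: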
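    # Assign the previous record's timestamp as the end time
    df['end_time'] = df['timestamp'].shift(1)
    # Gap in seconds between each row and the previous one, computed without apply.
    # Subtracting in this order keeps the gap positive; .seconds on a negative
    # Timedelta would give misleading values.
    # The first row has no previous record, so its end_time is NaT and its duration is NaN.
    df['duration'] = (df['timestamp'] - df['end_time']).dt.total_seconds()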
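If the goal is the contiguous periods themselves rather than row-to-row gaps, one loop-free possibility is to label each run of consecutive rows and then aggregate per run. The following is only a sketch under assumptions: the names above, run_id, runs and periods are illustrative, the threshold follows the >= 20 test in the question's code, and the toy data merely imitates the question's frame.

```python
import pandas as pd

# Toy data modelled on the question's frame (values are illustrative)
idx = pd.to_datetime([
    '2020-01-15 06:12:49.213',
    '2020-01-15 06:12:49.313',
    '2020-01-15 08:05:10.083',
    '2020-01-15 08:05:10.183',
    '2020-01-15 08:05:10.283',
])
df = pd.DataFrame({'data': [20.0, 19.5, 20.0, 21.0, 18.0]}, index=idx)

# True where the row satisfies the condition
above = df['data'] >= 20
# A new run starts wherever the flag differs from the previous row,
# so the cumulative sum gives every run of equal flags its own id
run_id = (above != above.shift()).cumsum()

# Flat helper frame so the grouping does not depend on index alignment
runs = pd.DataFrame({
    'timestamp': df.index,
    'above': above.to_numpy(),
    'run_id': run_id.to_numpy(),
})

# Keep only the rows that satisfy the condition, then take each run's first and last timestamp
periods = (
    runs[runs['above']]
    .groupby('run_id')['timestamp']
    .agg(start_time='min', end_time='max')
)
periods['duration'] = (periods['end_time'] - periods['start_time']).dt.total_seconds()
```

Each row of periods then describes one contiguous period, with duration in seconds; a period made of a single row has a duration of 0.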