Drop n rows in a time-indexed Pandas DataFrame based on the value in a previous row

Posted: 2019-10-03 12:48:00

Tags: python pandas datetimeindex

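I'm working with pandas and need to drop n consecutive rows from a DataFrame based on a value in a column.

In the example below, there is an event at 17:00:00 that lasts 2 seconds, so the 2 rows that follow it need to be dropped. There is another event at 17:00:04, after which the 17:00:05 row should be dropped.

I'm not sure how to approach this. Use a mask in a lambda?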

What I have so far:

t = pd.to_timedelta(df['EventSeconds'], unit='s')
mask = df['2019-01-07 17:00:02' : '2019-01-07 17:00:02' + t]

I need to filter this data:

Index               EventSeconds OtherColumn
07/01/2019 16:59:59 0            2
07/01/2019 17:00:00 2            3
07/01/2019 17:00:01 0            4
07/01/2019 17:00:02 0            5
07/01/2019 17:00:03 0            6
07/01/2019 17:00:04 1            7
07/01/2019 17:00:05 0            8
07/01/2019 17:00:06 0            9

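1 answer:

Answer 0 (score: 1)

You can add the duration to `Index` to get each event's end time, but you also need `ffill` to carry that end time forward over the rows where `EventSeconds` is 0.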

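import pandas as pd

# Assumes the timestamps are in a regular datetime column named 'Index'
t = pd.to_timedelta(df['EventSeconds'], unit='s')

# print end_times to see details
end_times = (df['Index'].add(t)                   # calculate the end time
                .where(df['EventSeconds'].ne(0))  # keep end times on event rows only
                .ffill()                          # carry each end time forward
            )

# Rows before the first event have a NaT end time; negating le() keeps them
df[~df['Index'].le(end_times) | df['EventSeconds'].ne(0)]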

Output:

                Index  EventSeconds
0 2019-07-01 16:59:59             0
1 2019-07-01 17:00:00             2
4 2019-07-01 17:00:03             0
5 2019-07-01 17:00:04             1
7 2019-07-01 17:00:06             0
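
If the timestamps live in a `DatetimeIndex` rather than in an `Index` column, the same idea could be sketched roughly as follows. This is only an illustration, not the answerer's code. The helper name `drop_rows_covered_by_events` is hypothetical, the index is assumed to be sorted, and the date 2019-01-07 is an assumption taken from the slicing attempt in the question.

```python
import pandas as pd


def drop_rows_covered_by_events(df, seconds_col="EventSeconds"):
    """Drop non-event rows that fall inside the duration of a preceding event.

    Hypothetical helper: assumes a sorted DatetimeIndex and that
    `seconds_col` holds the event duration in seconds (0 means no event).
    """
    times = df.index.to_series()
    is_event = df[seconds_col].ne(0)
    duration = pd.to_timedelta(df[seconds_col], unit="s")

    # End time on event rows only, carried forward to the rows that follow.
    end_times = (times + duration).where(is_event).ffill()

    # Rows before the first event have a NaT end time. Comparisons with NaT
    # evaluate to False, so those rows are never marked as covered.
    covered = times.le(end_times) & ~is_event
    return df[~covered]


# Sample data rebuilt from the question (date assumed to be 2019-01-07).
index = pd.Timestamp("2019-01-07 16:59:59") + pd.to_timedelta(list(range(8)), unit="s")
df = pd.DataFrame(
    {
        "EventSeconds": [0, 2, 0, 0, 0, 1, 0, 0],
        "OtherColumn": [2, 3, 4, 5, 6, 7, 8, 9],
    },
    index=index,
)

result = drop_rows_covered_by_events(df)
print(result)
```

In this sketch, an event row that starts inside an earlier event's window is kept and resets the end time, so overlapping events would need extra handling.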