Pandas: looking back in time

Time: 2021-05-31 08:30:52

Tags: python pandas

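I'm looking at the speed of a vehicle, and the only data I have is whether the speed is stable, slowing down, or stopped (see the df below). There is one more state (accelerating), but it does not occur in the current df.

As you can see, there are 2 "Slowing down" periods. I'm only interested in the data from the start of the last "Slowing down" period before the stop.

How can I filter the data so that the first x rows I'm not interested in are removed? Since the speed values are always different, I can't simply filter on the values.

Hope you can help!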

import pandas as pd

data = {
  "Date and Time": ["2020-06-07 00:00", "2020-06-07 00:01", "2020-06-07 00:02", "2020-06-07 00:03", "2020-06-07 00:04", "2020-06-07 00:05", "2020-06-07 00:06", "2020-06-07 00:07", "2020-06-07 00:08", "2020-06-07 00:09", "2020-06-07 00:10", "2020-06-07 00:11", "2020-06-07 00:12", "2020-06-07 00:13", "2020-06-07 00:14", "2020-06-07 00:15", "2020-06-07 00:16", "2020-06-07 00:17", "2020-06-07 00:18", "2020-06-07 00:19", "2020-06-07 00:20"],

  "Values": ["Stable","Stable","Stable","Stable", "Slowing down","Slowing down","Slowing down","Stable", "Stable", "Stable", "Slowing down","Slowing down","Slowing down","Slowing down","Slowing down","Slowing down","Slowing down","Slowing down", "Stopped", "Stopped", "Stopped"]
}

df = pd.DataFrame(data)

df.head()

3 answers:

Answer 0 (score: 1)

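You can use .cumsum() to number the consecutive runs of values, then filter with .loc for the rows whose run number equals the largest run number found among the "Slowing down" rows: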

df['SlowDownSeq'] = df['Values'].ne(df['Values'].shift()).cumsum()
df_selected = df.loc[df['SlowDownSeq'] ==  df.loc[df['Values'] == 'Slowing down', 'SlowDownSeq'].max()].drop('SlowDownSeq', axis=1)

Result:

print(df_selected)


         Date and Time        Values
10 2020-06-07 00:10:00  Slowing down
11 2020-06-07 00:11:00  Slowing down
12 2020-06-07 00:12:00  Slowing down
13 2020-06-07 00:13:00  Slowing down
14 2020-06-07 00:14:00  Slowing down
15 2020-06-07 00:15:00  Slowing down
16 2020-06-07 00:16:00  Slowing down
17 2020-06-07 00:17:00  Slowing down
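
To make the intermediate step visible, here is a minimal sketch of the same idea that keeps the run numbers in a separate Series instead of a temporary column. The names run_id, last_slow_run, last_slow and from_last_slow are illustrative and not part of the answer above, and the sketch assumes no "Slowing down" rows occur after the stop.

```python
import pandas as pd

# Same sequence of states as in the question (timestamps omitted for brevity).
values = (["Stable"] * 4 + ["Slowing down"] * 3 + ["Stable"] * 3
          + ["Slowing down"] * 8 + ["Stopped"] * 3)
df = pd.DataFrame({"Values": values})

# A new run starts wherever the value differs from the previous row;
# the cumulative sum of those change flags numbers the runs 1, 2, 3, ...
run_id = df["Values"].ne(df["Values"].shift()).cumsum()

# Run number of the last "Slowing down" period.
last_slow_run = run_id[df["Values"].eq("Slowing down")].max()

# Only the rows of that period.
last_slow = df[run_id.eq(last_slow_run)]

# Variant: everything from the start of that period onward,
# which also keeps the trailing "Stopped" rows.
from_last_slow = df.loc[last_slow.index[0]:]
```

Which of the two results is wanted depends on whether the "Stopped" rows should be kept; the question leaves that open.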

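Answer 1 (score: 1)

So, as I understand it, you need the rows where 'Values' is 'Slowing down' and the next row is 'Stopped'.

*Note: I realize I misunderstood. You need not only the last row but also all the preceding consecutive rows that make up that "Slowing down" sequence. I'll still leave this solution up, but it looks like SeaBean has already covered what you need.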

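What you can do is create another column, which I named 'Next_Value', that holds the values shifted up by 1 row. Then you can query/filter for the rows where 'Values' == 'Slowing down' and 'Next_Value' == 'Stopped'.
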
import pandas as pd

data = {
  "Date and Time": ["2020-06-07 00:00", "2020-06-07 00:01", "2020-06-07 00:02", "2020-06-07 00:03", "2020-06-07 00:04", "2020-06-07 00:05", "2020-06-07 00:06", "2020-06-07 00:07", "2020-06-07 00:08", "2020-06-07 00:09", "2020-06-07 00:10", "2020-06-07 00:11", "2020-06-07 00:12", "2020-06-07 00:13", "2020-06-07 00:14", "2020-06-07 00:15", "2020-06-07 00:16", "2020-06-07 00:17", "2020-06-07 00:18", "2020-06-07 00:19", "2020-06-07 00:20"],

  "Values": ["Stable","Stable","Stable","Stable", "Slowing down","Slowing down","Slowing down","Stable", "Stable", "Stable", "Slowing down","Slowing down","Slowing down","Slowing down","Slowing down","Slowing down","Slowing down","Slowing down", "Stopped", "Stopped", "Stopped"]
}

df = pd.DataFrame(data)

df['Next_Value'] = df['Values'].shift(-1)

filtered_df = df.query('Values == "Slowing down" and Next_Value == "Stopped"')

If you are more familiar with this syntax than with df.query(), use the following line instead:

filtered_df = df[(df['Values'] == "Slowing down") & (df['Next_Value'] == "Stopped")]

Output:

print(filtered_df)
       Date and Time        Values Next_Value
17  2020-06-07 00:17  Slowing down    Stopped
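
As the note above says, this returns only the final row of the period. One possible way to extend the shift(-1) idea to the whole period is to combine it with run numbering, sketched below. This is an assumption about what is wanted rather than part of the original answer, and the names ends_before_stop, run_id, target_runs and whole_runs are illustrative.

```python
import pandas as pd

# Same sequence of states as in the question (timestamps omitted for brevity).
values = (["Stable"] * 4 + ["Slowing down"] * 3 + ["Stable"] * 3
          + ["Slowing down"] * 8 + ["Stopped"] * 3)
df = pd.DataFrame({"Values": values})

# Rows where "Slowing down" is immediately followed by "Stopped".
next_value = df["Values"].shift(-1)
ends_before_stop = df["Values"].eq("Slowing down") & next_value.eq("Stopped")

# Number the consecutive runs of identical values.
run_id = df["Values"].ne(df["Values"].shift()).cumsum()

# Keep every row belonging to a run that ends right before a stop.
target_runs = run_id[ends_before_stop].unique()
whole_runs = df[run_id.isin(target_runs)]
```

If the data contained several stops, this would keep the "Slowing down" period before each of them, and it would skip a "Slowing down" period that is followed by something other than a stop.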

Answer 2 (score: 0)

import pandas as pd

data = {
  "Date and Time": ["2020-06-07 00:00", "2020-06-07 00:01", "2020-06-07 00:02", "2020-06-07 00:03", "2020-06-07 00:04", "2020-06-07 00:05", "2020-06-07 00:06", "2020-06-07 00:07", "2020-06-07 00:08", "2020-06-07 00:09", "2020-06-07 00:10", "2020-06-07 00:11", "2020-06-07 00:12", "2020-06-07 00:13", "2020-06-07 00:14", "2020-06-07 00:15", "2020-06-07 00:16", "2020-06-07 00:17", "2020-06-07 00:18", "2020-06-07 00:19", "2020-06-07 00:20"],

  "Values": ["Stable","Stable","Stable","Stable", "Slowing down","Slowing down","Slowing down","Stable", "Stable", "Stable", "Slowing down","Slowing down","Slowing down","Slowing down","Slowing down","Slowing down","Slowing down","Slowing down", "Stopped", "Stopped", "Stopped"]
}

df = pd.DataFrame(data)

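# Running count of rows within each "Values" group, over the whole frame (not per consecutive run).
df["slow_count"] = df.groupby("Values").cumcount()

# Keep the "Slowing down" rows whose count equals the overall maximum count.
a = df[(df["slow_count"] == df["slow_count"].max()) & (df["Values"] == "Slowing down")]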

