Question

How can I filter out rows of a pandas DataFrame whose value in a column starts with 'Z' (or any other given character)?

I have the following pandas DataFrame df (more than 25,000 rows):

TIME_STAMP  Activity    Action  Quantity    EPIC    Price   Sub-activity    Venue
0   2017-08-30 08:00:05.000 Allocation  BUY 50  RRS 77.6    CPTY    066
1   2017-08-30 08:00:05.000 Allocation  BUY 50  RRS 77.6    CPTY    066
3   2017-08-30 08:00:09.000 Allocation  BUY 91  BATS    47.875  CPTY    PXINLN
4   2017-08-30 08:00:10.000 Allocation  BUY 43  PNN 8.07    CPTY    WCAPD
5   2017-08-30 08:00:10.000 Allocation  BUY 270 SGE 6.93    CPTY    PROBDMAD

I am trying to remove all rows where the first letter of Venue is 'Z'.

For example, my usual filter code looks something like this (it filters out all rows where Venue == '066'):

df = df[df.Venue != '066']

I can see that the line below filters out what I need as a list, but I am not sure how to express it as a DataFrame filter.

[k for k in df.Venue if 'Z' not in k]

Answer 1

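Use str[0] to select the first character, or use startswith, or contains with the regex ^ to anchor the match at the start of the string. To invert the boolean mask, use ~: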

df1 = df[df.Venue.str[0] != 'Z']

df1 = df[~df.Venue.str.startswith('Z')]

df1 = df[~df.Venue.str.contains('^Z')]
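A minimal sketch checking that the three masks agree, assuming hypothetical data: the question's Venue values plus an invented 'ZXYZ' venue and one missing value. The na=False argument is an addition not in the answer above; without it, startswith/contains return NaN for missing values and ~ cannot invert the mask. The drop_first_letter helper is hypothetical and covers the "any other character" part of the question by excluding several first letters at once with isin.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: the Venue values from the question plus two invented
# rows, one starting with 'Z' and one missing (NaN).
df = pd.DataFrame(
    {"Venue": ["066", "066", "PXINLN", "WCAPD", "PROBDMAD", "ZXYZ", np.nan]}
)

# Each mask is True for the rows to KEEP.
mask_index = df.Venue.str[0] != "Z"                    # NaN != 'Z' evaluates to True
mask_starts = ~df.Venue.str.startswith("Z", na=False)  # NaN -> False, then ~ -> True
mask_regex = ~df.Venue.str.contains("^Z", na=False)    # '^' anchors at string start

filtered = df[mask_starts]
print(filtered)


def drop_first_letter(frame, column, letters):
    """Hypothetical helper: drop rows whose `column` starts with any of `letters`."""
    first = frame[column].str[0]  # first character; missing values stay NaN
    return frame[~first.isin(list(letters))]


# Drop venues starting with 'Z' or 'P'; the NaN row is kept.
print(drop_first_letter(df, "Venue", "ZP"))
```

Using isin on the first character keeps the multi-letter case a single vectorized comparison instead of chaining several != conditions with &.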

If there are no NaN values, a list comprehension is faster:

df1 = df[[x[0] != 'Z' for x in df.Venue]]

df1 = df[[not x.startswith('Z') for x in df.Venue]]
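The "no NaN values" condition matters because a missing value is a float, and x[0] on a float raises TypeError. A small sketch under that assumption, with hypothetical data; the isinstance guard is an illustrative variant, not part of the answer, and it also avoids the IndexError that x[0] would raise on an empty string.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing Venue.
df = pd.DataFrame({"Venue": ["066", "ZXYZ", "WCAPD", np.nan]})

# The plain comprehension assumes every value is a non-empty string.
failed = False
try:
    df[[x[0] != "Z" for x in df.Venue]]
except TypeError as exc:  # NaN is a float, which is not subscriptable
    failed = True
    print("fails on NaN:", exc)

# Guarded variant: non-strings are treated as "does not start with Z"
# and therefore kept; startswith is also safe on empty strings.
keep = [not (isinstance(x, str) and x.startswith("Z")) for x in df.Venue]
df1 = df[keep]
print(df1)
```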

Answer 2

For the case where there are no NaN values, you can convert the NumPy representation of the series to type '<U1' (a one-character Unicode string, which keeps only the first character) and test equality:

df1 = df[df['Venue'].values.astype('<U1') != 'Z']

Performance benchmarking
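A minimal timing sketch for comparing the approaches yourself, assuming hypothetical data built by repeating the question's Venue values plus an invented 'ZXYZ' venue to 30,000 rows. The column is created with dtype=object so that .values returns a NumPy object array; on newer pandas versions with a dedicated string dtype, .values may return a pandas extension array instead. Results depend on the machine and on the pandas/NumPy versions, so no numbers are quoted here.

```python
import timeit

import pandas as pd

# Hypothetical data: question's venues plus one 'Z' venue, tiled to 30,000 rows.
venues = ["066", "066", "PXINLN", "WCAPD", "PROBDMAD", "ZXYZ"]
df = pd.DataFrame({"Venue": pd.Series(venues * 5000, dtype=object)})

# Casting to fixed-width '<U1' truncates every string to its first character.
first = df["Venue"].values.astype("<U1")
df1 = df[first != "Z"]

# The .str accessor version selects exactly the same rows.
df2 = df[df["Venue"].str[0] != "Z"]

candidates = {
    "astype('<U1')": lambda: df[df["Venue"].values.astype("<U1") != "Z"],
    "str[0]": lambda: df[df["Venue"].str[0] != "Z"],
    "str.startswith": lambda: df[~df["Venue"].str.startswith("Z")],
    "list comprehension": lambda: df[[x[0] != "Z" for x in df["Venue"]]],
}

timings = {}
for name, fn in candidates.items():
    runs = 20
    timings[name] = timeit.timeit(fn, number=runs) / runs  # seconds per call
    print(f"{name:>20}: {timings[name] * 1e3:.2f} ms per call")
```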