Remove rows with special values from a pandas DataFrame

Time: 2019-07-01 11:48:45

Tags: python python-3.x pandas dataframe

I have a DataFrame like this:

value1       value2
aa7bbc       aaaa
ss           ss0
qqq          wwww
nn77         qqee

I want to remove the rows that:

  • contain a digit
  • start with nn
  • have fewer than two characters

I have tried this:

df[~df.value1.str.contains(r'\d')]

But this only handles the first condition, not all of them. What is the most efficient way to do this?

Thanks a lot

4 answers:

Answer 0 (score: 1)

You just need to extend your regex with OR (|) so that it matches any of the conditions:

r'(\d)|(^nn)|(^.?$)'

That is:

\d (contains a digit)

OR

^nn (starts with nn)

OR

^.?$ (zero or one character, i.e. fewer than two characters)
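To see which sample values each alternative catches, here is a small sketch using Python's built-in re module. It assumes that re.search is a fair stand-in for what Series.str.contains does with a regex (it scans for a match anywhere in the string). The parentheses from the pattern above are dropped here because | already has the lowest precedence; recent pandas versions also emit a UserWarning when a pattern passed to str.contains contains capturing groups.

```python
import re

# Same three alternatives as above, without the (optional) capturing groups
PATTERN = re.compile(r'\d|^nn|^.?$')

samples = ['aa7bbc', 'ss', 'qqq', 'nn77', 'a', '']

# True means "the pattern matches, so the row would be dropped"
decisions = {s: PATTERN.search(s) is not None for s in samples}

for value, drop in decisions.items():
    print(f'{value!r:10} -> {"drop" if drop else "keep"}')
```

Note that ^.?$ also matches the empty string, so empty cells are dropped as well.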

Try it:

import pandas as pd
from io import StringIO

df = pd.read_csv(StringIO("""
value1       value2
aa7bbc       aaaa
ss           ss0
qqq          wwww
nn77         qqee"""), sep=r"\s+")

# Keep only rows whose value1 matches none of the three alternatives
df = df[~df.value1.str.contains(r'(\d)|(^nn)|(^.?$)')]

print(df)

Output:

  value1 value2
1     ss    ss0
2    qqq   wwww
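The filter above only looks at value1. The question does not say which column the conditions apply to, so, assuming they should apply to every column, a sketch is to run the same pattern over each column and drop a row if any cell matches:

```python
import pandas as pd

# Same sample data as in the question
df = pd.DataFrame({
    'value1': ['aa7bbc', 'ss', 'qqq', 'nn77'],
    'value2': ['aaaa', 'ss0', 'wwww', 'qqee'],
})

pattern = r'\d|^nn|^.?$'

# One boolean column per original column: True where the cell matches
bad = df.apply(lambda col: col.str.contains(pattern))

# Drop a row if any of its cells matched
filtered = df[~bad.any(axis=1)]
print(filtered)
```

With both columns checked, the ss row is also removed, because its value2 (ss0) contains a digit.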

Answer 1 (score: 1)

Filter on the conditions by combining them with boolean operators.

Base
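A minimal sketch of that idea, assuming (as in Answer 0) that only value1 is checked: build one boolean mask per condition and combine them with | (OR) and ~ (NOT). The mask variable names are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    'value1': ['aa7bbc', 'ss', 'qqq', 'nn77'],
    'value2': ['aaaa', 'ss0', 'wwww', 'qqee'],
})

s = df['value1']

# One boolean mask per condition from the question
has_digit = s.str.contains(r'\d')
starts_nn = s.str.startswith('nn')
too_short = s.str.len() < 2

# | is element-wise OR and ~ is element-wise NOT. If you inline the
# comparisons instead, wrap each in parentheses, because & and | bind
# more tightly than < or == in Python.
result = df[~(has_digit | starts_nn | too_short)]
print(result)
```

Keeping each condition in its own named mask makes it easy to add or drop a rule without touching a combined regex.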

Answer 2 (score: 1)

library(tibble)
library(ggplot2)

tb <- tibble(
  tbx = c(0, 0, 0, 1, 1, 2, 3, 3, 3, 3, 4, 4, 9, 15, 18, 18, 19, 19, 20, 20, 21, 22, 22, 23),
  tby = c("g","g","g","g","g","g","g","g","g","g","g","g","t","t","g","g","g","g","g","g","g","g","g","g")
)

# y = after_stat(ndensity) scales each fill group's histogram to a maximum of 1
ggplot(tb, aes(tbx, y = after_stat(ndensity))) +
  geom_histogram(bins = 25, aes(fill = tby)) +
  scale_fill_manual(values = c("red", "grey"))

Answer 3 (score: 1)

Here is one way to do it, checking both columns:

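# Keep rows where neither column contains a digit
mask_no_digit = (~df['value1'].str.contains(r'\d')) & (~df['value2'].str.contains(r'\d'))
# Keep rows where neither column starts with 'nn'
mask_no_nn = (~df['value1'].str.startswith('nn')) & (~df['value2'].str.startswith('nn'))
# Keep rows where both columns have at least two characters
mask_min_2_chars = (df['value1'].str.len() >= 2) & (df['value2'].str.len() >= 2)

df[mask_no_digit & mask_no_nn & mask_min_2_chars]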

Output:

  value1 value2
2    qqq   wwww
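A pitfall worth knowing when writing length masks like these: in Python, ~ binds more tightly than <=, so ~s.str.len() <= 2 is evaluated as (~s.str.len()) <= 2. On an integer Series, ~ is bitwise NOT (-n - 1), which turns every positive length into a negative number, so the comparison is always True and the mask filters nothing. A small sketch showing the difference:

```python
import pandas as pd

s = pd.Series(['a', 'ss', 'qqq'])

lengths = s.str.len()  # 1, 2, 3 as integers

# Parsed as (~lengths) <= 2; ~ on ints gives -2, -3, -4, all of which are <= 2
pitfall = ~lengths <= 2

# What "at least two characters" actually needs
intended = lengths >= 2

print(pitfall.tolist())
print(intended.tolist())
```

Writing the comparison in its positive form (>= 2) avoids mixing ~ with a comparison altogether.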