I have a data frame like this:
value1  value2
aa7bbc  aaaa
ss      ss0
qqq     wwww
nn77    qqee
I want to remove the rows whose value:
contains a digit
starts with nn
has fewer than two characters
I have already tried:
df[~df.value1.str.contains(r'\d')]
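But this only covers the digit condition, not all of my requirements. What is the most efficient way to solve this?
Thank you very much.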
Answer 0 (score: 1)
You only need to extend your regular expression with OR (|) so that it matches any of the conditions.
r'(\d)|(^nn)|(^.?$)'
That is:
\d
(contains a digit)
OR
^nn
(starts with nn)
OR
^.?$
(matches 0 or 1 characters, i.e. fewer than two characters).
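To see which alternative catches which value, the pattern can be checked outside pandas first. The following is a minimal sketch using the standard `re` module. The helper name `should_drop` and the two extra samples (`"a"` and the empty string) are assumptions added for illustration. It uses `search` because `str.contains` also looks for a match anywhere in the string.

```python
import re

# The same three alternatives, written without capture groups.
PATTERN = re.compile(r'\d|^nn|^.?$')

def should_drop(value: str) -> bool:
    """Return True if any of the three alternatives matches the value."""
    return PATTERN.search(value) is not None

# The value1 column from the question, plus two short strings (assumed samples).
samples = ["aa7bbc", "ss", "qqq", "nn77", "a", ""]
for s in samples:
    print(repr(s), should_drop(s))
```

Note that `"ss"` is kept: it has exactly two characters, so `^.?$` does not match it.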
Try it:
import pandas as pd
from io import StringIO
df = pd.read_csv(StringIO("""
value1 value2
aa7bbc aaaa
ss ss0
qqq wwww
nn77 qqee"""), sep=r"\s+")
# Same alternatives, with non-capturing groups so pandas does not warn about match groups
df = df[~df.value1.str.contains(r'(?:\d)|(?:^nn)|(?:^.?$)')]
print(df)
Output:
  value1 value2
1     ss    ss0
2    qqq   wwww
Answer 1 (score: 1)
Use operators to filter on the conditions.
Base
Answer 2 (score: 1)
tb <- tibble(
tbx = c(0, 0, 0, 1, 1, 2, 3, 3, 3, 3, 4, 4, 9, 15, 18, 18, 19, 19, 20, 20, 21, 22, 22, 23),
tby = c("g","g","g","g","g","g","g","g","g","g","g","g","t","t","g","g","g","g","g","g","g","g","g","g")
)
ggplot(tb, aes(tbx, tby = ..ndensity..)) +
geom_histogram(bins = 25, aes(fill = tby)) +
scale_fill_manual(values = c("red", "grey"))
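Answer 3 (score: 1)
Here is one way to do it:
# Keep rows where neither column contains a digit
mask_no_digit = (~df['value1'].str.contains(r'\d')) & (~df['value2'].str.contains(r'\d'))
# Keep rows where neither column starts with 'nn'
mask_no_nn = (~df['value1'].str.startswith('nn')) & (~df['value2'].str.startswith('nn'))
# Parenthesize the comparison so that ~ negates the boolean result, not the integer lengths
mask_no_2_characters = (~(df['value1'].str.len() <= 2)) & (~(df['value2'].str.len() <= 2))
df[mask_no_digit & mask_no_nn & mask_no_2_characters]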
Output:
  value1 value2
2    qqq   wwww
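The per-column masks above grow with every extra column. As a rough sketch of the same idea applied to all columns at once, the conditions can be evaluated cell by cell and then reduced per row with `any(axis=1)`. This assumes every column holds strings with no missing values. The names `pattern`, `hits`, `too_short`, `drop` and `result` are illustrative, and the length threshold (`<= 2`) follows this answer rather than the regex answer.

```python
import pandas as pd

# The data frame from the question.
df = pd.DataFrame({
    "value1": ["aa7bbc", "ss", "qqq", "nn77"],
    "value2": ["aaaa", "ss0", "wwww", "qqee"],
})

# A digit anywhere, or a value starting with "nn".
pattern = r'\d|^nn'

# Boolean frames: True where a cell meets a removal condition.
hits = df.apply(lambda col: col.str.contains(pattern, regex=True))
too_short = df.apply(lambda col: col.str.len() <= 2)

# Drop a row if any of its cells meets any condition.
drop = (hits | too_short).any(axis=1)
result = df[~drop]
print(result)
```

With the sample data only the row `qqq wwww` (index 2) remains, which agrees with the output shown above.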