I am trying to filter a pandas DataFrame using regular expressions.
I want to drop the rows that do not contain any letters. For example:
Col A.
50000
$927848
dog
cat 583
rabbit 444
The result I want is:
Col A.
dog
cat 583
rabbit 444
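I have been trying to solve this with regex and the pandas filter options, without success. See below. In particular, I run into problems when I try to combine two conditions in the filter. How can I achieve this?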
Option 1:
df['Col A.'] = ~df['Col A.'].filter(regex='\d+')
Option 2:
df['Col A.'] = df['Col A.'].filter(regex=\w+)
Option 3:
from string import digits, letters
df['Col A.'] = (df['Col A.'].filter(regex='|'.join(letters)))
OR
df['Col A.'] = ~(df['Col A.'].filter(regex='|'.join(digits)))
OR
df['Col A.'] = df[~(df['Col A.'].filter(regex='|'.join(digits))) & (df['Col A.'].filter(regex='|'.join(letters)))]
Answer 0 (score: 5)
I think you need str.contains with boolean indexing to keep only the values that contain letters:
df = df[df['Col A.'].str.contains('[A-Za-z]')]
print (df)
Col A.
2 dog
3 cat 583
4 rabbit 444
If the column contains NaN values, you can pass the parameter na=False:
df = df[df['Col A.'].str.contains('[A-Za-z]', na=False)]
print (df)
Col A.
3 dog
4 cat 583
5 rabbit 444
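For reference, here is a minimal, self-contained sketch of the approach above. The sample data mirrors the question, with one NaN added purely as an illustrative assumption; the variable names `has_letter` and `filtered` are made up for this example.

```python
import numpy as np
import pandas as pd

# Sample column from the question, plus one missing value (assumed for illustration)
df = pd.DataFrame(
    {"Col A.": ["50000", "$927848", "dog", "cat 583", "rabbit 444", np.nan]}
)

# str.contains performs a regex search on each value;
# na=False makes missing values count as "no match" instead of NaN
has_letter = df["Col A."].str.contains(r"[A-Za-z]", na=False)

# Boolean indexing keeps only the rows where the mask is True
filtered = df[has_letter]
print(filtered)
```

Without na=False, the mask would contain a missing value for the NaN row, and boolean indexing with such a mask typically raises an error.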
Answer 1 (score: 2)
Have you tried this?
df['Col A.'].filter(regex=r'\D') # Keeps only if there's a non-digit character
Or:
df['Col A.'].filter(regex=r'[A-Za-z]') # Keeps only if there's a letter (alpha)
Or:
df['Col A.'].filter(regex=r'[^\W\d_]') # More info in the link below...
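One caveat worth checking before relying on the snippets above: as far as I know, Series.filter matches its regex against the index labels, not against the values. The sketch below, using assumed sample data and made-up variable names (`by_label`, `by_value`), contrasts that with matching on the values through the .str accessor.

```python
import pandas as pd

s = pd.Series(["50000", "$927848", "dog", "cat 583", "rabbit 444"], name="Col A.")

# filter(regex=...) tests the index labels (0..4 here), so no label contains a letter
by_label = s.filter(regex=r"[A-Za-z]")

# To test the values themselves, build a boolean mask with str.contains.
# [^\W\d_] matches a word character that is neither a digit nor an underscore,
# which amounts to a letter (including non-ASCII letters).
by_value = s[s.str.contains(r"[^\W\d_]")]

print(len(by_label))
print(by_value)
```

The same patterns suggested in this answer (`\D`, `[A-Za-z]`, `[^\W\d_]`) can be passed to str.contains instead. Note that `\D` also matches `$`, so it would keep `$927848`.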
Answer 2 (score: 0)
Answer 3 (score: 0)
df['Col A.'].str.contains(r'^\d+$', na=True)
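# True for strings made up only of digits; int/float values produce NaN, which na=True converts to True
For example: [50000, '$ 927848', 'dog', 'cat 583', 'rabbit 444', '3 e 3', 'e 3', '33', '3 e'] gives: [True, False, False, False, False, False, False, True, False]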
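To turn that mask into a filter, it can be inverted with `~`. The following is only a sketch using the example list above. The names `digits_only` and `kept` are made up for illustration, and the `.astype(bool)` call is a defensive assumption so that the inversion works on a plain boolean mask.

```python
import pandas as pd

df = pd.DataFrame(
    {"Col A.": [50000, "$ 927848", "dog", "cat 583", "rabbit 444",
                "3 e 3", "e 3", "33", "3 e"]}
)

# Digit-only strings match ^\d+$; non-string values (int/float) yield NaN,
# which na=True turns into True so they are treated as numeric too
digits_only = df["Col A."].str.contains(r"^\d+$", na=True).astype(bool)

# Invert the mask to drop the purely numeric rows
kept = df[~digits_only]
print(kept)
```

Note that this removes only purely numeric entries. A value such as '$ 927848' is kept, so it is not equivalent to requiring at least one letter.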