Question

I'm new to Python and regex. I'm processing a text file and want to delete the lines that contain only digits and whitespace. This is the regex I'm using:

^\s*[0-9]*\s*$

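It matches the lines I want to delete when I use the Find dialog in Notepad++.

But when I try the same thing in Python, those lines aren't matched. Is the problem in the regex itself or in my Python code?

The Python code I'm using: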

contacts = re.sub(r'^\s*[0-9]*\s*$','\n',contents)

Sample text:

Age:30
Gender:Male



20 


Name:संगीता शर्मा
HusbandsName:नरेश कुमार शर्मा
HouseNo:10/183
30 30
Gender:Female


21 

Name:मोनू शर्मा
FathersName:कैलाश शर्मा
HouseNo:10/183
30
Gender:Male

Answer 1

Use re.sub in multiline mode:

contacts = re.sub(r'^\s*([0-9]+\s*)+$', '\n', contents, flags=re.M)
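A minimal sketch of that substitution run on a shortened, hand-made excerpt of the sample text (the excerpt and the print call are illustrative and not part of the question):

```python
import re

# Shortened excerpt of the question's sample text (illustrative only)
contents = ("Age:30\nGender:Male\n\n\n\n20 \n\n\n"
            "HouseNo:10/183\n30 30\nGender:Female\n")

# Answer 1's substitution: re.M makes ^ and $ match at each line boundary
contacts = re.sub(r'^\s*([0-9]+\s*)+$', '\n', contents, flags=re.M)
print(contacts)
```

Because \s also matches newline characters, a single match can extend over the blank lines around a numbers-only line, so the result may still contain some blank lines where the numbers used to be; the text lines themselves are left untouched.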

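If you want the ^ and $ anchors to match at the start and end of each line, rather than only at the start and end of the whole string, you need multiline mode (re.M).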
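To see why the question's code finds nothing, here is a small sketch that runs the question's own pattern on a hypothetical four-line excerpt, with and without re.M:

```python
import re

# Hypothetical excerpt of the sample text (no trailing newline)
contents = "Gender:Male\n20 \nHouseNo:10/183\n30 30"

pattern = r'^\s*[0-9]*\s*$'  # the question's pattern

# Without re.M, ^ and $ anchor to the start and end of the whole string,
# so nothing matches unless the entire text is digits and whitespace.
print(re.findall(pattern, contents))  # []

# With re.M, ^ and $ match at every line boundary, so "20 " now matches,
# but "30 30" still does not: [0-9]* allows only one run of digits.
print(re.findall(pattern, contents, flags=re.M))  # ['20 ']
```

So re.M fixes the anchoring, but the pattern also needs a repeated digit group to catch lines like 30 30.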

Also, to describe a line that contains only clusters of digits, possibly separated by whitespace, use:

^\s*([0-9]+\s*)+$
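A sketch that checks single lines against that pattern. FLAT is a hypothetical alternative, not part of the answer: it accepts the same single lines but avoids the nested quantifier ([0-9]+\s*)+, which can backtrack heavily when a long run of digits is followed by a character that makes the match fail.

```python
import re

# Answer 1's pattern: leading whitespace, then one or more digit clusters,
# each optionally followed by whitespace.
NESTED = re.compile(r'^\s*([0-9]+\s*)+$')

# Hypothetical variant: digit runs separated by whitespace, with no
# quantifier nested inside another quantifier.
FLAT = re.compile(r'^\s*[0-9]+(?:\s+[0-9]+)*\s*$')

samples = ["20 ", "30 30", "30", "HouseNo:10/183", "Age:30", ""]
for s in samples:
    # Both patterns classify each sample line the same way
    print(repr(s), bool(NESTED.match(s)), bool(FLAT.match(s)))
```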

Answer 2

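You don't even need a regex: a simple str.translate() that deletes the characters you're not interested in, followed by a check for whether anything is left, should be enough: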

import string

# translation table that deletes ASCII digits and whitespace (Python 3)
clear_table = str.maketrans("", "", string.digits + string.whitespace)

# open input.txt for reading and output.txt for writing (UTF-8 for the Devanagari names)
with open("input.txt", encoding="utf-8") as f_in, open("output.txt", "w", encoding="utf-8") as f_out:
    for line in f_in:  # iterate over the input file line by line
        if line.translate(clear_table):  # delete those chars; anything left means keep the line
            f_out.write(line)  # write the line to the output file
        # uncomment the following to write a newline in place of each removed line
        # else:
        #     f_out.write("\n")

For your sample input this produces:

Age:30
Gender:Male
Name:संगीता शर्मा
HusbandsName:नरेश कुमार शर्मा
HouseNo:10/183
Gender:Female
Name:मोनू शर्मा
FathersName:कैलाश शर्मा
HouseNo:10/183
Gender:Male

If you'd rather replace each matching line with a newline, just uncomment the else clause.
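The same check wrapped in a small function so it can be tried on an in-memory list of lines; strip_number_lines and keep_placeholder are hypothetical names, and keep_placeholder mirrors the commented-out else clause:

```python
import string

# Translation table that deletes ASCII digits and whitespace
CLEAR_TABLE = str.maketrans("", "", string.digits + string.whitespace)


def strip_number_lines(lines, keep_placeholder=False):
    """Hypothetical helper: drop lines made only of ASCII digits and whitespace.

    If keep_placeholder is True, each dropped line is replaced by a newline.
    """
    out = []
    for line in lines:
        if line.translate(CLEAR_TABLE):  # something besides digits/whitespace remains
            out.append(line)
        elif keep_placeholder:
            out.append("\n")
    return out


sample = ["Age:30\n", "Gender:Male\n", "\n", "20 \n", "30 30\n", "HouseNo:10/183\n"]
print("".join(strip_number_lines(sample)), end="")
```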

Remove lines containing only numbers - regex
