Question

I have a pattern compiled as follows:

import re

pattern_strings = ['\xc2d', '\xa0', '\xe7', '\xc3\ufffdd', '\xc2\xa0', '\xc3\xa7', '\xa0\xa0', '\xc2', '\xe9']
join_pattern = '|'.join(pattern_strings)
pattern = re.compile(join_pattern)

Then I search for the pattern in a file:

def find_pattern(path):
    with open(path, 'r') as f:
        for line in f:
            print line
            found = pattern.search(line)
            if found:
                print dir(found)
                logging.info('found - ' + found)

The file I pass in as path contains:

\xc2d 
d\xa0 
\xe7 
\xc3\ufffdd 
\xc3\ufffdd 
\xc2\xa0 
\xc3\xa7 
\xa0\xa0 
'619d813\xa03697'

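When I run this program, nothing is found.

I can't capture these patterns. What am I doing wrong here?

Expected output: every line, because every line contains one or another of the patterns.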

Update

After changing the pattern strings to

pattern_strings = ['\\xc2d', '\\xa0', '\\xe7', '\\xc3\\ufffdd', '\\xc2\\xa0', '\\xc3\\xa7', '\\xa0\\xa0', '\\xc2', '\\xe9']

it is still the same: no output.

Update

After changing the regular expression to

pattern_strings = ['\\xc2d', '\\xa0', '\\xe7', '\\xc3\\ufffdd', '\\xc2\\xa0', '\\xc3\\xa7', '\\xa0\\xa0', '\\xc2', '\\xe9']
join_pattern = '[' + '|'.join(pattern_strings) + ']'
pattern = re.compile(join_pattern)

things started to work, but some patterns are still not captured, namely on the lines

\xc2\xa0 
\xc3\xa7 
\xa0\xa0

The pattern strings for these lines are ['\\xc2\\xa0', '\\xc3\\xa7', '\\xa0\\xa0'].

Answer 1

Escape the \ in the search pattern: use r"\xa0" or "\\xa0".

Do it like this:

 ['\\xc2d', '\\xa0', '\\xe7', '\\xc3\\ufffdd', '\\xc2\\xa0', '\\xc3\\xa7', '\\xa0\\xa0', '\\xc2', '\\xe9']

which is what everyone told you to do, except the one person you listened to.
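A minimal sketch of this kind of escaping, assuming the file really holds literal backslash text (a backslash followed by x, a, 0 and so on) and that Python 3 is used. The names literal_strings, literal_pattern and find_literal are hypothetical. Because the re module itself also interprets \xNN as a hex escape, the sketch lets re.escape add the regex-level backslashes instead of writing them by hand:

```python
import re

# Literal text to look for, written as raw strings so that each entry
# really contains a backslash followed by letters and digits.
literal_strings = [r'\xc2d', r'\xa0', r'\xe7', r'\xc3\ufffdd', r'\xc2\xa0',
                   r'\xc3\xa7', r'\xa0\xa0', r'\xc2', r'\xe9']

# re.escape turns every backslash into an escaped backslash for the regex
# engine. Sorting longest first lets r'\xc2\xa0' win over its prefix r'\xc2'.
escaped = [re.escape(s) for s in sorted(literal_strings, key=len, reverse=True)]
literal_pattern = re.compile('|'.join(escaped))


def find_literal(line):
    """Return the matched text, or None if no pattern occurs in the line."""
    found = literal_pattern.search(line)
    return found.group(0) if found else None
```

With this sketch, find_literal(r'\xc2\xa0 ') returns the full eight-character text rather than only its \xc2 prefix, and a line with none of the sequences returns None.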

Answer 2

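Does your file actually contain \xc2d, that is, five characters: a backslash followed by c, then 2, then d? If so, your regexes will not match it. Each of your regexes matches one or two characters with particular character codes. If you want to match the string \xc2d, the regex needs to be \\xc2d.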
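The distinction can be checked directly. The following is a small sketch in Python 3, assuming the line in the file is the five-character text described above; the variable names are illustrative only:

```python
import re

text = r'\xc2d'  # five characters: backslash, x, c, 2, d

# '\xc2d' is a two-character string (U+00C2 followed by d),
# which does not occur anywhere in text.
as_char = re.search('\xc2d', text)

# r'\xc2d' is a five-character string, but the regex engine reads \xc2
# as a hex escape, so it still looks for U+00C2 followed by d.
as_raw = re.search(r'\xc2d', text)

# r'\\xc2d' is the regex \\xc2d: an escaped backslash followed by xc2d,
# which matches the literal text.
as_literal = re.search(r'\\xc2d', text)

print(as_char)              # None
print(as_raw)               # None
print(as_literal.group(0))  # \xc2d
```

Only the last form matches, which is consistent with the first update in the question producing no output.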

Python regular expression: pattern not found

2 answers: