Question

Here is my data:

Input:

 \"button\" \"button\" href=\"#\"   data-id=\"11111111111\"  \"button\" \"button\" href=\"#\"   data-id=\"222222222222\"
     \"button\" \"button\" href=\"#\"

Desired output:

11111111111
222222222222

My first piece of code works fine:

import re
text = 'data-id=\"11111111111 \" data-id=\"222222222222\" '
c = re.findall('data-id=\\"(.*?)\\"', text)

My second piece of code doesn't work. It doesn't find anything:

with open("E:/test.txt","r") as f:
    text = f.readline()

c = re.findall('data-id=\\"(.*?)\\"', text)

Why doesn't my second piece of code work? Please help me fix it. Thanks a lot :)

Answer 1

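You can do it like this:

r'"([^\\]+)\\"'

The leading " matches a quote, the capturing group ([^\\]+) then takes the wanted part, i.e. the substring up to the next backslash, and the trailing \\" makes sure that part is followed by \".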

Example:

In [34]: s
Out[34]: 'randomtext data-id=\\"11111111111\\" randomtext data-id=\\"222222222222\\"'

In [35]: re.findall(r'"([^\\]+)\\"', s)
Out[35]: ['11111111111', '222222222222']
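A self-contained version of the session above, as a sketch. It assumes, as the repr of s suggests, that the text holds a literal backslash before every quote; the raw string literal reproduces that. re.VERBOSE is used only to annotate each part of the same pattern, r'"([^\\]+)\\"'.

```python
import re

# Raw literal: each \" keeps its backslash, like the text read from the file
s = r'randomtext data-id=\"11111111111\" randomtext data-id=\"222222222222\"'

# Same pattern as r'"([^\\]+)\\"', spelled out with re.VERBOSE comments
pattern = re.compile(r'''
    "          # an opening quote
    ([^\\]+)   # capture everything up to the next backslash
    \\"        # the captured part must be followed by a literal \"
''', re.VERBOSE)

ids = pattern.findall(s)
print(ids)  # ['11111111111', '222222222222']
```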

Answer to the edited question:

Use \d+ to match only the digits:

re.findall(r'"(\d+)\\"', s)

Or match based on the data-id attribute:

re.findall(r'data-id=\\"([^\\]+)\\"', s)

Example:

In [45]: s
Out[45]: '\\"button\\" \\"button\\" href=\\"#\\" data-id=\\"11111111111\\" \\"button\\" \\"button\\" href=\\"#\\" data-id=\\"222222222222\\" \\"button\\" \\"button\\" href=\\"#\\"'

In [50]: re.findall(r'"(\d+)\\"', s)
Out[50]: ['11111111111', '222222222222']

In [46]: re.findall(r'data-id=\\"([^\\]+)\\"', s)
Out[46]: ['11111111111', '222222222222']
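Why the pattern had to change for the edited input can be checked with a short sketch. It assumes the input is the string shown in Out[45], again with literal backslashes before the quotes: the first pattern picks up every quoted value, including button and #, while the two revised patterns return only the IDs.

```python
import re

# Edited input from the question; raw literals keep the backslashes before quotes
s = (r'\"button\" \"button\" href=\"#\" data-id=\"11111111111\" '
     r'\"button\" \"button\" href=\"#\" data-id=\"222222222222\" '
     r'\"button\" \"button\" href=\"#\"')

# First pattern: any quoted value, so "button" and "#" are captured too
any_quoted = re.findall(r'"([^\\]+)\\"', s)

# Revised patterns: digits only, or anchored on the data-id attribute
digits_only = re.findall(r'"(\d+)\\"', s)
by_data_id = re.findall(r'data-id=\\"([^\\]+)\\"', s)

print(any_quoted)
print(digits_only)  # ['11111111111', '222222222222']
print(by_data_id)   # ['11111111111', '222222222222']
```

Anchoring on data-id= is the more robust of the two if an ID could ever contain non-digit characters.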

Answer 2

Please check this answer. (I added two lines to the str_txt.txt file.)

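The key change to your second piece of code is adding 'r' as a prefix to the regular expression; I also read every line with readlines() and loop over them. For more information on the 'r' prefix in regular expressions, please check here.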

import re

with open("str_txt.txt", "r") as f:
    text = f.readlines()

for line in text:
    # The r prefix keeps \\ intact, so the regex matches a literal backslash before each quote
    c = re.findall(r'data-id=\\"(.*?)\\"', line)
    print(c)

输出：

C:\Users\dinesh_pundkar\Desktop>python demo.Py
['11111111111', '222222222222']
['1111113434111', '222222222222']
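A self-contained Python 3 sketch of the whole round trip, assuming (as the input in the question shows) that the file stores a literal backslash before each quote. It writes sample data to a hypothetical sample.txt in a temporary directory, reads the whole file with f.read() rather than a single readline(), and compares the two patterns. Without the r prefix, the pattern reaches the regex engine as data-id=\"(.*?)\", where \" is just an escaped quote, so it cannot match the backslashes in the file.

```python
import os
import re
import tempfile

# Hypothetical sample data: literal backslashes before the quotes, as in the question's file
SAMPLE = (r'\"button\" href=\"#\" data-id=\"11111111111\" '
          r'\"button\" href=\"#\" data-id=\"222222222222\"' '\n')

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'sample.txt')  # hypothetical file name; any path works
    with open(path, 'w') as f:
        f.write(SAMPLE)

    # Read the whole file, not just the first line
    with open(path, 'r') as f:
        text = f.read()

# Without r, '\\"' becomes \" in the pattern, which the regex treats as a plain quote
non_raw = re.findall('data-id=\\"(.*?)\\"', text)

# With r, \\" reaches the regex intact and matches a backslash followed by a quote
raw = re.findall(r'data-id=\\"(.*?)\\"', text)

print(non_raw)  # []
print(raw)      # ['11111111111', '222222222222']
```

The first snippet in the question only appeared to work because its string literal contained no backslashes at all.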

Python 3: Regular expression doesn't work when reading text from a file
