Question

I want to use a regular expression to extract the IDs from a text file that looks like this:

Id:   1
ASIN: 0827229534
  title: Patterns of Preaching: A Sermon Sampler
  group: Book
  salesrank: 396585
  similar: 5  0804215715  156101074X  0687023955  0687074231  082721619X
  categories: 2
   |Books[283155]|Subjects[1000]|Religion & Spirituality[22]|Christianity[12290]|Clergy[12360]|Preaching[12368]
   |Books[283155]|Subjects[1000]|Religion & Spirituality[22]|Christianity[12290]|Clergy[12360]|Sermons[12370]
  reviews: total: 2  downloaded: 2  avg rating: 5
    2000-7-28  cutomer: A2JW67OY8U6HHK  rating: 5  votes:  10  helpful:   9
    2003-12-14  cutomer: A2VE83MZF98ITY  rating: 5  votes:   6  helpful:   5

This is my code so far, but it returns an empty list. Can anyone help?

import pandas as pd
import re
regex=r'^Id:(\s*\d*)'
textfile = open("amazon-meta.txt", 'r')
filetext = textfile.read()
matches = re.findall(regex, filetext)
matches
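A minimal, self-contained reproduction of the empty result, using an in-memory string in place of amazon-meta.txt. The "Total items" header and the second record are hypothetical stand-ins, assuming the real file has some text before its first Id: line:

```python
import re

# Hypothetical stand-in for the file: a header line, then two records.
sample = "Total items: 2\n\nId:   1\nASIN: 0827229534\n\nId:   2\n"

# Without re.MULTILINE, ^ only matches at position 0 of the whole string,
# which here is "Total", so nothing matches.
without_flag = re.findall(r'^Id:(\s*\d*)', sample)

# With re.MULTILINE, ^ also matches right after every newline.
with_flag = re.findall(r'^Id:(\s*\d*)', sample, flags=re.MULTILINE)

print(without_flag)  # []
print(with_flag)     # ['   1', '   2']  (the group still captures the spaces)
```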

Answer 1

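Try passing flags=re.MULTILINE. Without it, ^ matches only at the very beginning of the whole string, so the pattern can only ever match an Id: line that starts the file. With it, ^ also matches right after every newline. Moving \s* outside the capture group also keeps the leading spaces out of the captured ID.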

For example:

import re

# re.MULTILINE makes ^ match at the start of every line, not only at the start of the file
with open("amazon-meta.txt", "r") as infile:
    print(re.findall(r'^Id:\s*(\d*)', infile.read(), flags=re.MULTILINE))
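Since read() loads the whole file into memory, a line-by-line variant may suit a large file better. This is a minimal sketch, assuming every Id: sits at the start of its own line as in the sample above; extract_ids is a hypothetical helper name, and the pattern uses \d+ (instead of \d*) so an Id: line without digits is skipped rather than yielding an empty string. It also converts each ID to int:

```python
import io
import re
from typing import Iterable, List

# No ^ or MULTILINE needed: re.Pattern.match already anchors at the start of each line.
ID_PATTERN = re.compile(r'Id:\s*(\d+)')

def extract_ids(lines: Iterable[str]) -> List[int]:
    """Return the numeric IDs from lines that begin with 'Id:' (hypothetical helper)."""
    ids = []
    for line in lines:
        m = ID_PATTERN.match(line)
        if m:
            ids.append(int(m.group(1)))
    return ids

# With the real file (path taken from the question), iterating the file object
# reads it one line at a time:
# with open("amazon-meta.txt", "r") as infile:
#     print(extract_ids(infile))

sample = io.StringIO("Id:   1\nASIN: 0827229534\n  title: Patterns of Preaching: A Sermon Sampler\n")
print(extract_ids(sample))  # [1]
```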
