I want to use a regular expression to extract the IDs from a text file that looks like this:
Id: 1
ASIN: 0827229534
title: Patterns of Preaching: A Sermon Sampler
group: Book
salesrank: 396585
similar: 5 0804215715 156101074X 0687023955 0687074231 082721619X
categories: 2
|Books[283155]|Subjects[1000]|Religion & Spirituality[22]|Christianity[12290]|Clergy[12360]|Preaching[12368]
|Books[283155]|Subjects[1000]|Religion & Spirituality[22]|Christianity[12290]|Clergy[12360]|Sermons[12370]
reviews: total: 2 downloaded: 2 avg rating: 5
2000-7-28 cutomer: A2JW67OY8U6HHK rating: 5 votes: 10 helpful: 9
2003-12-14 cutomer: A2VE83MZF98ITY rating: 5 votes: 6 helpful: 5
This is my code so far, but it returns an empty list. Can anyone help?
import pandas as pd
import re
regex=r'^Id:(\s*\d*)'
textfile = open("amazon-meta.txt", 'r')
filetext = textfile.read()
matches = re.findall(regex, filetext)
matches
Answer 0 (score: 0)
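Try using flags=re.MULTILINE. By default, ^ matches only at the very beginning of the string, so a pattern anchored with ^ cannot match an Id: line further down in the file. With re.MULTILINE, ^ also matches at the start of every line. Putting the parentheses around \d* alone makes the capture group return just the digits, without the leading whitespace.
For example: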
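import re

filename = "amazon-meta.txt"  # the file from the question
with open(filename, "r") as infile:
    # re.MULTILINE lets ^ match at the start of every line, not only at the start of the text
    print(re.findall(r'^Id:\s*(\d*)', infile.read(), flags=re.MULTILINE))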
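To see the effect of the flag without needing the real file, here is a minimal sketch that runs the same kind of pattern against a small in-memory string. The sample text (including the header line and the second record) is made up for illustration, and the pattern uses \d+ rather than \d* so that an Id: line with no digits is not reported as an empty match; that change is a suggestion, not part of the answer above.

```python
import re

# Made-up sample for this sketch: a header line followed by two records,
# so the text does not start with "Id:".
sample = (
    "# header line (hypothetical)\n"
    "Id: 1\n"
    "ASIN: 0827229534\n"
    "Id: 2\n"
    "title: placeholder\n"
)

pattern = r'^Id:\s*(\d+)'

# Without the flag, ^ only matches at position 0 of the whole string.
without_flag = re.findall(pattern, sample)

# With re.MULTILINE, ^ matches at the start of each line.
with_flag = re.findall(pattern, sample, flags=re.MULTILINE)

# findall returns strings; convert them if numeric IDs are needed.
ids = [int(m) for m in with_flag]

print(without_flag)  # []
print(with_flag)     # ['1', '2']
print(ids)           # [1, 2]
```

If the file is very large, reading it line by line and applying re.match to each line would avoid holding the whole text in memory, at the cost of slightly more code.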