Thanks for your help.
I want to get the ID and category of each item in a txt file that looks like this:
Id: 0
ASIN: 0771044445
discontinued product
Id: 1
ASIN: 0827229534
title: Patterns of Preaching: A Sermon Sampler
group: Book
salesrank: 396585
similar: 5 0804215715 156101074X 0687023955 0687074231 082721619X
categories: 2
|Books[283155]|Subjects[1000]|Religion & Spirituality[22]|Christianity[12290]|Clergy[12360]|Preaching[12368]
|Books[283155]|Subjects[1000]|Religion & Spirituality[22]|Christianity[12290]|Clergy[12360]|Sermons[12370]
reviews: total: 2 downloaded: 2 avg rating: 5
2000-7-28 cutomer: A2JW67OY8U6HHK rating: 5 votes: 10 helpful: 9
2003-12-14 cutomer: A2VE83MZF98ITY rating: 5 votes: 6 helpful: 5
Id: 2
ASIN: 0738700797
title: Candlemas: Feast of Flames
group: Book
salesrank: 168596
similar: 5 0738700827 1567184960 1567182836 0738700525 0738700940
categories: 2
|Books[283155]|Subjects[1000]|Religion & Spirituality[22]|Earth-Based Religions[12472]|Wicca[12484]
......
The result should be organized like this:
1 Book
2 Book
3 Book
I then wrote a Java program to extract the information:
class Main
{
public static void main(String[] args) throws IOException
{
String file="/Users/swing/Desktop/test.rtf";
BufferedReader br;
try
{
br = new BufferedReader(new FileReader(file));
String line;
String re1=".*?"; // Non-greedy match on filler
String re2=""; // ID 1
String re3="((?:[c-z][a-z]+))"; // Category 1
Pattern p = Pattern.compile(re1+re2+re3,Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
Matcher m = p.matcher(file);
while((line=br.readLine())!=null)
{
m=p.matcher(line);
if (m.find())
{
String id1=m.group(1);
String category1=m.group(2);
System.out.print(" "+id1.toString()+" "+" "+category1.toString()+" "+"\n");
}
}
}
catch (FileNotFoundException e)
{
e.printStackTrace();
System.out.println("fail");}
}
}
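I have no experience with Java regular expressions, and the program produces the wrong result shown below. Could you help me correct the code? Thanks!
Incorrect output: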
\r tf
\font tbl
color tbl
ar gl
ardir natural
ardir natural
AS IN
dis continued
AS IN
tit le
gro up
ales rank
simi lar
....
Answer 0 (score: 0)
Try this regular expression with the options you already chose (DOTALL and case-insensitive):
Pattern
Id:\s+?(\d).+?(?:group:|discontinued)\s(\w+?)\s
Input
The .txt file you provided in the question
Output
Matches
1. Group 1: 0
Group 2: product
2. Group 1: 1
Group 2: Book
3. Group 1: 2
Group 2: Book
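To show how the pattern above could be applied from Java, here is a minimal sketch rather than a definitive fix. It rests on a few assumptions: the class name `IdGroupExtractor` and the method `extract` are made up for illustration, the input is saved as plain text (the `\rtf` and `\fonttbl` fragments in the question's output suggest the `.rtf` file still contains RTF markup), and `(\d)` is widened to `(\d+)` so that IDs with more than one digit are captured whole. Because the pattern spans several lines, the sketch matches against the whole file content instead of one line at a time.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class IdGroupExtractor {
    // Pattern from the answer, with (\d) widened to (\d+) (an assumption, see above).
    // DOTALL lets ".+?" cross line breaks between "Id:" and "group:" / "discontinued".
    private static final Pattern RECORD = Pattern.compile(
            "Id:\\s+?(\\d+).+?(?:group:|discontinued)\\s(\\w+?)\\s",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Returns one "<id> <word>" string per record found in the text.
    static List<String> extract(String text) {
        List<String> rows = new ArrayList<>();
        Matcher m = RECORD.matcher(text);
        while (m.find()) {
            rows.add(m.group(1) + " " + m.group(2));
        }
        return rows;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java IdGroupExtractor <plain-text-file>");
            return;
        }
        // Read the whole file at once so a match can span multiple lines.
        String text = new String(Files.readAllBytes(Paths.get(args[0])), StandardCharsets.UTF_8);
        for (String row : extract(text)) {
            System.out.println(row);
        }
    }
}
```

As in the answer's match list, a discontinued item yields the word after `discontinued` (here `product`) instead of a group name, so those rows may need to be filtered out if only real groups are wanted.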