Question

Thanks for your help.

I want to get the ID and category of each item in a txt file like the one below:

Id:   0
ASIN: 0771044445
discontinued product

Id:   1
ASIN: 0827229534
  title: Patterns of Preaching: A Sermon Sampler
  group: Book
  salesrank: 396585
  similar: 5  0804215715  156101074X  0687023955  0687074231  082721619X
  categories: 2
   |Books[283155]|Subjects[1000]|Religion & Spirituality[22]|Christianity[12290]|Clergy[12360]|Preaching[12368]
   |Books[283155]|Subjects[1000]|Religion & Spirituality[22]|Christianity[12290]|Clergy[12360]|Sermons[12370]
  reviews: total: 2  downloaded: 2  avg rating: 5
    2000-7-28  cutomer: A2JW67OY8U6HHK  rating: 5  votes:  10  helpful:   9
    2003-12-14  cutomer: A2VE83MZF98ITY  rating: 5  votes:   6  helpful:   5

Id:   2
ASIN: 0738700797
  title: Candlemas: Feast of Flames
  group: Book
  salesrank: 168596
  similar: 5  0738700827  1567184960  1567182836  0738700525  0738700940
  categories: 2
   |Books[283155]|Subjects[1000]|Religion & Spirituality[22]|Earth-Based Religions[12472]|Wicca[12484]
......

The result should be organized like this:

1 Book

2 Book

3 Book

So I wrote a Java program to extract the information:

import java.io.BufferedReader;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.IOException;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class Main
{
  public static void main(String[] args) throws IOException
  {
    String file = "/Users/swing/Desktop/test.rtf";

    BufferedReader br;

    try
    {
      br = new BufferedReader(new FileReader(file));

      String line;

      String re1 = ".*?";                 // Non-greedy match on filler
      String re2 = "";                    // ID 1
      String re3 = "((?:[c-z][a-z]+))";   // Category 1

      Pattern p = Pattern.compile(re1 + re2 + re3, Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
      Matcher m = p.matcher(file);

      // Apply the pattern to each line of the file
      while ((line = br.readLine()) != null)
      {
        m = p.matcher(line);

        if (m.find())
        {
          String id1 = m.group(1);
          String category1 = m.group(2);
          System.out.print(" " + id1.toString() + " " + " " + category1.toString() + " " + "\n");
        }
      }
    }
    catch (FileNotFoundException e)
    {
      e.printStackTrace();
      System.out.println("fail");
    }
  }
}

I have no experience with Java regular expressions, and the result is wrong, as shown below. Could you help me correct the code? Thanks!

Wrong output:

\r  tf 

\font  tbl 

color  tbl 

ar  gl 

ardir  natural 

ardir  natural 

AS  IN 

dis  continued 

AS  IN 

tit  le 

gro  up 

ales  rank 

simi  lar 

....
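Two observations about the code as posted. The printed words (`\r`, `\font`, `color`, ...) look like RTF control words, which suggests the program read the raw markup of `test.rtf` rather than plain text. Separately, `re1 + re2 + re3` has only one capturing group, because `re2` is empty and the inner `(?:...)` is non-capturing, so `m.group(2)` throws `IndexOutOfBoundsException` as soon as a line matches. A small standalone check using only `java.util.regex` (the class name `GroupCountCheck` is hypothetical):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupCountCheck {
    // The question's pattern as posted: re1 + re2 + re3, with re2 empty.
    static final Pattern QUESTION_PATTERN = Pattern.compile(
            ".*?" + "" + "((?:[c-z][a-z]+))",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Number of capturing groups; a (?:...) group does not count.
    static int questionGroupCount() {
        return QUESTION_PATTERN.matcher("").groupCount();
    }

    // True if a successful find() is followed by a failing m.group(2).
    static boolean group2Throws(String line) {
        Matcher m = QUESTION_PATTERN.matcher(line);
        if (!m.find()) {
            return false; // no match, so the question's code never calls group()
        }
        try {
            m.group(2);
            return false;
        } catch (IndexOutOfBoundsException e) {
            return true;  // the pattern has no group 2
        }
    }

    public static void main(String[] args) {
        System.out.println(questionGroupCount());          // prints 1
        System.out.println(group2Throws("  group: Book")); // prints true
    }
}
```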

Answer 1

Try this regex with the options you chose (that is, DOTALL and case-insensitive):

Pattern

Id:\s+?(\d).+?(?:group:|discontinued)\s(\w+?)\s
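A minimal sketch of using this pattern from Java, assuming the data is saved as plain text (not RTF) and is small enough to hold in one `String`. The class `WholeTextExtract` and its `SAMPLE` text are hypothetical. Because DOTALL lets `.+?` run across line breaks, `find()` is called on the whole text instead of line by line. One small change from the pattern above: `(\d+)` instead of `(\d)`, so that multi-digit IDs such as `Id: 12` are captured in full.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WholeTextExtract {
    // The answer's pattern, with (\d+) so multi-digit IDs are captured whole.
    static final Pattern ITEM = Pattern.compile(
            "Id:\\s+?(\\d+).+?(?:group:|discontinued)\\s(\\w+?)\\s",
            Pattern.DOTALL | Pattern.CASE_INSENSITIVE);

    // Hypothetical sample in the question's layout (IDs chosen for illustration).
    static final String SAMPLE = String.join("\n",
            "Id:   0",
            "ASIN: 0771044445",
            "discontinued product",
            "",
            "Id:   1",
            "ASIN: 0827229534",
            "  title: Patterns of Preaching: A Sermon Sampler",
            "  group: Book",
            "",
            "Id:   12",
            "ASIN: 0738700797",
            "  group: Book",
            "");

    // Returns "id word" for every match, where word is the group name,
    // or the word after "discontinued" for discontinued items.
    static List<String> extract(String text) {
        List<String> out = new ArrayList<>();
        Matcher m = ITEM.matcher(text);
        while (m.find()) {
            out.add(m.group(1) + " " + m.group(2));
        }
        return out;
    }

    public static void main(String[] args) {
        // prints "0 product", "1 Book", "12 Book" on separate lines
        extract(SAMPLE).forEach(System.out::println);
    }
}
```

For a discontinued item the second group is the word that follows `discontinued` (here `product`). Because `.+?` can cross line breaks, an item that has neither marker would be skipped and its ID paired with the next item's group.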

Input

The .txt file you provided in the question

Output

Matches

1. Group 1: 0
   Group 2: product

2. Group 1: 1
   Group 2: Book

3. Group 1: 2
   Group 2: Book
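If the file is large, reading it line by line avoids loading it all at once and produces the `id group` layout asked for in the question. A minimal sketch, assuming the plain-text layout shown above (an `Id:` line followed later by an indented `group:` line); `LineByLineExtract` and its method names are hypothetical. Discontinued items have no `group:` line, so they are skipped.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LineByLineExtract {
    // A line such as "Id:   1"; matches() requires the whole line to fit.
    private static final Pattern ID_LINE = Pattern.compile("Id:\\s+(\\d+)\\s*");
    // A line such as "  group: Book"; captures the name without surrounding spaces.
    private static final Pattern GROUP_LINE = Pattern.compile("\\s*group:\\s*(\\S.*?)\\s*");

    // Returns "id group" for every item that has a group line.
    static List<String> extract(BufferedReader br) throws IOException {
        List<String> out = new ArrayList<>();
        String currentId = null;
        String line;
        while ((line = br.readLine()) != null) {
            Matcher id = ID_LINE.matcher(line);
            if (id.matches()) {
                currentId = id.group(1);  // remember which item we are inside
                continue;
            }
            Matcher group = GROUP_LINE.matcher(line);
            if (currentId != null && group.matches()) {
                out.add(currentId + " " + group.group(1));
                currentId = null;         // take only the first group line per item
            }
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.err.println("usage: java LineByLineExtract <plain-text data file>");
            return;
        }
        try (BufferedReader br = new BufferedReader(new FileReader(args[0]))) {
            for (String item : extract(br)) {
                System.out.println(item);
            }
        }
    }
}
```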

How to write a Java regex for this txt file
