Question

Hi, I'm trying to split this string (it's quite long):

Library Catalogue Log off | Borrower record | Course Reading | Collections | A-Z E-Journal list | ILL Request | Help   Browse | Search | Results List | Previous Searches | My e-Shelf | Self-Issue | Feedback       Selected records:  View Selected  |  Save/Mail  |  Create Subset  |  Add to My e-Shelf  |        Whole set:  Select All  |  Deselect  |  Rank  |  Refine  |  Filter   Records 1 - 15 of 101005 (maximum display and sort is 2500 records)         1 Drower, E. S. (Ethel Stefana), Lady, b. 1879. Lady E.S. Drower’s scholarly correspondence : an intrepid English autodidact in Iraq / edited by 2012. BK Book University Library( 1/ 0) 2 Kowalski, Robin M. Cyberbullying : bullying in the digital age / Robin M. Kowalski, Susan P. Limber, Patricia W. Ag 2012. BK Book University Library( 1/ 0) ...  15 Ambrose, Gavin. Approach and language [electronic resource] / Gavin Ambrose, Nigel Aono-Billson. 2011. BK Book

so that I get back either:

1 Drower, E. S. (Ethel Stefana), Lady, b. 1879. Lady E.S. Drower’s scholarly correspondence : an intrepid English autodidact in Iraq / edited by 2012. BK Book University Library( 1/ 0)

// Or

1 Drower, E. S. (Ethel Stefana), Lady, b. 1879. Lady E.S. Drower’s scholarly correspondence : an intrepid English autodidact in Iraq

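This is just an example; "1 Drower, E. S. ..." won't be static. The input differs each time (the details between 1 and 2), but the overall layout of the string will always be the same.

I have: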

String top = ".*         (.*)";
String bottom = "\( \d/ \d\)\W*";
Pattern p = Pattern.compile(top); //+bottom
Matcher matcher = p.matcher(td); //td is the input String
String items = matcher.group();
System.out.println(items);

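When I run it with top, it is meant to strip out all the headers, but all I get is No match found. bottom is my attempt at splitting the rest of the string.

I can post the whole input up to number 15 if needed. What I need is to split the input string so that I can process each of the 15 results individually.

Thanks for your help!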

Answer 1

This gives you both of the outputs you listed. Is this what you want?

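import java.util.regex.Matcher;
import java.util.regex.Pattern;

String text = "Library Catalogue Log off ..."; // truncated text

// Group 1: the first record through "( 1/ 0)"; group 2: the first record up to "Iraq"
Pattern p = Pattern.compile("((1 Drower.+Iraq).+0\\)).+2 Kowalski");
Matcher m = p.matcher(text);
if (m.find()) { // find() must succeed before group() is called
    System.out.println(m.group(1));
    System.out.println(m.group(2));
}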


Answer 2

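First, you need to separate the header from the result data. Assuming there is always a block of 9 spaces between them, you can use this: .*\s{9}(.*)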
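As a minimal sketch of that header-stripping step in Java, assuming the whole input is on one line and the 9-space block really is present: the class name HeaderStripper and the method stripHeader are hypothetical, and the sample string is a shortened stand-in for the real page text. Note that Matcher.group() throws IllegalStateException ("No match found") if matches() or find() has not been called and succeeded first, which is consistent with the error described in the question.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HeaderStripper {
    // Anything, then 9 whitespace characters, then the record data (group 1).
    private static final Pattern HEADER = Pattern.compile(".*\\s{9}(.*)");

    /** Returns the text after the 9-space block, or the input unchanged if there is none. */
    public static String stripHeader(String input) {
        Matcher m = HEADER.matcher(input);
        // A match attempt must succeed before group() may be called.
        return m.matches() ? m.group(1) : input;
    }

    public static void main(String[] args) {
        // Shortened stand-in for the real input (assumption, not the full page text).
        String td = "Rank  |  Refine  |  Filter   Records 1 - 15 of 101005"
                + "         "
                + "1 Drower, E. S. 2012. BK Book University Library( 1/ 0)";
        System.out.println(stripHeader(td));
    }
}
```

Because the leading .* is greedy, the capture starts after the last run of 9 whitespace characters in the string, so a header containing an earlier 9-space run would still be dropped in full.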

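Next, you need to parse the data into rows, which is harder because you have no row delimiter. The best you can do is assume that rows are separated by the following: a space, then one or more digits, then another space.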

((?<=(?:^|\s))\d+\s.*?(?=(?:$|\s\d+\s)))

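If you're planning to parse the records into fields as well, then unless you can change the delimiters, don't bother!

A little explanation of what each bit does: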

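(?<=(?:^|\s)) Lookbehind: ensures that the character before the group is either the start of the string (first record) or a space (all other records).

\d+\s.*? Capturing group: one or more digits followed by a space, then text, matched lazily. Because the lookarounds are zero-width and use only non-capturing groups (?:...), this is the only part of the expression that shows up in the output.

(?=(?:$|\s\d+\s)) Lookahead: ensures that what follows the group is either the end-of-string marker $ or a space followed by 1+ digits followed by a space (which indicates the next record).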
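The three pieces explained above can be put together roughly as follows. This is only a sketch, assuming the header has already been removed; the class name RecordSplitter and the method splitRecords are hypothetical, and the sample data is an abbreviated version of the records in the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RecordSplitter {
    // Lookbehind, lazy capturing group and lookahead, exactly as described above.
    private static final Pattern RECORD =
            Pattern.compile("((?<=(?:^|\\s))\\d+\\s.*?(?=(?:$|\\s\\d+\\s)))");

    /** Returns one string per record found in the header-free data. */
    public static List<String> splitRecords(String data) {
        List<String> records = new ArrayList<>();
        Matcher m = RECORD.matcher(data);
        // Each successful find() yields the next record in group 1.
        while (m.find()) {
            records.add(m.group(1));
        }
        return records;
    }

    public static void main(String[] args) {
        // Abbreviated sample records (assumption: the header is already stripped).
        String data = "1 Drower, E. S. 2012. BK Book University Library( 1/ 0) "
                + "2 Kowalski, Robin M. 2012. BK Book University Library( 1/ 0) "
                + "15 Ambrose, Gavin. 2011. BK Book";
        for (String record : splitRecords(data)) {
            System.out.println(record);
        }
    }
}
```

The "( 1/ 0)" and "2012." fragments do not trigger a split here because their digits are not followed by whitespace.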

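This approach works for the data you've provided, but it will break if a record itself contains the delimiter pattern, e.g. a book called "10 things I like". There are safer ways to parse the records, but if that is what you want to do, it goes beyond what regex was intended for...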

Java regex: trying to split a string
