<th class="tip" title='manje'>manje</th>
<th class="tip" title='ne d.'>ne d.</th>
<th class="tip" title='manje'>manje</th>
<th class="tip" title='točno'>točno</th>
<th class="tip" title='više'>više</th>
<th class="tip" title='m./t.'>m./t.</th>
<th class="tip" title='v./t.'>v./t.</th>
<th class="tip">daje</th>
<th class="tip">X2</th>
<th class="tip">12</th>
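I have tried several combinations, but I only get the value when the <th> tag has no "title" attribute.
This pattern extracts the content only if the <th> tag has no "title" attribute: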
Pattern pattern = Pattern.compile("<th class=\"tip\"[\\s*|[.]{0,20}]>(.*?)\\s*</th>");
The same goes for this one:
Pattern patternType = Pattern.compile("<th class=\"tip\"[\\s*|[.]{0,20}]>(.*?)\\s*</th>");
Any suggestions? Thanks.
Answer 0 (score: 5)
Regular expressions are not the right tool for every job. Use Jsoup instead:
package so6235727;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class PrintContent {

    private static final String html = //
            "<th class=\"tip\" title='manje'>manje</th>\r\n" + //
            "<th class=\"tip\" title='ne d.'>ne d.</th>\r\n" + //
            "<th class=\"tip\" title='manje'>manje</th>\r\n" + //
            "<th class=\"tip\" title='točno'>točno</th>\r\n" + //
            "<th class=\"tip\" title='više'>više</th>\r\n" + //
            "<th class=\"tip\" title='m./t.'>m./t.</th>\r\n" + //
            "<th class=\"tip\" title='v./t.'>v./t.</th>\r\n" + //
            "<th class=\"tip\">daje</th>\r\n" + //
            "<th class=\"tip\">X2</th>\r\n" + //
            "<th class=\"tip\">12</th>\r\n";

    public static void main(String[] args) {
        Document jsoup = Jsoup.parse(html);
        Elements headings = jsoup.select("th.tip");
        for (Element element : headings) {
            System.out.println(element.text());
        }
    }
}
See how easy that is?
Answer 1 (score: 0)
Try this:
Pattern pattern = Pattern.compile("<th[^>]*>(.*?)\\s*</th>");
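To show how this pattern might be applied to several cells at once, here is a minimal sketch that loops with Matcher.find() and collects group 1. The class name ThCellExtractor and the extract helper are made up for illustration; only the pattern comes from the answer. Two caveats: `<th[^>]*>` would also accept a tag such as <thead>, and `.` does not cross line breaks unless Pattern.DOTALL is set, so a cell spanning several lines would be skipped.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ThCellExtractor {

    // Pattern from the answer: any <th ...> opening tag, then a reluctant capture of the cell body.
    private static final Pattern TH = Pattern.compile("<th[^>]*>(.*?)\\s*</th>");

    /** Returns the body of every <th> cell found in html, in document order. */
    public static List<String> extract(String html) {
        List<String> cells = new ArrayList<String>();
        Matcher m = TH.matcher(html);
        // find() scans for the next occurrence anywhere in the input,
        // so the title attribute and the line breaks between cells do not matter.
        while (m.find()) {
            cells.add(m.group(1));
        }
        return cells;
    }

    public static void main(String[] args) {
        String html = "<th class=\"tip\" title='manje'>manje</th>\r\n"
                + "<th class=\"tip\">daje</th>\r\n";
        for (String cell : extract(html)) {
            System.out.println(cell); // prints "manje", then "daje"
        }
    }
}
```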
Answer 2 (score: 0)
Try this:
Pattern pattern = Pattern.compile("<th class=\"tip\"[^>]*>(.*)</th>");
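One thing to watch with the greedy `(.*)` in this pattern: if two cells ever end up on the same line, the group runs to the last </th> on that line. The sketch below compares it with a reluctant `(.*?)` variant. The class name GreedyVsReluctant, the firstGroup helper, and the single-line sample input are assumptions for illustration; the sample in the question has one cell per line, where both variants behave the same.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GreedyVsReluctant {

    // Pattern from the answer (greedy group).
    static final Pattern GREEDY = Pattern.compile("<th class=\"tip\"[^>]*>(.*)</th>");
    // Same pattern with a reluctant group, added here for comparison.
    static final Pattern RELUCTANT = Pattern.compile("<th class=\"tip\"[^>]*>(.*?)</th>");

    /** Returns group 1 of the first match, or null if nothing matches. */
    static String firstGroup(Pattern pattern, String input) {
        Matcher m = pattern.matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Hypothetical input: two cells on a single line.
        String twoCells = "<th class=\"tip\">daje</th><th class=\"tip\">X2</th>";
        System.out.println(firstGroup(GREEDY, twoCells));    // daje</th><th class="tip">X2
        System.out.println(firstGroup(RELUCTANT, twoCells)); // daje
    }
}
```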
Answer 3 (score: 0)
Oops, here is yet another pattern attempt; this one uses a lookbehind and a lookahead:
Pattern pattern = Pattern.compile("(?<=<th .{0,100}>).*(?=</th>)");
Edit 1
Regarding "I tried it and it doesn't work in any case": perhaps your test harness differs from mine:
import java.util.Scanner;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Foo1 {

    private static final String FOO_TXT = "Foo1.txt";

    public static void main(String[] args) {
        Pattern pattern = Pattern.compile("(?<=<th .{0,100}>).*(?=</th>)");
        Scanner scan = new Scanner(Foo1.class.getResourceAsStream(FOO_TXT));
        while (scan.hasNextLine()) {
            String line = scan.nextLine();
            System.out.println("Line: " + line);
            Matcher match = pattern.matcher(line);
            if (match.find()) {
                System.out.println("Match: " + match.group());
            } else {
                System.out.println("No match found");
            }
        }
    }
}
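This assumes the text file is named Foo1.txt and sits alongside the class file.
Answer 4 (score: 0)
I am including my test code because I seem to get matches where others report none, and no match where others report one.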
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Regex {

    public static void test(String patternString) {
        System.out.println("Test with pattern: " + patternString);
        Pattern pattern = Pattern.compile(patternString);
        String[] testStrings = {"<th class=\"tip\" title='manje'>manje</th>", "<th class=\"tip\">daje</th>"};
        for (String testString : testStrings) {
            System.out.println("> Test on " + testString);
            Matcher matcher = pattern.matcher(testString);
            if (matcher.matches()) {
                System.out.println(">> number of matches in group = " + matcher.groupCount());
                for (int i = 1; i <= matcher.groupCount(); i++) {
                    System.out.println(">>group " + i + " is " + matcher.group(i));
                }
            } else {
                System.out.println(">> no match");
            }
        }
        System.out.println("");
    }

    /**
     * @param args the command line arguments
     */
    public static void main(String[] args) {
        test("<th class=\"tip\"[\\s*|[.]{0,20}]>(.*?)\\s*</th>"); // op
        test("<th[^>]*>(.*?)\\s*</th>"); // Billy Moon
        test("<th class=\"tip\"[^>]*>(.*)</th>"); // stuken.yuri
        test("(?<=<th .{0,100}>).*(?=</th>)"); // Hovercraft full of Eels
        test("(?:<th .{0,100}>).*(?:</th>)");
    }
}
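In my output, the patterns from Billy Moon and stuken.yuri match, but the OP's and Hovercraft's do not. I would be curious to know whether others see the same. I am using Java 7 beta on Windows 7.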
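A likely reason for the disagreement over the lookaround pattern is that this harness calls matcher.matches(), while the harness in answer 3 calls match.find(). matches() requires the whole input to match, and lookarounds are zero-width: the lookbehind would have to succeed at position 0, where no <th ...> tag precedes it. The sketch below puts the two calls side by side; the class name MatchesVsFind and both helper methods are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MatchesVsFind {

    // Lookaround pattern from answer 3: the tags are asserted but never consumed.
    static final Pattern LOOKAROUND = Pattern.compile("(?<=<th .{0,100}>).*(?=</th>)");

    /** True only if the pattern matches the entire input, as in the harness above. */
    static boolean wholeInputMatches(String input) {
        return LOOKAROUND.matcher(input).matches();
    }

    /** Returns the first match found anywhere in the input, or null. */
    static String firstFind(String input) {
        Matcher m = LOOKAROUND.matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String line = "<th class=\"tip\" title='manje'>manje</th>";
        System.out.println("matches(): " + wholeInputMatches(line)); // matches(): false
        System.out.println("find():    " + firstFind(line));         // find():    manje
    }
}
```

Swapping matches() for find() in the test method should therefore make the lookaround pattern report a match as well, although it has no capturing group, so the text would have to be read with group() rather than group(1).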