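I'm trying to write a small program that extracts information from a website. I only want the information between two strings, "ORIGIN" and "//". My code produces no errors, but for some reason nothing is printed to the screen. Can anyone point out what I'm doing wrong?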
import java.io.IOException;
import java.io.PrintStream;
import java.io.FileOutputStream;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;
import java.util.regex.*;
class main {
    public static void main(String[] args) throws IOException {
        Document doc = Jsoup.connect("http://www.ncbi.nlm.nih.gov/sviewer/viewer.fcgi?val=293762&db=nuccore&dopt=genbank&extrafeat=976&fmt_mask=0&retmode=html&withmarkup=on&log$=seqview&maxplex=3&maxdownloadsize=1000000").get();
        String text = doc.text();
        String pattern1 = "ORIGIN";
        String pattern2 = "//";
        String regexString = Pattern.quote(pattern1) + "(.*?)" + Pattern.quote(pattern2);
        Pattern pattern = Pattern.compile(regexString, Pattern.MULTILINE);
        Matcher matcher = pattern.matcher(text);
        while (matcher.find()) {
            String textInBetween = matcher.group(1);
        }
        Pattern p = Pattern.compile(Pattern.quote(pattern1) + "(.*?)" + Pattern.quote(pattern2));
        Matcher m = p.matcher(text);
        while (m.find()) {
            System.out.println(m.group(1));
        }
    }
}
Answer 0 (score: 1)
You need to use the DOTALL flag so that the dot also matches any line breaks in the text:
Pattern pattern = Pattern.compile(Pattern.quote(pattern1) + "(.*?)" +
Pattern.quote(pattern2), Pattern.DOTALL);
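To try this without fetching the page, here is a minimal sketch that applies the same DOTALL pattern to a hard-coded string. The class name OriginExtractor, the helper extractBetween, and the sample text are made up for illustration; they are not part of the original program, and the real page content will look different.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the original program.
class OriginExtractor {

    // Returns the text between the first occurrence of start and the next
    // occurrence of end, or an empty Optional if there is no such span.
    static Optional<String> extractBetween(String text, String start, String end) {
        // DOTALL lets "." match line terminators, so the span may cover several lines.
        Pattern p = Pattern.compile(
                Pattern.quote(start) + "(.*?)" + Pattern.quote(end),
                Pattern.DOTALL);
        Matcher m = p.matcher(text);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        // Made-up sample standing in for the result of doc.text().
        String text = "LOCUS X\nORIGIN\n 1 acgt\n//\n";
        extractBetween(text, "ORIGIN", "//").ifPresent(System.out::println);
    }
}
```

In the original program, the string returned by doc.text() would be passed in place of the sample text.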
Answer 1 (score: 0)
You have to compile both patterns with the DOTALL modifier:
Pattern pattern = Pattern.compile(regexString, Pattern.MULTILINE | Pattern.DOTALL);
Pattern p = Pattern.compile(Pattern.quote(pattern1) + "(.*?)" + Pattern.quote(pattern2), Pattern.DOTALL);
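This modifier allows the period . to match every character, including line breaks. Without it, the dot matches every character except line breaks. MULTILINE alone does not help here, because it only changes how ^ and $ behave.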
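The difference between the flags can be checked with a small comparison. This is only a sketch: the class name FlagComparison, the helper finds, and the sample string are hypothetical. It also tries the embedded flag (?s), which is the inline form of DOTALL in java.util.regex.

```java
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the original program.
class FlagComparison {

    // True if the regex, compiled with the given flags, matches somewhere in text.
    static boolean finds(String regex, int flags, String text) {
        return Pattern.compile(regex, flags).matcher(text).find();
    }

    public static void main(String[] args) {
        // Made-up sample with line breaks between the two markers.
        String text = "ORIGIN\n1 acgt\n//";
        String regex = "ORIGIN(.*?)//";

        System.out.println(finds(regex, 0, text));                 // false: "." stops at the line break
        System.out.println(finds(regex, Pattern.MULTILINE, text)); // false: MULTILINE only affects ^ and $
        System.out.println(finds(regex, Pattern.DOTALL, text));    // true
        System.out.println(finds("(?s)" + regex, 0, text));        // true: (?s) enables DOTALL inline
    }
}
```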