I'm writing a Java program that fetches HTML data from the elements with the CSS class .report:
@RequestMapping(value = "/medindiaparser", method = RequestMethod.POST)
public ModelMap medindiaparser(@RequestParam String urlofpage ) throws ClassNotFoundException, IOException {
System.out.println("saveMedicineName");
ModelMap mv = new ModelMap(urlofpage);
System.out.println();
String url = urlofpage;
Document document = Jsoup.connect(url).get();
String TITLE = document.select(".report").text();
String[] news = TITLE.split(":");
System.out.println("Question: " + TITLE);
return mv;
}
Now TITLE gives me:
name : aman kumar working in : home,outside what he does: program | sleep | eat
So I want to get the specific values into an array, like this:
array[0] : aman kumar
array[1] : home,outside
array[2] : program | sleep | eat
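Then I could set the array's values in my model. Has anyone done this?
.report contains <h3> elements, which is where the headings are. It looks like this: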
<report><h3>Name</h3>aman kumar<h3>working in </h3>home, outside .....</report>
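Answer 0 (score: 1)
I completely changed my answer so that it extracts the name, working in, and what he does values from the TITLE string. This can be done with Java's regular-expression Pattern and Matcher classes.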
String pattern = "name\\s*:\\s*(.*?)\\s*working in\\s*:\\s*(.*?)\\s*what he does\\s*:\\s*(.*)";
Pattern r = Pattern.compile(pattern);
String line = "name : aman kumar working in : home,outside what he does: program | sleep | eat";
Matcher m = r.matcher(line);
while (m.find()) {
System.out.println(m.group(1));
System.out.println(m.group(2));
System.out.println(m.group(3));
}
Output:
aman kumar
home,outside
program | sleep | eat
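Since the question asks for the values in an array, here is a minimal sketch that wraps the same pattern in a helper. The class name ReportFields and the method extract are hypothetical names chosen for illustration, and the sketch assumes the three labels always appear in this order.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ReportFields {
    // Same pattern as in the answer; assumes the labels appear in this fixed order.
    private static final Pattern FIELDS = Pattern.compile(
        "name\\s*:\\s*(.*?)\\s*working in\\s*:\\s*(.*?)\\s*what he does\\s*:\\s*(.*)");

    // Hypothetical helper: returns the three captured values,
    // or an empty array when the text does not match the pattern.
    static String[] extract(String text) {
        Matcher m = FIELDS.matcher(text);
        if (!m.find()) {
            return new String[0];
        }
        String[] values = new String[m.groupCount()];
        for (int i = 0; i < values.length; i++) {
            // Capture groups are numbered from 1.
            values[i] = m.group(i + 1).trim();
        }
        return values;
    }

    public static void main(String[] args) {
        String title = "name : aman kumar working in : home,outside what he does: program | sleep | eat";
        System.out.println(Arrays.toString(extract(title)));
    }
}
```

In the controller from the question, the returned array could then be stored in the model with something like mv.addAttribute("fields", ReportFields.extract(TITLE)), where "fields" is only an illustrative attribute name.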
Answer 1 (score: 0)
Try this:
String s = "name : aman kumar working in : home,outside what he does: program | sleep | eat";
String[] news = s.split(":");
String exclude = "(working in|what he does)";
int index = -1;
for(int i = 0 ; i < news.length ; i++){
if("name".equals(news[i].trim())){
index = i;
break;
}
}
if(index != -1){
String[] content = Arrays.copyOfRange(news, index+1, news.length);
for(String string : content){
System.out.println(string.replaceAll(exclude, "").trim());
}
}
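A variation on the split idea, as a rough sketch: rather than splitting on ":" and stripping the labels afterwards, the labels themselves can serve as the delimiter, so the result is already the array the question asks for. The class name LabelSplit and the method splitOnLabels are hypothetical, and the sketch assumes none of the label texts occur inside a value followed by a colon.

```java
import java.util.Arrays;

class LabelSplit {
    // Splits on any known label followed by a colon, swallowing surrounding whitespace.
    static String[] splitOnLabels(String text) {
        String[] parts = text.trim().split("\\s*(?:name|working in|what he does)\\s*:\\s*");
        // A label at the very start of the text yields an empty first element; drop it.
        if (parts.length > 0 && parts[0].isEmpty()) {
            return Arrays.copyOfRange(parts, 1, parts.length);
        }
        return parts;
    }

    public static void main(String[] args) {
        String s = "name : aman kumar working in : home,outside what he does: program | sleep | eat";
        for (String value : splitOnLabels(s)) {
            System.out.println(value);
        }
    }
}
```

Unlike the fixed-order regex, this does not depend on the order of the labels, but it also does not record which label each value belonged to.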