I'm using Java 1.7 and have the following JSON response (from a JSON array). It contains different tags, one for photos and another for videos:
{
  "articles": [
    {
      "htmlBody": "<asset-entity type=\"photo\" id=\"4806ad76-7433-fs34-50d1-b12bdbc308899ad\"></asset-entity>\r\nAngelie Jolie was seen at Wholefoods with ex-beau Brad Pitt.\r\n<asset-entity type=\"photo\" id=\"4806fe7d-c175-c380-4ab2-dda068b42b033cbf\"></asset-entity>\r\n- The majority of their kids were with them.\r\n<asset-entity type=\"photo\" id=\"35064086-5d85-1866-4afc-a523c04c2b3e42a6\"></asset-entity>\r\n"
    },
    {
      "htmlBody": "<asset-entity type=\"video\" id=\"48906fe30-8dx6-7e04-4b18-98c4d77176eaz412\"></asset-entity>\r\nReese Witherspoon was spotted at the Paris airport\n\nRumor is that she arrived for the filming of her new movie\n\n <asset-entity type=\"video\" id=\"4207182e-cgga-4e0a-4b97-a5ec0aa619c33b42\"></asset-entity>\r\n"
    },
    {
      "htmlBody": "<asset-entity type=\"photo\" id=\"350686a2-6fef-9fd7-445d-b2888fa56c3454da\"></asset-entity>\r\nMatt Damon was seen walking to StarBucks for a quick latte and chocalate danish while in Boston.\r\nHere's a video clip of him kindly greeting the paparazzi:<asset-entity type=\"video\" id=\"2507f140-ed4c-7e1b-4f44-8c57e051409d6fea\"></asset-entity>\r\n"
    }
  ]
}
In my Java code, htmlBody is a String.
Question:
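Can anyone suggest a good regular expression (in Java) to parse photo tags such as:
<asset-entity type=\"photo\" id=\"4806ad76-7433-fs34-50d1-b12bdbc308899ad\"></asset-entity>
and video tags such as:
<asset-entity type=\"video\" id=\"48906fe30-8dx6-7e04-4b18-98c4d77176eaz412\"></asset-entity>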
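I'd like to extract the IDs from the photo and video tags and store them in a data structure (e.g. a HashMap), but first I need a mechanism that uses a regular expression to find the photo and video tags in the String htmlBody.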
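The extraction described above can be sketched with java.util.regex alone. This is a minimal sketch under stated assumptions, not a general HTML parser: it assumes every tag has the shape of the samples (type before id, double-quoted values, nothing but whitespace inside the element) and that htmlBody has already been decoded from JSON, so the \" escapes are plain quotes in the Java String. The class and method names (AssetTagExtractor, extractIds) and the choice to map each id to its full tag text are illustrative assumptions. It avoids lambdas, so it should also compile on Java 1.7.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AssetTagExtractor {

    // Assumed tag shape (matches the samples): type before id, double-quoted values,
    // and only whitespace between the opening tag and </asset-entity>.
    // Group 1 captures the type, group 2 the id.
    private static final Pattern ASSET_TAG = Pattern.compile(
            "<asset-entity\\s+type=\"(photo|video)\"\\s+id=\"([^\"]+)\"\\s*>\\s*</asset-entity>");

    // Scans htmlBody and records id -> full tag text in the map for its type.
    public static void extractIds(String htmlBody,
                                  Map<String, String> photoTags,
                                  Map<String, String> videoTags) {
        Matcher m = ASSET_TAG.matcher(htmlBody);
        while (m.find()) {
            String type = m.group(1);
            String id = m.group(2);
            if ("photo".equals(type)) {
                photoTags.put(id, m.group());
            } else {
                videoTags.put(id, m.group());
            }
        }
    }

    public static void main(String[] args) {
        // The third htmlBody from the question, after JSON decoding (quotes are now plain).
        String htmlBody = "<asset-entity type=\"photo\" id=\"350686a2-6fef-9fd7-445d-b2888fa56c3454da\"></asset-entity>\r\n"
                + "Matt Damon was seen walking to StarBucks for a quick latte and chocalate danish while in Boston.\r\n"
                + "Here's a video clip of him kindly greeting the paparazzi:"
                + "<asset-entity type=\"video\" id=\"2507f140-ed4c-7e1b-4f44-8c57e051409d6fea\"></asset-entity>\r\n";

        Map<String, String> photoTags = new LinkedHashMap<>();
        Map<String, String> videoTags = new LinkedHashMap<>();
        extractIds(htmlBody, photoTags, videoTags);

        // prints: photo IDs: [350686a2-6fef-9fd7-445d-b2888fa56c3454da]
        System.out.println("photo IDs: " + photoTags.keySet());
        // prints: video IDs: [2507f140-ed4c-7e1b-4f44-8c57e051409d6fea]
        System.out.println("video IDs: " + videoTags.keySet());
    }
}
```

A LinkedHashMap keeps the IDs in document order; a plain HashMap works too if order doesn't matter. If the tags can carry attributes in a different order or extra attributes, this pattern becomes fragile, which is where an HTML parser such as Jsoup is the safer choice.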
Once I have stored the IDs in a suitable data structure, for example:
Map<String, String> videoTags = new HashMap<>();
Map<String, String> photoTags = new HashMap<>();
I can then replace these tags with the actual markup (or equivalent markup that contains the actual asset).
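The replacement step can reuse the same kind of pattern in a single pass with Matcher.appendReplacement and appendTail. This is a sketch under the same tag-shape assumption; AssetTagReplacer, replaceAssetTags and the sample <img> markup are hypothetical, and the map from id to replacement markup stands in for however the real assets are looked up. On Java 1.7, appendReplacement only accepts a StringBuffer (the StringBuilder overload was added in Java 9), and Matcher.quoteReplacement stops $ or \ characters in the markup from being read as group references.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AssetTagReplacer {

    // Same assumption as for extraction: type before id, double-quoted values.
    // Group 1 captures the type, group 2 the id.
    private static final Pattern ASSET_TAG = Pattern.compile(
            "<asset-entity\\s+type=\"(photo|video)\"\\s+id=\"([^\"]+)\"\\s*>\\s*</asset-entity>");

    // Replaces each tag whose id has an entry in `replacements`; tags with unknown ids are kept as-is.
    public static String replaceAssetTags(String htmlBody, Map<String, String> replacements) {
        Matcher m = ASSET_TAG.matcher(htmlBody);
        StringBuffer out = new StringBuffer(); // StringBuffer: the only overload available on Java 1.7
        while (m.find()) {
            String replacement = replacements.get(m.group(2));
            if (replacement == null) {
                replacement = m.group(); // no asset known for this id: keep the original tag
            }
            // quoteReplacement makes '$' and '\' in the markup literal
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String htmlBody = "<asset-entity type=\"photo\" id=\"350686a2-6fef-9fd7-445d-b2888fa56c3454da\"></asset-entity>\r\n"
                + "Matt Damon was seen walking to StarBucks for a quick latte and chocalate danish while in Boston.\r\n";

        Map<String, String> replacements = new HashMap<>();
        // Hypothetical markup: the real URL would come from your own asset lookup.
        replacements.put("350686a2-6fef-9fd7-445d-b2888fa56c3454da", "<img src=\"/assets/350686a2.jpg\">");

        System.out.println(replaceAssetTags(htmlBody, replacements));
    }
}
```

Doing the lookup inside the matching loop avoids a second scan of htmlBody, and leaving unknown tags untouched means a missing asset does not silently delete content.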
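Any suggestions on the regex or the design would be greatly appreciated. If regular expressions aren't a viable way to search for specific custom HTML tags in a Java String, what else could I use (technology-wise)?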
Answer:
You can use Jsoup to parse the HTML and select elements by tag name, attribute, and so on. Here is an example using Jsoup selectors:
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class AssetEntityExample {
    public static void main(String[] args) {
        String html = "<asset-entity type=\"photo\" id=\"4806ad76-7433-fs34-50d1-b12bdbc308899ad\">"
                + "</asset-entity>\r\nAngelie Jolie was seen at Wholefoods with ex-beau Brad Pitt.\r\n <asset-entity type=\"photo\" id=\"4806fe7d-c175-c380-4ab2-dda068b42b033cbf\">"
                + "</asset-entity>\r\n- The majority of their kids were with them.\r\n<asset-entity type=\"video\" id=\"35064086-5d85-1866-4afc-a523c04c2b3e42a6\"> </asset-entity>\r\n";
        Document doc = Jsoup.parse(html);
        // Jsoup keeps unknown tags like <asset-entity>; select only those with type="photo"
        Elements elements = doc.select("asset-entity[type=photo]");
        for (Element element : elements) {
            String type = element.attr("type");
            String id = element.attr("id");
            System.out.println(type + " " + id);
        }
    }
}
Output:
photo 4806ad76-7433-fs34-50d1-b12bdbc308899ad
photo 4806fe7d-c175-c380-4ab2-dda068b42b033cbf