How do I search for and replace custom HTML tags using Java?

Time: 2018-09-27 09:39:49

Tags: java json regex

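I am using Java 1.7 ...

I have the following JSON response (a JSON array) whose htmlBody values contain different custom tags (some reference photos, others reference videos):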

{
    "articles": 
    [
        {
            "htmlBody": "<asset-entity type=\"photo\" id=\"4806ad76-7433-fs34-50d1-b12bdbc308899ad\"></asset-entity>\r\nAngelie Jolie was seen at Wholefoods with ex-beau Brad Pitt.\r\n
                         <asset-entity type=\"photo\" id=\"4806fe7d-c175-c380-4ab2-dda068b42b033cbf\"></asset-entity>\r\n- The majority of their kids were with them.\r\n<asset-entity type=\"photo\" id=\"35064086-5d85-1866-4afc-a523c04c2b3e42a6\">
                         </asset-entity>\r\n"
        },                      
        {
            "htmlBody": "<asset-entity type=\"video\" id=\"48906fe30-8dx6-7e04-4b18-98c4d77176eaz412\"></asset-entity>\r\n
                        Reese Witherspoon was spotted at the Paris airport\n\nRumor is that she arrived for the filming of her new movie\n\n <asset-entity type=\"video\" id=\"4207182e-cgga-4e0a-4b97-a5ec0aa619c33b42\"></asset-entity>\r\n"
        },
        {
            "htmlBody": "<asset-entity type=\"photo\" id=\"350686a2-6fef-9fd7-445d-b2888fa56c3454da\"></asset-entity>\r\nMatt Damon was seen walking to StarBucks for a quick latte and chocalate danish while in Boston.\r\nHere's a video clip of him kindly greeting the paparazzi:<asset-entity type=\"video\" id=\"2507f140-ed4c-7e1b-4f44-8c57e051409d6fea\"></asset-entity>\r\n"
        }
   ]
}

In my Java code, htmlBody is a String...

Questions:

  1. Can anyone provide me with a good regular expression (using Java) to parse:

and:

<asset-entity type=\"photo\" id=\"48906fe30-8dx6-7e04-4b18-98c4d77176eaz412\"></asset-entity>

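I want to extract the IDs from the photo and video tags and store them in a data structure (e.g. a HashMap), so I need a mechanism in my code that uses a regular expression to find the photo and video tags inside the htmlBody String.

Once I have the IDs stored in the right data structure

for example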

// Map needs a key type and a value type; here: asset ID -> replacement URL
Map<String, String> videoTags = new HashMap<>();
Map<String, String> photoTags = new HashMap<>();

I can then replace these tags with the actual markup (or an equivalent tag that contains the actual asset).
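The extraction step described above can be sketched with java.util.regex roughly as follows. This is a minimal sketch, not a definitive implementation: it assumes htmlBody has already been JSON-decoded (so the quotes are plain `"` characters), that the attributes always appear in the order type then id with double-quoted values, as in the samples, and that the only types are photo and video. The class name `AssetTagExtractor` and the method `extractIds` are hypothetical names chosen for illustration. It sticks to Java 7 syntax.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class AssetTagExtractor {

    // Assumption: attribute order is type-then-id with double-quoted values,
    // exactly as in the sample htmlBody strings. Group 1 = type, group 2 = id.
    private static final Pattern ASSET_TAG = Pattern.compile(
            "<asset-entity\\s+type=\"(photo|video)\"\\s+id=\"([^\"]+)\"\\s*>\\s*</asset-entity>");

    /** Returns the IDs found in htmlBody, grouped by asset type, in document order. */
    static Map<String, List<String>> extractIds(String htmlBody) {
        Map<String, List<String>> idsByType = new LinkedHashMap<String, List<String>>();
        idsByType.put("photo", new ArrayList<String>());
        idsByType.put("video", new ArrayList<String>());
        Matcher m = ASSET_TAG.matcher(htmlBody);
        while (m.find()) {
            idsByType.get(m.group(1)).add(m.group(2));
        }
        return idsByType;
    }

    public static void main(String[] args) {
        String htmlBody =
                "<asset-entity type=\"photo\" id=\"350686a2-6fef-9fd7-445d-b2888fa56c3454da\"></asset-entity>\r\n"
                + "Here's a video clip of him kindly greeting the paparazzi:"
                + "<asset-entity type=\"video\" id=\"2507f140-ed4c-7e1b-4f44-8c57e051409d6fea\"></asset-entity>\r\n";
        Map<String, List<String>> ids = extractIds(htmlBody);
        System.out.println("photo: " + ids.get("photo"));
        System.out.println("video: " + ids.get("video"));
    }
}
```

If only the IDs are needed, a List per type is enough; a Map from ID to URL becomes useful at the point where each tag is swapped for its real asset.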

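  2. Is a HashMap the best way to store these particular asset IDs (with the intent of replacing them with hardcoded URLs of the actual assets)?

Any suggestions about the regex or the design would be greatly appreciated... If a regular expression is not a viable way to search for specific custom HTML tags (held in a String) in Java, what other technology could I use?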

1 Answer:

Answer 0: (score: 0)

You can use Jsoup to parse the HTML and select elements by attribute, tag name, and so on. Here is an example using Jsoup selectors:

String html = "<asset-entity type=\"photo\" id=\"4806ad76-7433-fs34-50d1-b12bdbc308899ad\">"
  + "</asset-entity>\r\nAngelie Jolie was seen at Wholefoods with ex-beau Brad Pitt.\r\n <asset-entity type=\"photo\" id=\"4806fe7d-c175-c380-4ab2-dda068b42b033cbf\">"
  + "</asset-entity>\r\n- The majority of their kids were with them.\r\n<asset-entity type=\"video\" id=\"35064086-5d85-1866-4afc-a523c04c2b3e42a6\"> </asset-entity>\r\n";

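// Requires jsoup on the classpath, plus imports of org.jsoup.Jsoup,
// org.jsoup.nodes.Document, org.jsoup.nodes.Element and org.jsoup.select.Elements.
Document doc = Jsoup.parse(html);
// Select only the asset-entity tags whose type attribute is "photo".
Elements elements = doc.select("asset-entity[type=photo]");
for (Element element : elements) {
  String type = element.attr("type");
  String id = element.attr("id");
  System.out.println(type + " " + id);
}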

Output

photo 4806ad76-7433-fs34-50d1-b12bdbc308899ad
photo 4806fe7d-c175-c380-4ab2-dda068b42b033cbf
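For the replacement half of the question, here is a hedged sketch that uses only java.util.regex (no Jsoup) and a Map from asset ID to URL. The class name `AssetTagReplacer`, the method `replaceTags`, the example.com URL, and the choice of `<img>` and `<video>` as the output markup are all assumptions made for illustration. It also assumes the same attribute order and quoting as the sample data, and that the URLs are trusted (they are not HTML-escaped).

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class AssetTagReplacer {

    // Assumption: attribute order is type-then-id with double-quoted values.
    // Group 1 = type, group 2 = id.
    private static final Pattern ASSET_TAG = Pattern.compile(
            "<asset-entity\\s+type=\"(photo|video)\"\\s+id=\"([^\"]+)\"\\s*>\\s*</asset-entity>");

    /**
     * Replaces each asset-entity tag whose id is a key in urlsById with an
     * img or video element. Tags with unknown ids are left unchanged.
     */
    static String replaceTags(String htmlBody, Map<String, String> urlsById) {
        Matcher m = ASSET_TAG.matcher(htmlBody);
        StringBuffer out = new StringBuffer(); // appendReplacement takes a StringBuffer on Java 7
        while (m.find()) {
            String url = urlsById.get(m.group(2));
            String replacement;
            if (url == null) {
                replacement = m.group(); // no URL known: keep the original tag
            } else if ("photo".equals(m.group(1))) {
                replacement = "<img src=\"" + url + "\">";
            } else {
                replacement = "<video src=\"" + url + "\" controls></video>";
            }
            // quoteReplacement stops '$' and '\' in the URL from being read as group references
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> urlsById = new HashMap<String, String>();
        // Hypothetical URL for the first photo ID in the sample data.
        urlsById.put("4806ad76-7433-fs34-50d1-b12bdbc308899ad", "https://example.com/photos/1.jpg");
        String htmlBody =
                "<asset-entity type=\"photo\" id=\"4806ad76-7433-fs34-50d1-b12bdbc308899ad\"></asset-entity>\r\n"
                + "Angelie Jolie was seen at Wholefoods with ex-beau Brad Pitt.\r\n";
        System.out.println(replaceTags(htmlBody, urlsById));
    }
}
```

A regex like this is only reasonable because the tags are machine-generated and uniform. If attribute order, quoting, or nesting can vary, an HTML parser such as Jsoup is the more robust choice.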