Question

I need to use jsoup to scrape a postcode from the HTML below. I only need the postcode, W2, which is part of the href attribute of the a tag:

<a href="/properties-for-sale/w2/chpk3848653" class="property_photo_holder" style="background-image:url(https://assets.foxtons.co.uk/w/480/1523289105/chpk3848653-23.jpg)"></a>

This is the HTML:

</div>

<div id="property_1062067" class="property_summary">

<h6><a href="/properties-for-sale/w2/chpk3848653">Lancaster Gate, <span class="property_address_location_name">Bayswater,</span> W2</a></h6>

Can anyone help? Thanks.

Answer 1

You can use jsoup for this. Retrieve the value of the href attribute as follows:

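// URL holds the address of the listings page
Document document = Jsoup.connect(URL).userAgent("Mozilla/5.0").get();

// Select only the links that point at a property, not every <a> on the page
Elements elements = document.select("a[href^=/properties-for-sale/]");

// attr() on Elements returns the value from the first matched element that has the attribute
String href = elements.attr("href");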

Now that you have the href value as a string, apply a regular expression to extract the field you need. Here the postcode is the path segment w2 in "/properties-for-sale/w2/chpk3848653". To do that:

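// Requires java.util.regex.Pattern and java.util.regex.Matcher.
// Capture the path segment that follows "/properties-for-sale/", e.g. "w2"
String regex = "/properties-for-sale/([a-zA-Z0-9]+)/";
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(href);

// find() returns a boolean, so check it before reading the captured group
String postalCode = matcher.find() ? matcher.group(1).toUpperCase() : null;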
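To try the extraction step without fetching a page, here is a minimal self-contained sketch that uses only the standard library. It assumes every property link has the shape /properties-for-sale/<postcode district>/<property id>, as in the question's example. The class name PostcodeExtractor and the method districtOf are hypothetical names chosen for illustration.

```java
import java.util.Locale;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PostcodeExtractor {
    // Assumed URL layout: /properties-for-sale/<postcode district>/<property id>
    private static final Pattern DISTRICT =
            Pattern.compile("^/properties-for-sale/([A-Za-z0-9]+)/[A-Za-z0-9]+$");

    // Returns the district in upper case, or an empty Optional if the href has another shape.
    static Optional<String> districtOf(String href) {
        Matcher matcher = DISTRICT.matcher(href);
        return matcher.matches()
                ? Optional.of(matcher.group(1).toUpperCase(Locale.ROOT))
                : Optional.empty();
    }

    public static void main(String[] args) {
        // The href from the question, followed by a link that is not a property
        System.out.println(districtOf("/properties-for-sale/w2/chpk3848653").orElse("no match"));
        System.out.println(districtOf("/contact").orElse("no match"));
    }
}
```

Returning an Optional instead of null makes the "no match" case explicit, which is useful when you loop over every link on a page and some of them are not property links.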

That's all. If you need anything else, feel free to ask. Hope this helps!
