Question

My HTML looks like this:

<dl class="resume_pro">  
    <dt>    <h3>personal infomation</h3>  </dt>  
    <dd class="pro_lf"> 
        <span class="rt_title">sex:male | age:26 </span>
        <div class="clear"></div> 
        <br>phone:123456789<a href="###" class="send" id="sendsms" style="display:none">send message</a><br>   E-mail：name@abc.com <br>  
    </dd>
    <div class="clear"></div>
</dl>

My parser code:

var cheerio = require('cheerio');
var $ = cheerio.load(html);
var found = $('*:contains("phone:")').last();

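found ends up being the element "<dd class="pro_lf"> </dd>",

and found.text() returns "sex:male | age:26 phone:123456789send message E-mail：name@abc.com".

But how can I get the phone number and the e-mail address individually?

I want to write generic code,

so I only use $('*:contains("phone:")') to search for my information, rather than relying on tag names or class names.

I will loop over the matched elements to find the last (innermost) node in each case and parse its content.

I need some help.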

Answer 1

There are probably a thousand ways to do this, but here is a concise one using regular expressions (of which I am no master, but this is my take):

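var $ = cheerio.load(html);
var found = $('*:contains("phone:")').last();

// Text of the matched element; it contains "phone:123456789"
var str = found.text();

// Find phone number
var phoneNumber = str.match(/phone\:\d+/)[0].match(/\d+$/)[0];

The first match finds the string "phone:123456789" and returns it in an array containing a single element, which [0] picks out. The second match then pulls the trailing digits out of that string, and its [0] yields them as a plain string.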

Breaking down the regex /phone\:\d+/:

/                   start of regex
 phone\:            match the string literal, "phone:"
 \d+                match 1 or more digits following "phone:"
/                   end of regex

And for /\d+$/:

/                   start of regex
 \d+                match 1 or more digits
 $                  ...at the end of the string
/                   end of regex

After this runs, phoneNumber contains "123456789".
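As a rough sketch of the same idea, a capture group can replace the second match call, and a null check avoids a TypeError when the text contains no phone number. The helper names extractPhone and extractEmail are hypothetical, and the e-mail pattern is a deliberately loose assumption based only on the sample text, not a general-purpose validator.

```javascript
// Hypothetical helper (not from the original answer): returns the digits
// that follow "phone:" in `text`, or null when nothing matches.
function extractPhone(text) {
  // The capture group (\d+) isolates the digits, so one match() call is enough.
  var m = text.match(/phone:(\d+)/);
  return m ? m[1] : null;
}

// Same approach for the e-mail address. The pattern is an assumption: it
// accepts the ASCII ":" or the full-width "：" that appears in the sample.
function extractEmail(text) {
  var m = text.match(/E-mail[:：]\s*([^\s@]+@[^\s@]+)/);
  return m ? m[1] : null;
}

// The string the question reports for found.text()
var sample = 'sex:male | age:26 phone:123456789send message E-mail：name@abc.com';
console.log(extractPhone(sample)); // 123456789
console.log(extractEmail(sample)); // name@abc.com
```

With cheerio, the input would presumably be found.text() rather than the hard-coded sample string.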

Answer 2

I should use this to iterate over each child node:

found.contents().each(function() {
    // .....
});

Then I can use a regular expression inside the loop to get the phone number.
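A minimal sketch of what the loop body might do, kept free of cheerio so it runs on its own: it takes the text of each child node and splits it into a label and a value. The helper name parseLabeledText is hypothetical, and the chunks array is an assumption about what the text nodes of the sample <dd> would contain.

```javascript
// Hypothetical helper: turns text chunks such as "phone:123456789" into
// { phone: "123456789" }. Chunks without a "label:value" shape are skipped.
function parseLabeledText(chunks) {
  var result = {};
  chunks.forEach(function (chunk) {
    // Label = everything before the first ":" or full-width "：".
    var m = chunk.trim().match(/^([^:：]+)[:：]\s*(.+)$/);
    if (m) {
      result[m[1].trim()] = m[2].trim();
    }
  });
  return result;
}

// Assumed text-node contents of the sample <dd>; whitespace-only nodes
// are included on purpose to show that they are ignored.
var chunks = ['\n        ', 'phone:123456789', '   E-mail：name@abc.com ', '\n    '];
var info = parseLabeledText(chunks);
console.log(info.phone);     // 123456789
console.log(info['E-mail']); // name@abc.com
```

With cheerio, the chunks could probably be collected with something like found.contents().filter(function () { return this.type === 'text'; }).map(function () { return $(this).text(); }).get(). Because the "send message" link and the "sex:male | age:26" span are element nodes rather than text nodes, they would not end up in the chunks.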

How to parse my phone number with cheerio
