Question

I have the following HTML:

<div class="content">
xxxxxx
</div>

I tried to match it with the re module:

import re

# raw_data holds the HTML source shown above
pattern = re.compile(r'<div class="content">(.*?)</div>')
items = re.findall(pattern, raw_data)
print(items)

Output:

[]

My code doesn't work. How can I fix it?

Answer 1

It may sound obvious, but you really should use an HTML parser to parse HTML.

Example using BeautifulSoup:

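from bs4 import BeautifulSoup

# Name the parser explicitly; otherwise bs4 guesses one and emits a warning
soup = BeautifulSoup(raw_data, "html.parser")
# Find the first <div> whose class list contains "content"
content = soup.find("div", class_="content")
# get_text(strip=True) drops the surrounding newlines and indentation
print(content.get_text(strip=True))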
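If installing BeautifulSoup is not an option, the same idea works with Python's standard-library html.parser module. This is a minimal sketch of that swapped-in approach (it is not BeautifulSoup's API); the class name DivContentParser is hypothetical, and it assumes the class attribute may hold several space-separated classes, one of which is "content".

```python
from html.parser import HTMLParser


class DivContentParser(HTMLParser):
    """Hypothetical helper: collects the text inside every <div class="content">."""

    def __init__(self):
        super().__init__()
        self.depth = 0      # <div> nesting depth while inside a matching div
        self.chunks = []    # text fragments of the div currently being read
        self.results = []   # finished texts, one per matching div

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        if self.depth > 0:
            # A nested <div> inside a matching div: track it so its </div>
            # does not end the outer match too early.
            self.depth += 1
        elif "content" in (dict(attrs).get("class") or "").split():
            self.depth = 1
            self.chunks = []

    def handle_endtag(self, tag):
        if tag == "div" and self.depth > 0:
            self.depth -= 1
            if self.depth == 0:
                self.results.append("".join(self.chunks).strip())

    def handle_data(self, data):
        if self.depth > 0:
            self.chunks.append(data)


raw_data = '<div class="content">\nxxxxxx\n</div>'
parser = DivContentParser()
parser.feed(raw_data)
parser.close()
print(parser.results)  # ['xxxxxx']
```

Unlike a regex, the parser does not care how the markup is split across lines, and the depth counter keeps nested divs from cutting the match short.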

Answer 2

Check your raw_data, because this works fine:

import re

raw_data='<div class="content">xxxxx</div>'

pattern=re.compile(r'<div class="content">(.*?)</div>')
items=re.findall(pattern,raw_data)
print(items)

Output:

['xxxxx']
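One likely difference in raw_data is line breaks. A quick comparison, assuming the real HTML is laid out over several lines as in the question: by default `.` does not match `\n`, so the same pattern finds nothing once the content sits on its own line.

```python
import re

pattern = re.compile(r'<div class="content">(.*?)</div>')

# Everything on one line, as in the example above: the pattern matches.
one_line = '<div class="content">xxxxx</div>'
print(pattern.findall(one_line))    # ['xxxxx']

# The same markup spread over several lines: '.' cannot cross '\n',
# so there is no match at all.
multi_line = '<div class="content">\nxxxxxx\n</div>'
print(pattern.findall(multi_line))  # []
```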

Answer 3

Your regular expression does not account for the newline and whitespace characters in the HTML.

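A sketch of that fix applied to the pattern from the question, assuming the whitespace layout shown there: `re.DOTALL` lets `.` match newlines, and `\s*` on both sides keeps the surrounding line breaks and indentation out of the captured group. Like any regex approach, it still breaks on nested `<div>` elements.

```python
import re

raw_data = '''<div class="content">
    xxxxxx
</div>'''

# \s* absorbs the newline/indentation after the opening tag and before
# the closing tag; re.DOTALL makes '.' match '\n' as well.
pattern = re.compile(r'<div class="content">\s*(.*?)\s*</div>', re.DOTALL)
items = pattern.findall(raw_data)
print(items)  # ['xxxxxx']
```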
