I'm trying to extract a number from a URL. Here's the code I tried:
import re
urlss = 'http://www.deyi.com/thread-24488-1-1.html'
urlss = re.sub('http://www.deyi.com/thread-(.*?)-1-1.html', '', urlss)
print(urlss)
My expected result is the following number:
24488
How can I do this?
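Answer 0 (score: 2)
re.sub replaces text in a string; it does not extract anything. To extract a substring, use re.search instead. The following regular expression pulls the desired number out of the URL:
r'(?<=thread-)\d+'
This regex uses a lookbehind, (?<=thread-), so it returns the first run of consecutive digits that immediately follows "thread-", without including "thread-" itself in the match.
For example:
>>> import re
>>> urlss = 'http://www.deyi.com/thread-24488-1-1.html'
>>> re.search(r'(?<=thread-)\d+', urlss).group()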
'24488'
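A minimal sketch contrasting the two functions. The question's re.sub call matches the entire URL, so the whole string is replaced with '' and an empty line is printed. The backreference variant (r'\1') is an alternative that is not part of this answer, and the forum URL without a thread number is a hypothetical input. Note that re.search returns None when nothing matches, so it is worth checking before calling .group().

```python
import re

url = 'http://www.deyi.com/thread-24488-1-1.html'

# The question's attempt: the pattern matches the whole URL, so re.sub
# replaces all of it with '' and the result is an empty string.
replaced = re.sub(r'http://www.deyi.com/thread-(.*?)-1-1.html', '', url)

# Alternative (not from the answer): replace the whole match with group 1 via \1.
via_sub = re.sub(r'http://www\.deyi\.com/thread-(\d+)-1-1\.html', r'\1', url)

# The answer's approach: re.search returns a Match object, or None if nothing matches.
m = re.search(r'(?<=thread-)\d+', url)
thread_id = int(m.group()) if m else None

print(repr(replaced))  # ''
print(via_sub)         # 24488
print(thread_id)       # 24488

# A hypothetical URL without "thread-<digits>": search returns None,
# so calling .group() on it directly would raise AttributeError.
print(re.search(r'(?<=thread-)\d+', 'http://www.deyi.com/forum.php'))  # None
```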
Answer 1 (score: 0)
You can use a positive lookahead containing a capturing group, (?=(\d+)):
import re
urlss = 'http://www.deyi.com/thread-24488-1-1.html'
pattern = r'thread-(?=(\d+))'
match = re.search(pattern, urlss)
print(match.group(1))
Output:
24488
If the URL always has the same structure and only the numbers in it change, you can use a simpler pattern like this:
import re
urlss = 'http://www.deyi.com/thread-24488-1-1.html'
pattern = r'(\d+){2}'
match = re.search(pattern, urlss)
print(match.group())
Output:
24488
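A short sketch of how the two patterns in this answer behave, since both have non-obvious details. The lookahead consumes nothing, so the overall match is only "thread-" while group 1 still holds the digits. In (\d+){2}, a repeated group keeps only its last iteration, so the pattern effectively means "two or more digits" (the same as \d{2,}). The example123 domain below is a hypothetical URL used to show that such a pattern returns the first digit run anywhere in the string.

```python
import re

url = 'http://www.deyi.com/thread-24488-1-1.html'

# Lookahead version: the lookahead is zero-width, so group() is just "thread-",
# but the capturing group inside the lookahead still records the digits.
la = re.search(r'thread-(?=(\d+))', url)
print(la.group())   # thread-
print(la.group(1))  # 24488

# (\d+){2}: the whole match is the digit run, but the repeated group
# only remembers its final iteration, which here is the last digit.
rep = re.search(r'(\d+){2}', url)
print(rep.group())   # 24488
print(rep.group(1))  # 8

# \d{2,} matches the same "two or more digits" without the confusing group.
print(re.search(r'\d{2,}', url).group())  # 24488

# Caveat: it returns the first such run anywhere, e.g. digits in a hypothetical domain.
other = 'http://www.example123.com/thread-24488-1-1.html'
print(re.search(r'\d{2,}', other).group())  # 123
```

If the domain can contain digits, anchoring on "thread-" (as in the lookahead or lookbehind patterns) is the safer choice.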