Question

I'm new to Python and I'm scraping some data. I want to use a regular expression to extract the WhatsApp number.

Here is my code:

from textwrap import shorten
from bs4 import BeautifulSoup
import json
import requests
import re

url = 'https://m.propertyfinder.ae/en/rent/apartment-for-rent-dubai-dubai-marina-marina-promenade-delphine-tower-7276805.html'
response = requests.get(url)
soup = BeautifulSoup(response.text, 'lxml')
all_scripts = soup.find_all('script')
whatsapp_script = all_scripts[6]
whatsapp = re.search('{"type":"whatsapp","value":"([^"]+)"[^}]+}', whatsapp_script)

print(whatsapp.group())

I get an error like this:


Traceback (most recent call last):
  File "/Users/evilslab/Documents/Websites/www.futurepoint.dev.cc/dobuyme/python/fetchFinder.py", line 12, in <module>
    whatsapp = re.search('{"type":"whatsapp","value":"([^"]+)"[^}]+}', whatsapp_script)
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/re.py", line 199, in search
    return _compile(pattern, flags).search(string)
TypeError: expected string or bytes-like object

How can I extract the WhatsApp number from the page source?

Answer 1

whatsapp_script is a bs4.element.Tag, not a string, which is why re.search raises the TypeError. Try using its .text attribute:

print(re.search('{"type":"whatsapp","value":"([^"]+)"[^}]+}', whatsapp_script.text))

Output

<re.Match object; span=(39851, 40276), match='{"type":"whatsapp","value":"+971566809258","link">

To get the actual number (the regex's first capture group), call the match object's .group(1) method:

print(re.search('{"type":"whatsapp","value":"([^"]+)"[^}]+}', whatsapp_script.text).group(1))

Output

+971566809258
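
As a variation on the above, the hard-coded index all_scripts[6] can be avoided by running the same pattern over the raw HTML string (what response.text holds) and by checking for a failed match before calling .group(1). This is only a minimal sketch: the helper name extract_whatsapp and the sample HTML are made up for illustration, and it assumes the page embeds the JSON fragment exactly as shown in the question.

```python
import re
from typing import Optional

# Same pattern as in the answer, with the literal braces escaped for clarity.
WHATSAPP_RE = re.compile(r'\{"type":"whatsapp","value":"([^"]+)"[^}]+\}')


def extract_whatsapp(html: str) -> Optional[str]:
    """Return the first WhatsApp number found in the raw HTML, or None."""
    match = WHATSAPP_RE.search(html)
    if match is None:
        # re.search returns None when nothing matches; calling .group()
        # on None would raise AttributeError, so bail out early.
        return None
    return match.group(1)  # group(1) is the captured number only


# Made-up sample standing in for response.text; the numbers are not real.
sample_html = (
    '<html><body><script>var contacts = ['
    '{"type":"phone","value":"+10000000001","link":"tel:+10000000001"},'
    '{"type":"whatsapp","value":"+10000000002","link":"https://example.invalid/wa"}'
    '];</script></body></html>'
)

print(extract_whatsapp(sample_html))      # +10000000002
print(extract_whatsapp('<html></html>'))  # None
```

With the real page, the call would be extract_whatsapp(response.text). Searching the whole document keeps working if the site reorders its script tags, at the cost of relying on the JSON fragment being unique in the page.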
