Unable to extract a WhatsApp number with regex using BeautifulSoup in Python

Date: 2019-12-18 15:39:34

Tags: python regex beautifulsoup

I'm new to Python and I'm scraping some data from a page. I want to extract the WhatsApp number using a regular expression.

Here is my code:

from textwrap import shorten
from bs4 import BeautifulSoup
import json
import requests
import re

url = 'https://m.propertyfinder.ae/en/rent/apartment-for-rent-dubai-dubai-marina-marina-promenade-delphine-tower-7276805.html'
response = requests.get(url)
soup = BeautifulSoup(response.text, 'lxml')
all_scripts = soup.find_all('script')
whatsapp_script = all_scripts[6]
whatsapp = re.search('{"type":"whatsapp","value":"([^"]+)"[^}]+}', whatsapp_script)

print(whatsapp.group())

I get the following error:


Traceback (most recent call last):
  File "/Users/evilslab/Documents/Websites/www.futurepoint.dev.cc/dobuyme/python/fetchFinder.py", line 12, in <module>
    whatsapp = re.search('{"type":"whatsapp","value":"([^"]+)"[^}]+}', whatsapp_script)
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/re.py", line 199, in search
    return _compile(pattern, flags).search(string)
TypeError: expected string or bytes-like object

How can I extract the WhatsApp number from the page source?

1 answer:

Answer 0 (score: 1)

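whatsapp_script is a bs4.element.Tag, not a string, which is why re.search raises "TypeError: expected string or bytes-like object". Pass its .text attribute instead: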

print(re.search('{"type":"whatsapp","value":"([^"]+)"[^}]+}', whatsapp_script.text))
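Hard-coding all_scripts[6] breaks as soon as the page adds or removes a script tag. Below is a minimal sketch of a more defensive variant. It assumes you already have the script contents as plain strings (for example [s.text for s in soup.find_all('script')]), tries each one in turn, and returns None instead of raising when nothing matches. The function name find_whatsapp_number and the sample strings are hypothetical, chosen only for illustration.

```python
import re
from typing import Iterable, Optional

# Same pattern as in the answer; group 1 captures the number itself.
WHATSAPP_RE = re.compile(r'{"type":"whatsapp","value":"([^"]+)"[^}]+}')


def find_whatsapp_number(script_texts: Iterable[str]) -> Optional[str]:
    """Return the first WhatsApp number found in any script text, or None.

    Hypothetical helper: it expects plain strings, not bs4 Tag objects,
    so convert each tag with .text before calling it.
    """
    for text in script_texts:
        match = WHATSAPP_RE.search(text)
        if match:
            return match.group(1)
    return None


# Hypothetical script contents standing in for the real page.
scripts = [
    "var a = 1;",
    'window.data = [{"type":"whatsapp","value":"+10000000000","link":"https://example.invalid"}];',
]
print(find_whatsapp_number(scripts))                   # +10000000000
print(find_whatsapp_number(["no contact info here"]))  # None
```

Scanning every script rather than indexing one makes the code tolerant of layout changes, at the cost of searching a little more text.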

Output:

<re.Match object; span=(39851, 40276), match='{"type":"whatsapp","value":"+971566809258","link">

To get only the number itself (the regex's first capture group), call .group(1) on the match:

print(re.search('{"type":"whatsapp","value":"([^"]+)"[^}]+}', whatsapp_script.text).group(1))

Output:

+971566809258
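The difference between .group() (used in the question) and .group(1) shows up clearly on a small hypothetical string. .group() (equivalent to .group(0)) returns the whole matched text, while .group(1) returns only what the first pair of parentheses captured. Because re.search returns None when there is no match, the sketch checks for that before calling .group, which avoids an AttributeError on pages without a WhatsApp entry. The sample string and number are made up.

```python
import re

pattern = r'{"type":"whatsapp","value":"([^"]+)"[^}]+}'
# Hypothetical sample; the real page embeds this inside a much larger script.
sample = 'x = {"type":"whatsapp","value":"+10000000000","link":"#"};'

match = re.search(pattern, sample)
if match is None:
    print("no WhatsApp entry found")
else:
    # Whole match: {"type":"whatsapp","value":"+10000000000","link":"#"}
    print(match.group())
    # First capture group only: +10000000000
    print(match.group(1))
```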