I'm new to Python and I'm scraping some data. I want to use a regular expression to extract a WhatsApp number.
Here is my code:
from textwrap import shorten
from bs4 import BeautifulSoup
import json
import requests
import re
url = 'https://m.propertyfinder.ae/en/rent/apartment-for-rent-dubai-dubai-marina-marina-promenade-delphine-tower-7276805.html'
response = requests.get(url)
soup = BeautifulSoup(response.text, 'lxml')
all_scripts = soup.find_all('script')
whatsapp_script = all_scripts[6]
whatsapp = re.search('{"type":"whatsapp","value":"([^"]+)"[^}]+}', whatsapp_script)
print(whatsapp.group())
I get an error like this:
Traceback (most recent call last):
  File "/Users/evilslab/Documents/Websites/www.futurepoint.dev.cc/dobuyme/python/fetchFinder.py", line 12, in <module>
    whatsapp = re.search('{"type":"whatsapp","value":"([^"]+)"[^}]+}', whatsapp_script)
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/re.py", line 199, in search
    return _compile(pattern, flags).search(string)
TypeError: expected string or bytes-like object
How can I extract the WhatsApp number from the page source?
Answer 0 (score: 1)
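whatsapp_script is a bs4.element.Tag object, not a string, and re.search() only accepts a string or bytes-like object, which is why you get the TypeError. Pass its .text attribute instead: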
print(re.search('{"type":"whatsapp","value":"([^"]+)"[^}]+}', whatsapp_script.text))
Output:
<re.Match object; span=(39851, 40276), match='{"type":"whatsapp","value":"+971566809258","link">
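To reproduce the failure and the fix without fetching the page, here is a minimal stdlib-only sketch. `FakeTag` is a hypothetical stand-in for `bs4.element.Tag` that only exposes a `.text` string, and the script body and phone number are made-up placeholders, not data from the real page.

```python
import re

# Same pattern as in the question; group 1 captures the "value" field.
PATTERN = r'{"type":"whatsapp","value":"([^"]+)"[^}]+}'


class FakeTag:
    """Hypothetical stand-in for bs4.element.Tag: it only carries a .text string."""

    def __init__(self, text):
        self.text = text


# Placeholder script body shaped like the JSON fragment embedded in the page.
tag = FakeTag('var x = {"type":"whatsapp","value":"+10000000000","link":"https://wa.me/10000000000"};')

try:
    re.search(PATTERN, tag)  # passing the object itself: not a str, so re rejects it
except TypeError as exc:
    print("TypeError:", exc)  # exact wording varies slightly between Python versions

match = re.search(PATTERN, tag.text)  # passing its text, which is a str
print(match.group(1))  # +10000000000
```

The only change between the failing call and the working one is `tag` versus `tag.text`.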
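To get just the number (the regex's first capture group), call .group(1) on the match; .group() with no argument returns the whole match: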
print(re.search('{"type":"whatsapp","value":"([^"]+)"[^}]+}', whatsapp_script.text).group(1))
Output:
+971566809258
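The final line above calls `.group(1)` directly, which raises `AttributeError` if `re.search` returns `None`, and the question relies on the script sitting at index 6. A slightly more defensive sketch, assuming the number is still embedded as JSON in this exact form, scans every script body and returns `None` when nothing matches. The helper names `extract_whatsapp` and `first_whatsapp` and the sample strings are hypothetical.

```python
import re
from typing import Iterable, Optional

# Same pattern as in the answer, precompiled; group 1 captures the number.
WHATSAPP_RE = re.compile(r'{"type":"whatsapp","value":"([^"]+)"[^}]+}')


def extract_whatsapp(text: str) -> Optional[str]:
    """Return the captured WhatsApp value from one string, or None if absent."""
    match = WHATSAPP_RE.search(text)
    return match.group(1) if match else None


def first_whatsapp(texts: Iterable[str]) -> Optional[str]:
    """Scan several script bodies and return the first WhatsApp value found."""
    for text in texts:
        number = extract_whatsapp(text)
        if number is not None:
            return number
    return None


# Placeholder script bodies; only the third one contains a WhatsApp entry.
scripts = [
    "window.dataLayer = [];",
    '{"type":"phone","value":"+10000000001"}',
    '{"type":"whatsapp","value":"+10000000002","link":"https://wa.me/10000000002"}',
]
print(first_whatsapp(scripts))  # +10000000002
print(extract_whatsapp("no contact data here"))  # None

m = WHATSAPP_RE.search(scripts[2])
print(m.group())   # whole match: the entire {...} object
print(m.group(1))  # +10000000002
```

Against the real page you could call `first_whatsapp(s.text for s in soup.find_all('script'))` instead of hard-coding `all_scripts[6]`, so the lookup keeps working if the site reorders its scripts.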