Answer 0 (score: 1)
You need to use a regular expression to parse the text of the <P> block you get from BeautifulSoup:
import re
text_from_p = """
some text
some more
Tel: 0234-234345-45
some more text
"""
match = re.search(r"Tel: (?P<tel>[0-9\- ]*)", text_from_p)
if match:
    print(match.group("tel"))
else:
    print("Tel not found")
This gives you:
0234-234345-45
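Note that the character class `[0-9\- ]*` also accepts an empty match and trailing spaces. A minimal sketch of a slightly stricter variant follows; the helper name `extract_tel` and the tightened pattern are illustrative assumptions, not part of the answer above:

```python
import re

# Hypothetical helper; the pattern is a tightened version of the one above.
# It requires the number to start and end with a digit (optionally led by '+').
TEL_RE = re.compile(r"Tel:\s*(?P<tel>\+?[0-9][0-9\- ]*[0-9])")

def extract_tel(text):
    """Return the phone number following 'Tel:', or None if there is none."""
    match = TEL_RE.search(text)
    return match.group("tel") if match else None

sample = "some text\nTel: 0234-234345-45 \nsome more text\n"
print(extract_tel(sample))
print(extract_tel("no phone here"))
```

Returning `None` instead of printing lets the caller decide how to handle a paragraph without a phone number.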
Answer 1 (score: 1)
You can use the re module to parse the text. For example:
import re
import requests
from bs4 import BeautifulSoup
url = 'https://www.forpressrelease.com/forpressrelease/553538/4/china-leading-cabinet-handles-supplier-rochehandle-celebrates-success-of-entering-european-market'
soup = BeautifulSoup(requests.get(url).content, 'html.parser')
txt = soup.select_one('.single_page_content').get_text(strip=True, separator='\n')
company = re.findall(r'Company:\s*(.*)', txt)[0]
address = re.findall(r'Address:\s*(.*)', txt)[0]
contact = re.findall(r'Contact:\s*(.*)', txt)[0]
email = re.findall(r'Email:\s*(.*?)\s*(?=\w+:)', txt, flags=re.S)[0]
tel = re.findall(r'Tel:\s*(.*)', txt)[0]
mob = re.findall(r'Mob:\s*(.*)', txt)[0]
url = re.findall(r'Url\s*:\s*-\s*(.*)', txt, flags=re.S)[0]
print('{:<15}: {}'.format('Company', company))
print('{:<15}: {}'.format('Address', address))
print('{:<15}: {}'.format('Contact', contact))
print('{:<15}: {}'.format('Email', email))
print('{:<15}: {}'.format('Tel', tel))
print('{:<15}: {}'.format('Mob', mob))
print('{:<15}: {}'.format('Url', url))
Prints:
Company        : Dongguan Roche Industrial Co., Ltd
Address        : No.83, XiZheng 1st Road, Shajiao Community, Humen Town, Dongguan City, Guangdong Province, China 523936
Contact        : Robin Luo
Email          : info@rochehandle.com
Tel            : 0769-89366747
Mob            : +86-13392706499
Url            : https://www.rochehandle.com
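Indexing `re.findall(...)[0]` raises an `IndexError` whenever a label is missing from the page. One possible way to make the lookup more tolerant is to loop over the labels and fall back to `None`. This is only a sketch: the function name `extract_fields`, the `Fax` label, and the sample text (one `Label: value` pair per line) are assumptions for illustration, and the real page layout may differ:

```python
import re

# Assumed sample text: one "Label: value" pair per line, similar to what
# get_text(separator='\n') might return. The real page may be laid out differently.
txt = """Company: Dongguan Roche Industrial Co., Ltd
Contact: Robin Luo
Email: info@rochehandle.com
Tel: 0769-89366747
Mob: +86-13392706499"""

def extract_fields(text, labels):
    """Map each label to the rest of its line, or None when the label is missing."""
    fields = {}
    for label in labels:
        pattern = r'^{}\s*:\s*(.*)$'.format(re.escape(label))
        match = re.search(pattern, text, flags=re.M)
        fields[label] = match.group(1).strip() if match else None
    return fields

# 'Fax' is deliberately absent from the sample to show the fallback.
fields = extract_fields(txt, ['Company', 'Contact', 'Email', 'Tel', 'Mob', 'Fax'])
for label, value in fields.items():
    print('{:<15}: {}'.format(label, value))
```

Collecting the results in a dict also keeps the printing code to a single loop instead of one line per field.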