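I am using Beautiful Soup to extract data from ul and li tags. I can get the data, but some words are missing and there are no spaces between the lines.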
<li>Developing <span class="bte bte-78432-940"> </span>pricing strategy that maximizes profits <span class="bte bte-78432-947"> </span>market share <span class="bte bte-78432-962"> </span>considers customer satisfaction</li>
<li>Supporting <span class="bte bte-78432-1041"> </span>and <span class="bte bte-78432-1045"> </span>launching</li>
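Text as shown in the HTML view:
- Developing a pricing strategy that maximizes profits and market share but considers customer satisfaction
- Supporting sale and service launching
I get the following text: Developing pricing strategy that maximizes profits market share considers customer satisfactionSupporting and launching
Words such as "a", "and", "sale" and "service" are missing. Also, all the bullets run together on one line.
How can I get the same text as in the HTML view? If the bullets cannot be on separate lines, there should at least be an underscore between them.
Code snippet: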
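from bs4 import BeautifulSoup

# `html` holds the page source fetched elsewhere
soup = BeautifulSoup(html, 'html.parser')
ul_jobdetail = soup.find_all('ul', {'class': 'job-detail-req'})
i = 1
for ul_jdetail in ul_jobdetail:
    if i == 1:
        # first list: duties
        duties = ul_jdetail.getText()
        print(ul_jdetail.text)
    else:
        # later lists: requirements
        requirements = ul_jdetail.getText()
    i = i + 1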
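Answer 0 (score: 2)
The page appears to hide these words in CSS, so first load that CSS, parse it for the needed information (the missing words), and then put those words into the soup: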
import re
import requests
from bs4 import BeautifulSoup

url = 'https://www.bongthom.com/job_detail/various_positions_78432.html'
soup = BeautifulSoup(requests.get(url).text, 'lxml')

# the stylesheet that carries the hidden words
css_url = soup.select_one('link[data-src="escape"]')['href']

# map each bte-* class to its quoted word and write it into the matching spans
for css_class, word in re.findall(r'\.(bte-\d+-\d+).*?"(.*?)"', requests.get(css_url).text):
    for span in soup.select('span.{}'.format(css_class)):
        span.string = word + ' '
        span.unwrap()

for li in soup.select('.job-detail-req li'):
    print(li.text)
Prints:
Developing a pricing strategy that maximizes profits and market share but considers customer satisfaction
Supporting sale and service launching
Creating promotion, advertising and event planning
Developing and managing advertising campaigns
Organizing company conference, Trade shows, and major events
Building brand awareness
Evaluating and maintaining marketing strategy
Directing, planning and coordinating marketing plan
Researching market demand
Handling social media, public relation efforts, and marketing content
Build strategic relationships and partner with key industry players, and agencies
Be in charge of marketing budget and allocate
Up-to-date with the latest trends and best practices in online marketing and measurement
Identify weaknesses in existing marketing campaigns and develop pragmatic solution within budgetary constraints
Communicate with senior management about marketing initiatives and brainstorm fresh strategies
Bachelor degree in Marketing, Business Administration, Communication or relate field (MBA Preferred)
At least five years’ experience in Marketing and Promotion
...etc.
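
To try the same idea without hitting the live site, here is a minimal offline sketch. The HTML is the sample from the question. The CSS rules are an assumption: the real stylesheet is not shown in this thread, so the `::before { content: "..." }` form and the class-to-word mapping below are made up for illustration (the words are chosen to match the printed output above). The helper name `fill_hidden_words` is hypothetical too. The last two lines show the bullets on separate lines and joined with an underscore, as the question asks.

```python
import re
from bs4 import BeautifulSoup

# Sample markup from the question, wrapped in the ul the question selects.
HTML = """
<ul class="job-detail-req">
<li>Developing <span class="bte bte-78432-940"> </span>pricing strategy that maximizes profits <span class="bte bte-78432-947"> </span>market share <span class="bte bte-78432-962"> </span>considers customer satisfaction</li>
<li>Supporting <span class="bte bte-78432-1041"> </span>and <span class="bte bte-78432-1045"> </span>launching</li>
</ul>
"""

# ASSUMPTION: stand-in stylesheet. The real CSS is not shown in the thread,
# so both the rule syntax and the class-to-word mapping are illustrative.
CSS = """
.bte-78432-940::before { content: "a"; }
.bte-78432-947::before { content: "and"; }
.bte-78432-962::before { content: "but"; }
.bte-78432-1041::before { content: "sale"; }
.bte-78432-1045::before { content: "service"; }
"""


def fill_hidden_words(html, css):
    """Return the text of each bullet with the CSS-injected words filled in."""
    soup = BeautifulSoup(html, 'html.parser')
    # Same regex as in the answer: class name, then the first quoted string.
    for css_class, word in re.findall(r'\.(bte-\d+-\d+).*?"(.*?)"', css):
        for span in soup.select('span.{}'.format(css_class)):
            span.string = word + ' '  # replace the blank placeholder
            span.unwrap()             # drop the span tag, keep its text
    # One string per li, so bullets never run together.
    return [li.get_text() for li in soup.select('.job-detail-req li')]


bullets = fill_hidden_words(HTML, CSS)
print('\n'.join(bullets))  # one bullet per line
print('_'.join(bullets))   # or underscore-separated on a single line
```

Collecting the text per li, rather than calling getText() on the whole ul, is what keeps the bullets apart. Any separator can then be used in the join.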