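I'm having trouble printing the text from this page because BeautifulSoup isn't picking up the span or section tags by class. I want to extract the text from a Motley Fool article and then analyze it sentence by sentence.
So far, the sentence parsing works whenever the text is actually retrieved, but Beautiful Soup only retrieves it some of the time.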
from textblob import TextBlob
from html.parser import HTMLParser
import re
import requests
from bs4 import BeautifulSoup

# dataframe_url, tokenizer and textlist are assumed to be defined elsewhere (not shown)

def news():
    # the target we want to open
    url = dataframe_url
    # open with GET method
    resp = requests.get(url)
    # HTTP status 200 means OK
    if resp.status_code == 200:
        soup = BeautifulSoup(resp.text, "html.parser")
        # l = soup.find("span", attrs={'class': "article-content"})
        l = soup.find("section", attrs={'class': "usmf-new article-body"})
        # print('\n-----\n'.join(tokenizer.tokenize(l.text)))
        textlist.extend(tokenizer.tokenize(l.text))
    else:
        print("Error")
Answer 0 (score: 0)
To capture the transcript, you can try something like the following and modify it to suit your needs:
import requests
from bs4 import BeautifulSoup as bs

with requests.Session() as s:
    response = s.get('https://www.fool.com/earnings/call-transcripts/2019/04/26/exxon-mobil-corp-xom-q1-2019-earnings-conference-c.aspx')
    soup = bs(response.content, 'lxml')
    heads = soup.find_all('h2')
    selections = ['Prepared Remarks:', 'Questions and Answers:']
    for selection in selections:
        for head in heads:
            if head.text == selection:
                # walk every element that follows the matching heading
                for elem in head.find_all_next():
                    if elem.name != 'script':
                        print(elem.text)
                    # the transcript ends at the "Duration" line
                    if 'Duration' in elem.text:
                        break
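If the goal is to feed the sentence analysis rather than print, the same walk can collect the text into a list. The following is only a minimal sketch under several assumptions: the function names `collect_section_text` and `split_sentences`, the sample HTML, and the regex-based splitter are all illustrative (the splitter just stands in for whatever tokenizer you already use). It also assumes the transcript text sits in `<p>` tags and that each section ends at the next `<h2>`, which may not hold for every page.

```python
import re
from bs4 import BeautifulSoup


def collect_section_text(html, headings, stop_word="Duration"):
    """Return the text of <p> tags that follow any <h2> whose text is in `headings`.

    Collection for a heading stops at the next <h2> or at the first
    paragraph containing `stop_word` (that paragraph is not included).
    """
    soup = BeautifulSoup(html, "html.parser")
    paragraphs = []
    for head in soup.find_all("h2"):
        if head.get_text(strip=True) not in headings:
            continue
        for elem in head.find_all_next(["h2", "p"]):
            if elem.name == "h2":
                break  # reached the next section
            text = elem.get_text(" ", strip=True)
            if stop_word in text:
                break  # end-of-transcript marker
            if text:
                paragraphs.append(text)
    return paragraphs


def split_sentences(text):
    """Naive splitter (assumption): break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


# Hypothetical stand-in for a downloaded page
SAMPLE_HTML = """
<h2>Prepared Remarks:</h2>
<p>Good morning. Welcome to the call.</p>
<h2>Questions and Answers:</h2>
<p>Thank you. We will take the first question.</p>
<p>Duration: 60 minutes</p>
<h2>Call participants:</h2>
<p>Not part of the transcript.</p>
"""

paragraphs = collect_section_text(
    SAMPLE_HTML, ["Prepared Remarks:", "Questions and Answers:"]
)
sentences = [s for p in paragraphs for s in split_sentences(p)]
print(sentences)
```

With a real page you would pass `response.text` instead of `SAMPLE_HTML`. Limiting the walk to `<h2>` and `<p>` tags avoids emitting the same text twice when a container and its children are both visited.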
Let me know if this is close enough.
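Separately, regarding the `soup.find` call in the question: one possible reason it comes back empty on some pages (I have not checked the actual markup, so treat this as a guess) is that a multi-word class string such as "usmf-new article-body" is compared against the exact value of the class attribute, so a different order or an extra class makes it miss. A CSS selector matches on a single class instead. The sketch below uses made-up HTML to show the difference and adds a guard for the case where nothing is found.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: same classes as in the question, but in a different order
SAMPLE_HTML = '<section class="article-body usmf-new"><p>First. Second.</p></section>'

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Exact-string match on the class attribute: misses when the order differs
exact = soup.find("section", attrs={"class": "usmf-new article-body"})

# CSS selector: matches any <section> that carries the class "article-body"
by_selector = soup.select_one("section.article-body")

# Guard against a missing element before reading its text
body_text = by_selector.get_text(" ", strip=True) if by_selector is not None else ""
print(exact, body_text)
```

If the selector also returns nothing for a given URL, the content may simply be laid out differently on that page, and the guard at least prevents an AttributeError on `None`.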