This is my code, but it prints the whole paragraph. How can I print only the first sentence, up to the first period?
from bs4 import BeautifulSoup
import urllib.request, time

article = 'https://www.theguardian.com/science/2012/\
oct/03/philosophy-artificial-intelligence'

req = urllib.request.Request(article, headers={'User-agent': 'Mozilla/5.0'})
html = urllib.request.urlopen(req).read()
soup = BeautifulSoup(html, 'lxml')

def print_intro():
    if len(soup.find_all('p')[0].get_text()) > 100:
        print(soup.find_all('p')[0].get_text())
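This code prints:
To state that the human brain has capabilities that are, in some respects, far superior to those of all other known objects in the cosmos would be uncontroversial. The brain is the only kind of object capable of understanding that the cosmos is even there, or why there are infinitely many prime numbers, or that apples fall because of the curvature of space-time, or that obeying its own inborn instincts can be morally wrong, or that it itself exists. Nor are its unique abilities confined to such cerebral matters. The cold, physical fact is that it is the only kind of object that can propel itself into space and back without harm, or predict and prevent a meteor strike on itself, or cool objects to a billionth of a degree above absolute zero, or detect others of its kind across galactic distances.
But I only want to print:
To state that the human brain has capabilities that are, in some respects, far superior to those of all other known objects in the cosmos would be uncontroversial.
Thanks for your help.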
Answer 0 (score: 3)
Split the text on that period. For a single split, str.partition() is faster than str.split() with a limit:
text = soup.find_all('p')[0].get_text()
if len(text) > 100:
    text = text.partition('.')[0] + '.'
print(text)
If you only need to process the first <p> element, use soup.find() instead:
text = soup.find('p').get_text()
if len(text) > 100:
    text = text.partition('.')[0] + '.'
print(text)
However, for the URL you gave, the sample text is in the second paragraph:
>>> soup.find_all('p')[1]
<p><span class="drop-cap"><span class="drop-cap__inner">T</span></span>o state that the human brain has capabilities that are, in some respects, far superior to those of all other known objects in the cosmos would be uncontroversial. The brain is the only kind of object capable of understanding that the cosmos is even there, or why there are infinitely many prime numbers, or that apples fall because of the curvature of space-time, or that obeying its own inborn instincts can be morally wrong, or that it itself exists. Nor are its unique abilities confined to such cerebral matters. The cold, physical fact is that it is the only kind of object that can propel itself into space and back without harm, or predict and prevent a meteor strike on itself, or cool objects to a billionth of a degree above absolute zero, or detect others of its kind across galactic distances.</p>
>>> text = soup.find_all('p')[1].get_text()
>>> text.partition('.')[0] + '.'
'To state that the human brain has capabilities that are, in some respects, far superior to those of all other known objects in the cosmos would be uncontroversial.'
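One detail worth noting: text.partition('.')[0] + '.' appends a period even when the text contains none. The following is a minimal sketch of a guard against that, not part of the original answer; the helper name first_sentence and the sample strings are made up for illustration. It relies on str.partition() returning an empty separator when no match is found.

```python
def first_sentence(text, sep="."):
    """Return text up to and including the first `sep`, or all of `text` if `sep` is absent."""
    head, found, _rest = text.partition(sep)
    # `found` is the empty string when `sep` does not occur in `text`,
    # so nothing extra is appended in that case.
    return head + found


# Hypothetical sample strings, used only for this illustration.
sample = "To state this would be uncontroversial. The brain is the only kind of object."
print(first_sentence(sample))
print(first_sentence("No full stop here"))
```

The same helper could be applied to the result of get_text() in place of the inline partition expression.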
Answer 1 (score: 0)
[0, 9]
Answer 2 (score: 0)
Split the paragraph at the first period with split. Passing 1 as the maxsplit argument stops after the first split, which saves you unnecessary extra splitting:
def print_intro():
    if len(soup.find_all('p')[0].get_text()) > 100:
        my_paragraph = soup.find_all('p')[0].get_text()
        my_list = my_paragraph.split('.', 1)
        print(my_list[0])
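To make the effect of the maxsplit argument concrete, here is a small sketch comparing a limited and an unlimited split. The variable names and the sample string are hypothetical. Note that split() drops the separator, so the period has to be added back if it is wanted in the output.

```python
# Hypothetical sample text, used only for this illustration.
paragraph = "First sentence. Second sentence. Third sentence."

# With maxsplit=1 the string is cut once, giving at most two pieces.
limited = paragraph.split(".", 1)

# Without a limit, every period produces a cut.
unlimited = paragraph.split(".")

print(limited)
print(unlimited)

# split() removes the separator, so re-append it for a complete sentence.
first = limited[0] + "."
print(first)
```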
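Answer 3 (score: 0)
You can use find('.'), which returns the index of the first occurrence of what you are looking for.
So, if the paragraph is stored in a variable named paragraph:
sentence_index = paragraph.find('.')
# add 1 so the slice includes the '.'
sentence_index += 1
print(paragraph[0:sentence_index])
Obviously, the error handling is missing here, such as checking whether the string in paragraph contains a '.' at all. Note that find() returns -1 if it does not find the substring you are looking for.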
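The missing check could look something like the sketch below. The function name first_sentence_with_find and the fallback of returning the whole paragraph are assumptions made for illustration, not something the answer specifies. The check matters because adding 1 to a result of -1 gives 0, so the slice would silently produce an empty string.

```python
def first_sentence_with_find(paragraph):
    """Return the text up to and including the first '.', or the whole text if there is none."""
    index = paragraph.find(".")
    if index == -1:
        # No period found: fall back to the full paragraph (an assumed policy).
        return paragraph
    # index + 1 so that the slice includes the period itself.
    return paragraph[: index + 1]


# Hypothetical sample strings, used only for this illustration.
print(first_sentence_with_find("One sentence. Another one."))
print(first_sentence_with_find("no period at all"))
```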