Question

This is my code, but it prints the whole paragraph. How can I print only the first sentence, up to the first period?

from bs4 import BeautifulSoup
import urllib.request,time

article = 'https://www.theguardian.com/science/2012/\
oct/03/philosophy-artificial-intelligence'

req = urllib.request.Request(article, headers={'User-agent': 'Mozilla/5.0'})
html = urllib.request.urlopen(req).read()

soup = BeautifulSoup(html,'lxml')

def print_intro():
    if len(soup.find_all('p')[0].get_text()) > 100:
        print(soup.find_all('p')[0].get_text())

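This code prints:

To state that the human brain has capabilities that are, in some respects, far superior to those of all other known objects in the cosmos would be uncontroversial. The brain is the only kind of object capable of understanding that the cosmos is even there, or why there are infinitely many prime numbers, or that apples fall because of the curvature of space-time, or that obeying its own inborn instincts can be morally wrong, or that it itself exists. Nor are its unique abilities confined to such cerebral matters. The cold, physical fact is that it is the only kind of object that can propel itself into space and back without harm, or predict and prevent a meteor strike on itself, or cool objects to a billionth of a degree above absolute zero, or detect others of its kind across galactic distances.

But I only want to print:

To state that the human brain has capabilities that are, in some respects, far superior to those of all other known objects in the cosmos would be uncontroversial.

Thanks for your help.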

Answer 1

Split the text on that period; for a single split, str.partition() is faster than str.split() with a limit:

text = soup.find_all('p')[0].get_text()
if len(text) > 100:
    text = text.partition('.')[0] + '.'
print(text)

If you only need to process the first <p> element, use soup.find() instead:

text = soup.find('p').get_text()
if len(text) > 100:
    text = text.partition('.')[0] + '.'
print(text)

However, for the URL you gave, the sample text is in the second paragraph:

>>> soup.find_all('p')[1]
<p><span class="drop-cap"><span class="drop-cap__inner">T</span></span>o state that the human brain has capabilities that are, in some respects, far superior to those of all other known objects in the cosmos would be uncontroversial. The brain is the only kind of object capable of understanding that the cosmos is even there, or why there are infinitely many prime numbers, or that apples fall because of the curvature of space-time, or that obeying its own inborn instincts can be morally wrong, or that it itself exists. Nor are its unique abilities confined to such cerebral matters. The cold, physical fact is that it is the only kind of object that can propel itself into space and back without harm, or predict and prevent a meteor strike on itself, or cool objects to a billionth of a degree above absolute zero, or detect others of its kind across galactic distances.</p>
>>> text = soup.find_all('p')[1].get_text()
>>> text.partition('.')[0] + '.'
'To state that the human brain has capabilities that are, in some respects, far superior to those of all other known objects in the cosmos would be uncontroversial.'
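One thing to watch with `partition('.')[0] + '.'` is that it appends a period even when the text contains none. A minimal sketch of a helper that avoids this is shown here; the name `first_sentence` is hypothetical and not part of the original answer, and it treats the first '.' as the end of the sentence, just as the code above does.

```python
def first_sentence(text):
    """Return text up to and including the first '.', or all of text if there is none."""
    # partition() returns (head, sep, tail); sep is '' when '.' is not found,
    # so head + sep never adds a period that was not in the input.
    head, sep, _tail = text.partition('.')
    return head + sep


print(first_sentence("To state this would be uncontroversial. The brain is unique."))
# To state this would be uncontroversial.
print(first_sentence("No period here"))
# No period here
```

With the page from the question, this could be called as `first_sentence(soup.find_all('p')[1].get_text())`.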

Answer 2

Split the paragraph at the first period. The argument 1 sets MAXSPLIT, which saves you the time of unnecessary extra splitting.

def print_intro():
    if len(soup.find_all('p')[0].get_text()) > 100:
        my_paragraph = soup.find_all('p')[0].get_text()
        my_list = my_paragraph.split('.', 1)
        print(my_list[0])
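To see what the second argument to split() changes, this small sketch compares an unlimited split with one limited to a single split. The sample string and variable names are made up for illustration.

```python
paragraph = "First sentence. Second sentence. Third sentence."

# Without a limit, every period is a split point.
all_parts = paragraph.split('.')
print(all_parts)
# ['First sentence', ' Second sentence', ' Third sentence', '']

# With maxsplit=1, only the first period is used and the rest stays intact.
first_and_rest = paragraph.split('.', 1)
print(first_and_rest)
# ['First sentence', ' Second sentence. Third sentence.']

# split() drops the separator, so add the period back if you want it.
print(first_and_rest[0] + '.')
# First sentence.
```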


Answer 3

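You can use find('.'), which returns the index of the first occurrence of what you are looking for.

So if the paragraph is stored in a variable called paragraph: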

sentence_index = paragraph.find('.')
# add 1 so the slice includes the '.'
sentence_index += 1
print(paragraph[0:sentence_index])

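Obviously, the control part is missing here, such as checking whether the string in the paragraph variable contains a '.' at all. In any case, find() returns -1 if it does not find the substring you are looking for.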
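The missing check could be sketched as follows. The function name `first_sentence_by_index` is hypothetical, and returning the whole paragraph when there is no period is just one possible choice.

```python
def first_sentence_by_index(paragraph):
    """Return paragraph up to and including the first '.', using str.find()."""
    sentence_index = paragraph.find('.')
    if sentence_index == -1:
        # No period found: fall back to the whole paragraph (an assumption).
        return paragraph
    # +1 so the slice includes the period itself.
    return paragraph[:sentence_index + 1]


print(first_sentence_by_index("One sentence. Another sentence."))
# One sentence.
print(first_sentence_by_index("No period here"))
# No period here
```

Without the -1 check, a paragraph with no period would make the index 0 after adding 1, and the slice would print an empty string.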

Python: print/get the first sentence of each paragraph

4 answers: