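How to remove newline characters from the output of BeautifulSoup's get_text method

Time: 2016-10-05 09:30:30

Tags: python beautifulsoup

After scraping a web page, I get the following output: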

       text
Out[50]: 
['\nAbsolute FreeBSD, 2nd Edition\n',
'\nAbsolute OpenBSD, 2nd Edition\n',
'\nAndroid Security Internals\n',
'\nApple Confidential 2.0\n',
'\nArduino Playground\n',
'\nArduino Project Handbook\n',
'\nArduino Workshop\n',
'\nArt of Assembly Language, 2nd Edition\n',
'\nArt of Debugging\n',
'\nArt of Interactive Design\n',]

I need to remove the \n from each item of the list above while iterating over it. Here is my code:

text = []
for name in web_text:
   a = name.get_text()
   text.append(a)

4 answers:

Answer 0 (score: 1)

Strip it just as you would any other string:

text = []
for name in web_text:
   a = name.get_text().strip()
   text.append(a)
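
As a small illustration of what str.strip() does here, the sketch below uses a made-up title (the variable names and the sample string are assumptions, not taken from the scraped page). It shows that strip() only removes whitespace at the two ends of the string, so a newline in the middle survives; joining the result of split() is one common way to collapse that as well, if it ever matters for your data.

```python
# Hypothetical sample value with a newline at both ends and one in the middle.
raw = "\nArt of Assembly Language,\n2nd Edition\n"

# strip() removes leading and trailing whitespace only.
stripped = raw.strip()

# split() with no arguments splits on any run of whitespace, so re-joining
# with a single space also collapses the newline in the middle.
collapsed = " ".join(raw.split())

print(repr(stripped))   # 'Art of Assembly Language,\n2nd Edition'
print(repr(collapsed))  # 'Art of Assembly Language, 2nd Edition'
```

For the titles in the question, which only have newlines at the ends, plain strip() is enough.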

Answer 1 (score: 1)

Instead of calling .strip() explicitly, use the strip argument:

a = name.get_text(strip=True)

This also strips the leading and trailing whitespace and newlines from the text of each child element, if there are any.
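
To make the difference concrete, here is a minimal sketch on a made-up HTML fragment (the markup and variable names are assumptions for illustration, not the page from the question). It compares get_text().strip() with get_text(strip=True) on an element that has nested tags. Note that with strip=True the stripped pieces are joined with an empty separator by default, so passing a separator string is usually what you want when the element has children.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment: a list item whose text is spread over nested tags.
html = "<li>\n<a href='#'>\n  Arduino <b> Workshop </b>\n</a>\n</li>"
soup = BeautifulSoup(html, "html.parser")
item = soup.find("li")

# Strips only the two ends of the combined text; inner spacing is untouched.
plain = item.get_text().strip()

# Strips every text fragment and joins them with no separator.
stripped = item.get_text(strip=True)

# Same, but joins the stripped fragments with a single space.
spaced = item.get_text(" ", strip=True)

print(repr(plain))     # 'Arduino  Workshop'
print(repr(stripped))  # 'ArduinoWorkshop'
print(repr(spaced))    # 'Arduino Workshop'
```

For elements that contain a single text node, as the titles in the question appear to, all three calls give the same result.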

Answer 2 (score: 0)

You can use a list comprehension:

stripedText = [ t.strip() for t in text ]

Which outputs:

>>> stripedText
['Absolute FreeBSD, 2nd Edition', 'Absolute OpenBSD, 2nd Edition', 'Android Security Internals', 'Apple Confidential 2.0', 'Arduino Playground', 'Arduino Project Handbook', 'Arduino Workshop', 'Art of Assembly Language, 2nd Edition', 'Art of Debugging', 'Art of Interactive Design']

Answer 3 (score: 0)

The following approach removes the \n from each item of the list above while iterating over it.

>>> web_text = ['\nAbsolute FreeBSD, 2nd Edition\n',
... '\nAbsolute OpenBSD, 2nd Edition\n',
... '\nAndroid Security Internals\n',
... '\nApple Confidential 2.0\n',
... '\nArduino Playground\n',
... '\nArduino Project Handbook\n',
... '\nArduino Workshop\n',
... '\nArt of Assembly Language, 2nd Edition\n',
... '\nArt of Debugging\n',
... '\nArt of Interactive Design\n',]

>>> text = []
>>> for line in web_text:
...     a = line.strip()
...     text.append(a)
...
>>> text
['Absolute FreeBSD, 2nd Edition', 'Absolute OpenBSD, 2nd Edition', 'Android Security Internals', 'Apple Confidential 2.0', 'Arduino Playground', 'Arduino Project Handbook', 'Arduino Workshop', 'Art of Assembly Language, 2nd Edition', 'Art of Debugging', 'Art of Interactive Design']