Question

Problem: I can't replace <br> tags with newlines using Beautiful Soup 4.

Code: The relevant part of my program currently looks like this:

for br in board.select('br'):
    br.replace_with('\n')

I have also tried board.find_all() instead of board.select().

Result: When I use br.replace_with('\n'), all of the <br> tags are replaced with the string literal \n. For example, Hello<br>world ends up as Hello\nworld. Using br.replace_with(\n) causes an error:

File "<ipython-input-27-cdfade950fdf>", line 10
    br.replace_with(\n)
                       ^
SyntaxError: unexpected character after line continuation character
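For context on the error: inside quotes, '\n' is a single newline character, while a bare \ outside a string is Python's line-continuation character, so br.replace_with(\n) cannot compile. A minimal stdlib-only illustration (the variable names are only for this sketch):

```python
# '\n' inside quotes is ONE character: a real newline.
newline = '\n'
print(len(newline))   # 1

# A raw string keeps the backslash, giving two characters: '\' and 'n'.
literal = r'\n'
print(len(literal))   # 2

# Unquoted, the backslash is the line-continuation character, so code like
# "x = \n" does not compile; compile() lets us check that without crashing.
raised = False
try:
    compile("x = \\n", "<sketch>", "exec")
except SyntaxError as exc:
    raised = True
    print("SyntaxError:", exc.msg)
```

So '\n' (quoted) is already the right argument; the bare form is simply invalid syntax.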

Other information: In case it's relevant, I'm using a Jupyter notebook. Here is my full program, since I may have overlooked a problem elsewhere.

import requests
from bs4 import BeautifulSoup
import pandas as pd

page = requests.get("https://boards.4chan.org/g/")
soup = BeautifulSoup(page.content, 'html.parser')
board = soup.find('div', class_='board')

for br in board.select('br'):
    br.replace_with('\n')

message = [obj.get_text() for obj in board.select('.opContainer .postMessage')]
image = [obj['href'] for obj in board.select('.opContainer .fileThumb')]
pid = [obj.get_text() for obj in board.select('.opContainer .postInfo .postNum a[title="Reply to this post"]')]
time = [obj.get_text() for obj in board.select('.opContainer .postInfo .dateTime')]

for x in range(len(image)):
    image[x] = "https:" + image[x]

post = pd.DataFrame({
    "ID": pid,
    "Time": time,
    "Image": image,
    "Message": message,
    })
post

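# Show the full message text instead of truncating long cells.
# None means "no limit"; -1 is deprecated for this option in newer pandas.
pd.set_option('display.max_colwidth', None)

# display() is built into Jupyter; elsewhere: from IPython.display import display
display(post)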

Any suggestions would be greatly appreciated. Thanks for reading.

Answer 1

Instead of replacing the <br> tags after converting the page to soup, try replacing them before. Like:

# page.text is the decoded HTML string; str(page.content) would give the bytes repr
soup = BeautifulSoup(page.text.replace('<br>', '\n'), 'html.parser')
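One caveat with replacing before parsing: page.content is bytes, and calling str() on bytes returns their repr (for example "b'...'"), in which real newlines already appear as the two characters \ and n. Decoding first (page.text does this using the response's encoding) avoids that. A stdlib-only sketch, with a hypothetical byte string standing in for page.content and UTF-8 assumed as the encoding:

```python
# Hypothetical stand-in for page.content (requests exposes the raw body as bytes).
raw = b"Hello<br>world\nsecond line"

# Pitfall: str() on bytes gives the repr, so the newline becomes backslash + n.
wrong = str(raw)
print(wrong)   # b'Hello<br>world\nsecond line'

# Decode first (assuming UTF-8 here), then replace the tag with a real newline.
right = raw.decode("utf-8").replace("<br>", "\n")
print(right)
```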

Hope this helps! Cheers!

P.S.: I can't give a logical reason why this approach doesn't work after the page has been converted to soup.

Answer 2

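After trying variations of the replacement code for the better part of two hours, I determined that the pandas DataFrame displays newlines as string literals. Everything else indicated that the program was working as expected, so I believe this was the problem all along.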
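This can be checked directly: when pandas renders a string (object) column it escapes newlines as the two characters \n, but the stored value still contains a real newline, so printing the cell itself shows the line break. A minimal sketch with a made-up one-row DataFrame (pandas assumed installed):

```python
import pandas as pd

# Made-up data standing in for the scraped "Message" column.
post = pd.DataFrame({"Message": ["Hello\nworld"]})

# The stored value contains a real newline character...
cell = post.loc[0, "Message"]
print(len(cell))   # 11: "Hello" + one newline + "world"

# ...but the rendered table shows it escaped as backslash + n.
table = post.to_string()
print(table)

# Printing the value itself renders the actual line break.
print(cell)
```

In other words, the <br> replacement worked; only the table rendering hides it.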

Answer 3

Just try this; it works for me. My bs4 version is 4.8.0 and I'm using Python 3.5.3. For example:

from bs4 import BeautifulSoup

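# Pass the parser explicitly; with 'lxml' the fragment is wrapped in
# <html><body><p>...</p></body></html> (html.parser would leave it unwrapped).
soup = BeautifulSoup('hello<br>world', 'lxml')

# soup('br') is shorthand for soup.find_all('br')
for br in soup('br'):
    br.replace_with('\n')

# <br> was replaced with \n successfully
assert str(soup) == '<html><body><p>hello\nworld</p></body></html>'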

# get_text() also works as expected
assert soup.get_text() == 'hello\nworld' 

# it is a \n not a \\n 
assert soup.get_text() != 'hello\\nworld'

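I'm not used to Jupyter Notebook, but your problem seems to be that whatever you are using to visualize the data shows you the string's representation instead of actually printing the string. Hope this helps. Regards, ADB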
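The distinction described above is between a string's repr (what containers and many display tools show) and the string itself (what print writes). A stdlib-only illustration:

```python
text = "hello\nworld"

# repr() escapes the newline; lists and many viewers display this form.
shown = repr(text)
print(shown)    # 'hello\nworld'
print([text])   # ['hello\nworld']

# print() writes the actual characters, so the newline breaks the line.
print(text)
```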
