Question

content_a is a Beautiful Soup result set (i.e. of type <class 'bs4.element.ResultSet'>), made up of values of type <class 'bs4.element.Tag'>.

If I print content_a, I get:

[<div class="class1 class2">Here is the first sentence.
 <br/> <br/> Here is some text "and some more text."
 <br/> <br/> Here is another sentence.
 <br/> Text<br/><span class="class3">Text</span></div>, <div class="class1 class2">Here is the first sentence.
 <br/> <br/> Here is some text "and some more text."
 <br/> <br/> Here is another sentence.
 <br/> Text<br/><span class="class3">Text</span></div>, etc

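So as far as I can tell, it should be a simple iterable list of divs.

I want to replace <div class="class1 class2"> with <div class="class1 class2"><p> (my eventual goal is to replace all the <br /> tags with paragraph tags).

In my test, where the source content is a list of my own strings: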

import re
blablabla = ['<div class="class1 class2">', '<div class="class1 class2">']
for _ in blablabla:
    _ = re.sub('(<div class=\"class1 class2\">)', r"\1<p>",_)
    print _

This returns what I want:

<div class="class1 class2"><p>
<div class="class1 class2"><p>

I am trying to apply the same process to each item in content_a with the following:

import re
for _ in content_a:
    _ = re.sub('(<div class=\"class1 class2\">)', r"\1<p>",_)
    print _

But I get this error:

...in sub
    return _compile(pattern, flags).sub(repl, string, count)
TypeError: expected string or buffer

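So the only difference I can see between the two examples is that one is a Beautiful Soup result set and the other is just a plain list of strings.

Can anyone see why this error occurs?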

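Edit

It has been pointed out that re.sub expects a string as its third argument, whereas the third argument I am passing is a value of type <class 'bs4.element.Tag'>. So that is probably the problem. But I need these values to stay Tag objects so that I can modify them later, so I don't know how to proceed.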
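One possible way forward, sketched here under assumptions rather than taken from the original post: str() serializes a Tag to its markup, and that string can be passed to re.sub. The sample markup and the names soup, pattern, and rewritten are made up for illustration, and the sketch assumes Python 3 with bs4 installed (on Python 3 the wording of the TypeError differs from the Python 2 message shown above).

```python
import re
from bs4 import BeautifulSoup

# Hypothetical sample markup standing in for the real page.
html = (
    '<div class="class1 class2">Here is the first sentence.<br/>Text</div>'
    '<div class="class1 class2">Here is another sentence.<br/>Text</div>'
)
soup = BeautifulSoup(html, "html.parser")
content_a = soup.find_all("div", class_="class1")  # a ResultSet of Tag objects

pattern = re.compile(r'(<div class="class1 class2">)')

rewritten = []
for tag in content_a:
    markup = str(tag)  # serialize the Tag; re.sub only accepts strings
    rewritten.append(pattern.sub(r"\1<p>", markup))

for markup in rewritten:
    print(markup)
```

The results are plain strings, not Tag objects. If Tag objects are needed afterwards, the rewritten markup would have to be parsed again with BeautifulSoup.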

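Update / workaround:

To save time for anyone working on an answer: I came up with a workaround. I realised that I could adjust the content later in the process. I did this by converting it to a string with read(), after which I could make all the re.sub changes to the required elements in that string.

The small regex I came up with is: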

string = re.sub('([^\r]*)\r', r'\1</p>\n<p>', string)
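To illustrate what this substitution does, here is a small sketch on a made-up input in which paragraphs are separated by carriage returns. The sample text and the names converted and wrapped are assumptions for illustration. Note that the regex only inserts the boundaries between paragraphs, so the outer <p> and </p> still have to be added separately.

```python
import re

# Hypothetical sample: three paragraphs separated by carriage returns.
string = "First paragraph.\rSecond paragraph.\rThird paragraph."

# Each run of non-CR characters followed by a CR is kept, and the CR
# is replaced by a closing and an opening paragraph tag.
converted = re.sub('([^\r]*)\r', r'\1</p>\n<p>', string)

# The substitution does not add the outermost tags, so add them here.
wrapped = "<p>" + converted + "</p>"
print(wrapped)
```

For this pattern the result is the same as string.replace('\r', '</p>\n<p>'), which may be simpler to read.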

Answer 1

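As suggested, I am posting the workaround I used as the solution:

Update / workaround:

To save time for anyone working on an answer: I came up with a workaround. I realised that I could adjust the content later in the process. I did this by converting it to a string with read(), after which I could make all the re.sub changes to the required elements in that string.

The small regex I came up with is: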

string = re.sub('([^\r]*)\r', r'\1</p>\n<p>', string)
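As a variation on the same idea, and not what the workaround above does, the <br/> tags themselves could serve as the split points once the content is a string. The helper name br_to_paragraphs and the sample text are hypothetical, and a regex like this is only a rough sketch that may break on more complex markup.

```python
import re

def br_to_paragraphs(inner_html):
    """Hypothetical helper: wrap text separated by <br> tags in <p> elements."""
    # Split on runs of <br>, <br/> or <br />, including surrounding whitespace.
    parts = re.split(r'(?:\s*<br\s*/?>\s*)+', inner_html)
    # Drop empty pieces and wrap each remaining piece in a paragraph.
    return "\n".join("<p>{}</p>".format(p.strip()) for p in parts if p.strip())

sample = 'Here is the first sentence.\n <br/> <br/> Here is some text.\n <br/> Text'
result = br_to_paragraphs(sample)
print(result)
```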
