I have an HTML snippet that looks like this:
<code class="inline">\n object.__getattribute__\n </code>\n and\n <code class="inline">\n super.__getattribute__\n </code>\n peek\nin the\n <code class="inline">\n __dict__\n </code>\n of classes on the MRO for a class when looking for\nan attribute. This PEP adds an optional\n <code class="inline">\n __getdescriptor__\n </code>\n method to\na metaclass that replaces this behavior and gives more control over attribute\nlookup, especially when using a\n \n super\n </a>\n\n \n </a>\n object.\n </p>\n<p>\n That is, the MRO walking loop in\n
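Question
How can I target only the \n characters inside the <code> tags?
What I've tried
I tried using re.sub(), but I keep replacing every \n in the text instead of only the ones inside the <code> tags.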
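Answer 0 (score: 2)
Since the input is HTML, why not use a tool built for the job: an HTML parser?
Here is an example that uses the BeautifulSoup HTML parser to find all code elements and replace \n with an empty string inside each one: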
from bs4 import BeautifulSoup
data = """<code class="inline">\n object.__getattribute__\n </code>\n and\n <code class="inline">\n super.__getattribute__\n </code>\n peek\nin the\n <code class="inline">\n __dict__\n </code>\n of classes on the MRO for a class when looking for\nan attribute. This PEP adds an optional\n <code class="inline">\n __getdescriptor__\n </code>\n method to\na metaclass that replaces this behavior and gives more control over attribute\nlookup, especially when using a\n \n super\n </a>\n\n \n </a>\n object.\n </p>\n<p>\n That is, the MRO walking loop in\n"""
soup = BeautifulSoup(data, "html.parser")
for code in soup("code"):
    # Strip newlines only from the text inside each <code> element
    code.string = code.string.replace("\n", "")
print(soup)
Answer 1 (score: 1)
text = '<code class="inline">\n object.__getattribute__\n </code>\n and\n <code class="inline">\n super.__getattribute__\n </code>\n peek\nin the\n <code class="inline">\n __dict__\n </code>\n of classes on the MRO for a class when looking for\nan attribute. This PEP adds an optional\n <code class="inline">\n __getdescriptor__\n </code>\n method to\na metaclass that replaces this behavior and gives more control over attribute\nlookup, especially when using a\n \n super\n </a>\n\n \n </a>\n object.\n </p>\n<p>\n That is, the MRO walking loop in\n '
print(text.replace('\n',''))
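Note that text.replace('\n', '') removes every newline in the string, not only those inside the <code> tags. To stay closer to the re.sub() attempt from the question, here is a minimal sketch that limits the replacement to <code> elements by passing a function to re.sub(). The names CODE_RE and strip_newlines_in_code are made up for this illustration, and the sample text is a shortened copy of the snippet above. It assumes the <code> elements are not nested and carry no unusual markup; a regex is fragile on arbitrary HTML, so a parser is probably the safer choice in general.

```python
import re

# Shortened copy of the snippet from the question
text = '<code class="inline">\n object.__getattribute__\n </code>\n and\n <code class="inline">\n super.__getattribute__\n </code>\n peek\nin the\n'

# Match each <code ...>...</code> element. re.DOTALL lets "." span newlines,
# and the non-greedy ".*?" stops at the first closing tag.
CODE_RE = re.compile(r"(<code\b[^>]*>)(.*?)(</code>)", re.DOTALL)


def strip_newlines_in_code(html):
    """Remove newlines only from the text between <code> and </code>."""
    def fix(match):
        opening, body, closing = match.groups()
        return opening + body.replace("\n", "") + closing

    # re.sub calls fix() once per matched <code> element;
    # everything outside the matches is left untouched.
    return CODE_RE.sub(fix, html)


result = strip_newlines_in_code(text)
print(result)
```

The newlines between the <code> elements survive because the pattern never matches them; only the captured body of each element is rewritten.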