How to remove the data between tags in a given text using Python

Posted: 2019-05-19 04:47:40

Tags: python

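The `answer_body` column of my dataset contains text like the sample below. Before converting the data into vectors for a machine-learning task, I need to remove the data between certain tags. How do I remove everything between `<code>` tags?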

<h1> sample text </h1>

<p>I have the following code and want to calculate the time complexity:</p>

<pre><code>def solve(n):
    if n == 0 or n == 2:
        return True
    elif n == 1:
        return False
    else:
        return not solve(n-1) or not solve(n-2) or not solve(n-3)
</code></pre>

<p>If I had something like this:</p>

<pre><code>return solve(n-1) + solve(n-2)
</code></pre>

<p>it would be T(n) = 2T(n-1), at least from my understanding.</p>

<p>But how do I proceed if I have an "or" in the return statement?</p>

<pre><code>return not solve(n-1) or not solve(n-2) or not solve(n-3)
</code></pre>
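One option for the removal described above is a minimal sketch that uses only the standard-library `html.parser` module. It walks the HTML and keeps the text that is not inside a `<code>` element, so multi-line code blocks are dropped without any regular expression. The class `CodeStripper`, the function `remove_tag_contents` and its `skip_tags` parameter are hypothetical names chosen for illustration. Lowercasing and whitespace collapsing are assumptions about the preprocessing you want.

```python
from html.parser import HTMLParser


class CodeStripper(HTMLParser):
    """Hypothetical helper: collects text outside the given tags, drops text inside them."""

    def __init__(self, skip_tags=("code",)):
        super().__init__(convert_charrefs=True)
        self.skip_tags = set(skip_tags)
        self.depth = 0   # how many skipped tags we are currently inside
        self.parts = []  # text fragments that survive

    def handle_starttag(self, tag, attrs):
        if tag in self.skip_tags:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in self.skip_tags and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        # Keep text only when we are not inside a skipped tag
        if self.depth == 0:
            self.parts.append(data)


def remove_tag_contents(html, skip_tags=("code",)):
    """Hypothetical function: return the lowercased text of `html` without skipped tags."""
    parser = CodeStripper(skip_tags)
    parser.feed(html)
    parser.close()  # flush any buffered text
    # Collapse the whitespace left behind by removed blocks and tags
    return " ".join("".join(parser.parts).split()).lower()
```

If the data lives in a pandas DataFrame (assumed here to be called `df`), something like `df["answer_body"].apply(remove_tag_contents)` would apply the function row by row.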

What I have tried:

import re

def removeDataBetweenTag(data):
    # Text inside <code>...</code> (re.DOTALL so "." also matches newlines)
    code = re.findall(r'<code>(.*?)</code>', data, flags=re.DOTALL)
    # The same text with every <code>...</code> element removed
    question = re.sub(r'<code>(.*?)</code>', '', data, flags=re.DOTALL)
    return question.lower()
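Note that `re.MULTILINE` only changes how `^` and `$` match. It is `re.DOTALL` that lets the `.` in `(.*?)` match the newlines inside a multi-line `<pre><code>` block, and the non-greedy `*?` stops at the first closing `</code>`. The following is a minimal regex-based sketch that builds on this. It assumes you also want the remaining markup, such as `<p>` and `</pre>`, removed before vectorising. The names `CODE_BLOCK`, `ANY_TAG` and `clean_answer_body` are hypothetical. A regex is not a full HTML parser, so unusual markup (for example, a `>` inside an attribute value) may not be handled.

```python
import re

# A <code>...</code> element; DOTALL lets "." cross newlines,
# and the non-greedy ".*?" stops at the first closing tag.
CODE_BLOCK = re.compile(r"<code>.*?</code>", flags=re.DOTALL)
# Any remaining tag such as <p> or </pre> (assumption: markup should go too).
ANY_TAG = re.compile(r"<[^>]+>")


def clean_answer_body(html):
    """Hypothetical helper: drop code blocks and tags, return lowercased text."""
    text = CODE_BLOCK.sub(" ", html)  # remove code and its contents
    text = ANY_TAG.sub(" ", text)     # remove leftover tags, keep their text
    # A space replaces each removed piece so neighbouring words do not fuse;
    # split/join then collapses the extra whitespace.
    return " ".join(text.split()).lower()
```

Replacing each match with a space rather than an empty string is a design choice. It keeps words on either side of a removed block from running together, as in `this:it`.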

0 answers