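Below is sample text from the answer_body column of my dataset. I am cleaning the data before converting it into vectors for a machine learning task, and I need to remove the content between certain tags. How do I remove everything between the <code> tags?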
<h1> sample text </h1>
<p>I have the following code and want to calculate the time complexity:</p>
<pre><code>def solve(n):
    if n == 0 or n == 2:
        return True
    elif n == 1:
        return False
    else:
        return not solve(n-1) or not solve(n-2) or not solve(n-3)
</code></pre>
<p>If I had something like this:</p>
<pre><code>return solve(n-1) + solve(n-2)
</code></pre>
<p>it would be T(n) = 2T(n-1), at least from my understanding.</p>
<p>But how do I proceed if I have an "or" in the return statement?</p>
<pre><code>return not solve(n-1) or not solve(n-2) or not solve(n-3)
</code></pre>
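What I have tried:
import re

def removeDataBetweenTag(data):
    # Collect the contents of every <code> block (DOTALL lets "." span newlines).
    code = str(re.findall(r'<code>(.*?)</code>', data, flags=re.DOTALL))
    # Drop each <code>...</code> block, tags included, from the text.
    question = re.sub(r'<code>(.*?)</code>', '', data, flags=re.DOTALL)
    return question.lower()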
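For comparison, here is a minimal sketch of one way the removal could be done with a single precompiled pattern. The name `strip_code_blocks`, the constant `CODE_BLOCK`, and the handling of an optional `<pre>` wrapper and tag attributes are assumptions made for illustration; they are not part of the original attempt. Regular expressions are only a rough fit for HTML, so this assumes the `<code>` blocks are not nested.

```python
import re

# Compiled once; DOTALL lets "." match the newlines inside multi-line code blocks.
# Assumption: a block may be wrapped in <pre>...</pre> and tags may carry attributes.
CODE_BLOCK = re.compile(
    r'(?:<pre\b[^>]*>\s*)?<code\b[^>]*>.*?</code>(?:\s*</pre>)?',
    flags=re.DOTALL | re.IGNORECASE,
)

def strip_code_blocks(html):
    """Return html with every <code>...</code> block removed, tags included."""
    return CODE_BLOCK.sub('', html)

# A short excerpt of the sample text from the question.
sample = (
    "<p>If I had something like this:</p>\n"
    "<pre><code>return solve(n-1) + solve(n-2)\n"
    "</code></pre>\n"
    "<p>it would be T(n) = 2T(n-1).</p>"
)
cleaned = strip_code_blocks(sample)
print(cleaned)
```

If the data lives in a pandas DataFrame (here assumed to be called `df`), the function could be applied to the column with something like `df['answer_body'].apply(strip_code_blocks)`.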