I'm writing a program that downloads questions from Stack Overflow using Python 3. It's finished now, and here is the code:
import os
import re
import urllib.request
req = urllib.request.Request('https://stackoverflow.com/questions/32535816/use-for-loop-inside-another-for-in-python-3')
req.add_header("user-agent", "Mozilla/5.0 (X11; Linux x86_64) "
               "AppleWebKit/537.36 (KHTML, like Gecko) "
               "Chrome/45.0.2454.93 Safari/537.36")
html = urllib.request.urlopen(req)
webpage = html.read().decode('utf-8')
text = re.search(r'<div class="post-text" itemprop="text">.+?</div>',
                 webpage, re.S)
with open('text', 'w') as f:
    for i in text.group():
        f.write(i)
The output is:
<div class="post-text" itemprop="text">
<p>I'm trying to print a file in rainbow colors. But however I have a problem, here is my code:</p>
<pre><code>color = [91, 93, 92, 96, 94, 95]
with open(sys.argv[1]) as f:
    for i in f.read():
        for c in color:
            print('\033[{0}m{1}\033[{0};m'
                  .format(c, i), end='', flush=True)
</code></pre>
<p>the question is, I want the output like this: <code>Hello</code>(<code>H</code> in red, <code>e</code> in yellow, etc. )</p>
<p>but I got the output like this:<code>HHHHHeeeeellll...</code>(first <code>H</code> in red, second <code>H</code> in yello, etc.)</p>
<p>I know that because the first <code>for</code> will loop the second <code>for</code>. So how can I solve this?</p>
</div>
I think it works fine, but I want to strip all the HTML tags. I tried using re.sub like this:
text = re.sub('<.+?>', '', text)
But I got this error:
Traceback (most recent call last):
  File "1.py", line 18, in <module>
    text = re.sub('<.+?>', '', text)
  File "/usr/lib/python3.4/re.py", line 179, in sub
    return _compile(pattern, flags).sub(repl, string, count)
TypeError: expected string or buffer
What does this mean, and how can I fix it?
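For reference, the traceback seems to come from the fact that re.search returns a match object (or None), not a str, while re.sub expects a string as its third argument. Below is a minimal sketch of one way around it, assuming the matched text is taken with .group() first. The helper name strip_post_text and the sample page are hypothetical, made up so the example runs without a network request; they are not part of the program above.

```python
import re


def strip_post_text(webpage):
    """Return the post body with HTML tags removed, or None if not found.

    Hypothetical helper for illustration only.
    """
    match = re.search(r'<div class="post-text" itemprop="text">.+?</div>',
                      webpage, re.S)
    # re.search gives back a match object, or None when nothing matches.
    if match is None:
        return None
    # .group() yields the matched text as a str, which re.sub accepts.
    return re.sub(r'<.+?>', '', match.group())


# Made-up sample standing in for the downloaded page.
sample = (
    '<div class="post-text" itemprop="text">\n'
    '<p>I want <code>Hello</code> in rainbow colors.</p>\n'
    '</div>'
)

plain = strip_post_text(sample)
print(plain)
```

Note that this only removes the tags; character entities such as &amp; are left as they are, and a regex like this is a rough tool rather than a full HTML parser.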