Question

I'm parsing URLs and saving them to a file. My code works fine on Windows, but on Ubuntu it puts a little 'u' in front of every line.

import re

reports = "C:\Users/_____/Desktop/Reports/"
string = "Here is a string to test.  http://www.blah.com  &  http://2nd.com"
url_match = re.findall(r'(https?://[^\s]+)', string)
print url_match

if url_match != []:
    with open(reports + "_URLs.txt", "a") as text_file:
        text_file.write('{}'.format(url_match).replace(',', "\n").replace('[', '').replace(']', '').replace("'", '').replace(' ', '').__add__("\n"))

Does anyone know how to fix this? Thanks.

Answer 1

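'{}'.format(url_match) converts the url_match list to a human-readable string, and then a convoluted chain of string replacements turns that string into the lines to write. Somewhere along the way you end up with a unicode string, hence the 'u'. I'm not going to speculate about why that happens, because the real solution is to just work with the list: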

import re

# reports = "C:\Users/_____/Desktop/Reports/"
reports = "" # for test
string = "Here is a string to test.  http://www.blah.com  &  http://2nd.com"
url_match = re.findall(r'(https?://[^\s]+)', string)
print url_match
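if url_match:  # an empty list is falsy, so no comparison with [] is needed
    with open(reports + "_URLs.txt", "a") as text_file:
        # write each URL on its own line instead of formatting the whole list
        for url in url_match:
            text_file.write(url + '\n')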
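A likely explanation for the 'u', for what it's worth: in Python 2, formatting a list uses repr() on each element, and the repr of a unicode string looks like u'http://...'. The replace() calls in the question strip the quotes, brackets and commas but leave the u behind. Working on the elements themselves never goes through repr(), so the prefix cannot appear. Here is a minimal sketch of that idea, written to run on both Python 2 and 3; extract_urls and format_lines are hypothetical helper names, not part of the original code.

```python
import re


def extract_urls(text):
    # Same pattern as in the question, minus the redundant capture group.
    # Returns a list of the matched URL strings.
    return re.findall(r'https?://[^\s]+', text)


def format_lines(urls):
    # Build the file content from the elements themselves: one URL per line,
    # each followed by a newline. The list is never converted to a string,
    # so repr() (and its u'' prefix in Python 2) is not involved.
    return ''.join(url + '\n' for url in urls)


text = "Here is a string to test.  http://www.blah.com  &  http://2nd.com"
urls = extract_urls(text)
lines = format_lines(urls)
```

The resulting string can be passed to text_file.write(lines) in a single call, which is equivalent to the loop above.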
