Correctly formatting output to a file

Time: 2015-11-19 23:54:56

Tags: python

I'm parsing URLs and saving them to a file. The code runs fine on Windows, but on Ubuntu it adds a little "u" to the front of every line:

import re

reports = r"C:\Users/_____/Desktop/Reports/"
string = "Here is a string to test.  http://www.blah.com  &  http://2nd.com"
url_match = re.findall(r'(https?://[^\s]+)', string)
print(url_match)

if url_match != []:
    with open(reports + "_URLs.txt", "a") as text_file:
        # Format the whole list as text, then strip the list punctuation to leave one URL per line
        text_file.write('{}'.format(url_match).replace(',', "\n").replace('[', '').replace(']', '').replace("'", '').replace(' ', '') + "\n")


Does anyone know how to fix this? Thanks.

1 answer:

Answer 0 (score: 2)

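'{}'.format(url_match) converts the url_match list into its printable string form, and then you use a convoluted series of string replacements to turn that into the lines to write. Somewhere along the way the list items became unicode strings, hence the 'u': in Python 2, formatting a list calls repr() on each element, and the repr of a unicode string carries a u prefix (u'http://www.blah.com'). Your replacements remove the brackets, quotes, commas and spaces, but not that prefix. I won't speculate on why the strings are unicode in the first place, because the real solution is simply to work with the list directly: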

import re

# reports = "C:\Users/_____/Desktop/Reports/"
reports = "" # for test
string = "Here is a string to test.  http://www.blah.com  &  http://2nd.com"
url_match = re.findall(r'(https?://[^\s]+)', string)
print(url_match)
if url_match:
    with open(reports + "_URLs.txt", "a") as text_file:
        for url in url_match:
            text_file.write(url + '\n')
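Both halves of the answer can be checked in a small Python 3 sketch. This assumes Python 3, where unicode strings no longer show a u prefix; instead, the same URLs encoded as bytes show a b prefix in their repr, which reproduces the leak through the question's replace() chain. The helper names replace_chain and append_urls are hypothetical, and a temporary directory stands in for the Reports folder.

```python
import os
import re
import tempfile

# Assumption: Python 3. Python 2's unicode repr adds a u prefix; in Python 3
# a bytes object's repr adds a b prefix, which reproduces the same problem.

def replace_chain(items):
    """Hypothetical helper: the question's formatting, minus the file I/O."""
    text = '{}'.format(items)  # str.format renders the list via repr() of each item
    return (text.replace(',', "\n").replace('[', '').replace(']', '')
                .replace("'", '').replace(' ', '') + "\n")

def append_urls(urls, path):
    """Hypothetical helper: the answer's approach, one URL per line."""
    # An explicit encoding avoids relying on the platform's default text
    # encoding, which can differ between Windows and Ubuntu.
    with open(path, "a", encoding="utf-8") as text_file:
        for url in urls:
            text_file.write(url + "\n")

string = "Here is a string to test.  http://www.blah.com  &  http://2nd.com"
url_match = re.findall(r'(https?://[^\s]+)', string)

# Same URLs as bytes: the repr prefix survives the replace chain.
leaky = replace_chain([url.encode("ascii") for url in url_match])
print(leaky, end="")  # each line starts with a stray b

# Writing the list items directly never goes through repr().
out_path = os.path.join(tempfile.mkdtemp(), "_URLs.txt")
if url_match:
    append_urls(url_match, out_path)
    with open(out_path, encoding="utf-8") as f:
        written = f.read()
    print(written, end="")  # clean lines, no prefix
```

The replace chain only removes characters it knows about, so anything else that repr() adds ends up in the file; iterating over the list sidesteps repr() entirely.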