I am parsing URLs and saving them to a file. The code works fine on Windows, but on Ubuntu it puts a small "u" in front of each line.
import re
reports = "C:\Users/_____/Desktop/Reports/"
string = "Here is a string to test. http://www.blah.com & http://2nd.com"
url_match = re.findall(r'(https?://[^\s]+)', string)
print url_match
if url_match != []:
    with open(reports + "_URLs.txt", "a") as text_file:
        text_file.write('{}'.format(url_match).replace(',', "\n").replace('[', '').replace(']', '').replace("'", '').replace(' ', '').__add__("\n"))
Does anyone know how to fix this? Thanks.
Answer 0 (score: 2)
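'{}'.format(url_match) converts the url_match list into a human-readable string, and the chain of string replacements then tries to turn that string back into the lines to write. Somewhere along the way you end up with unicode strings, and because formatting a list uses the repr() of each element, in Python 2 they show up with a u prefix that the replacements do not strip. I won't speculate on why the strings are unicode on one machine and not the other, because the real fix is to work with the list directly: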
import re
# reports = "C:\Users/_____/Desktop/Reports/"
reports = ""  # for test
string = "Here is a string to test. http://www.blah.com & http://2nd.com"
url_match = re.findall(r'(https?://[^\s]+)', string)
print url_match
if url_match:
    with open(reports + "_URLs.txt", "a") as text_file:
        for url in url_match:
            text_file.write(url + '\n')
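
For anyone who wants something that runs on both Python 2 and Python 3, here is a minimal sketch of the same idea. The helper names urls_to_lines and append_urls, the compiled URL_PATTERN, and the choice of io.open with UTF-8 are my own assumptions for illustration, not part of the code above. The point it shows is that joining the list items themselves never goes through repr(), so no u prefix can end up in the file.

```python
import io
import re

# Same pattern as above, compiled once (the capturing group is not needed).
URL_PATTERN = re.compile(r'https?://[^\s]+')


def urls_to_lines(urls):
    """Return the URLs as text, one per line, without going through repr()."""
    return u''.join(url + u'\n' for url in urls)


def append_urls(path, urls):
    """Append the URLs to the file at path; do nothing for an empty list.

    Hypothetical helper: io.open is used because it takes an explicit
    encoding and behaves the same on Python 2 and Python 3.
    """
    if urls:
        with io.open(path, 'a', encoding='utf-8') as text_file:
            text_file.write(urls_to_lines(urls))


text = u"Here is a string to test. http://www.blah.com & http://2nd.com"
url_match = URL_PATTERN.findall(text)

# On Python 2, '{}'.format(url_match) would render each unicode item as
# u'http://...', because formatting a list uses repr() of its elements.
# Joining the items directly writes only the URL text.
print(urls_to_lines(url_match))
```

To write to a real file, call something like append_urls(reports + "_URLs.txt", url_match), with reports set to a directory that exists on the machine you are running on.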