Question

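I wrote a simple program that sends a request to a Persian web server and fetches the source of its main page. I then convert the result to a string, open a file with file.open(new_file, 'w'), and write the string into it.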

When I print the string in Python IDLE, I see the correct Persian words, but the text file created in my directory contains strings like \xd9\x8a\xd8\xb9\n.

Here is the code:

import urllib.request as ul
import sys

url = 'http://www.uut.ac.ir/'
resp = ul.urlopen(url).read()
string = str(resp)
create_file(filename , string)   # this function creates a text file in desktop

I have also tried:

file.open(new_file , 'w' , encoding = 'utf-8')
string = resp.encode('utf-8')

But nothing changed. Any help would be appreciated.

Answer 1

Decode the website content before writing it to the file:

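import urllib.request as ul

url = 'http://www.uut.ac.ir/'
resp = ul.urlopen(url).read()   # raw bytes, not text

# Decode the bytes into text (this assumes the page is served as UTF-8)
string = resp.decode('utf-8')

# Use an explicit encoding so the output does not depend on the platform default,
# and a with-block so the file is closed (and flushed) reliably
with open("a.txt", 'w', encoding='utf-8') as f:
    f.write(string)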
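
To check this without touching the network, here is a minimal sketch that uses the byte sequence from the question as a stand-in for the downloaded page. The temporary directory and the file name are assumptions for illustration only; the point is that decoding first and opening the file with encoding='utf-8' gives text that reads back unchanged.

```python
import os
import tempfile

# Stand-in for ul.urlopen(url).read(): the UTF-8 bytes quoted in the question.
resp = b'\xd9\x8a\xd8\xb9\n'

# Decode the bytes into a real str before writing.
text = resp.decode('utf-8')

# Hypothetical output location, used only for this sketch.
path = os.path.join(tempfile.mkdtemp(), 'a.txt')

# Write and read back with the same explicit encoding.
with open(path, 'w', encoding='utf-8') as f:
    f.write(text)

with open(path, encoding='utf-8') as f:
    read_back = f.read()

print(read_back == text)
```

If the file is opened without encoding='utf-8', Python falls back to the platform's default encoding, which may not be able to represent Persian characters on some systems.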

Answer 2

Look at your code:

>>> resp = ul.urlopen(url).read()
>>> type(resp)
<class 'bytes'>

resp is of type bytes. Next, you used:

string = str(resp)

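But you forgot to specify the encoding. Without it, str() returns the printable representation of the bytes object (the b'\xd9\x8a...' form), which is why those escape sequences end up in your file. The correct command is: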

string = str(resp, encoding="utf-8")

Now you get the correct string and can write it directly to your file.
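
The difference between the two calls can be seen on a small sample. This is only a sketch that reuses the bytes quoted in the question in place of the real response; the variable names wrong and right are made up for illustration.

```python
# Stand-in for the downloaded page: the UTF-8 bytes quoted in the question.
resp = b'\xd9\x8a\xd8\xb9\n'

# No encoding given: str() returns the repr of the bytes object,
# i.e. text that literally contains b, quotes and backslash escapes.
wrong = str(resp)

# Encoding given: the bytes are decoded into real characters.
right = str(resp, encoding="utf-8")

print(repr(wrong))
print(len(wrong), len(right))
```

The first string is much longer than the second because every byte has been spelled out as a four-character escape such as \xd9, while the decoded string holds just two Persian letters and a newline.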

Your second attempt is wrong. You have to use decode instead of encode:

string = resp.decode('utf-8')
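
As a rough illustration of the direction of each method: decode turns bytes into str, and encode turns str back into bytes. In Python 3 a bytes object has no encode method at all, so the sketch below (again using the bytes from the question as a stand-in for resp) expects an AttributeError from that call.

```python
# Stand-in for the downloaded page: the UTF-8 bytes quoted in the question.
resp = b'\xd9\x8a\xd8\xb9\n'

# bytes -> str is decode; bytes objects do not offer encode in Python 3.
encode_failed = False
try:
    resp.encode('utf-8')
except AttributeError:
    encode_failed = True

# The correct direction: decode the bytes to get text.
text = resp.decode('utf-8')

# encode is the inverse step, going from str back to bytes.
round_trip = text.encode('utf-8')

print(encode_failed, round_trip == resp)
```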

Writing Persian text to a text file in a way that can be read in Python
