Question

I'm trying to write some code that uses Python 3 to fetch data from a website, as you can see from the code:

from bs4 import BeautifulSoup
import urllib.request
import sys
headers={}
headers['User-Agent']="Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0.2454.85 Safari/537.36"
req=urllib.request.Request('http://www.cjcyw.com/a/chuanbodangan/2015/0930/47853.html',headers=headers)
resp=urllib.request.urlopen(req)
xml=BeautifulSoup(resp,'html.parser')
x=xml.findAll('dd')
for item in x:
    item=item.text.encode('utf-8')
    print(sys.stdout.buffer.write(item))

The result is as follows:

[screenshot: result1]

When I write this data to a txt file:

I used str to debug it, and the real problem is that this error pops up:

[screenshot of the error: buggggggg]

Answer 1

You can use .strings here (see the .strings section of the BeautifulSoup documentation).

from bs4 import BeautifulSoup
import urllib.request
import sys
headers={}
headers['User-Agent']="Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0.2454.85 Safari/537.36"
req=urllib.request.Request('http://www.cjcyw.com/a/chuanbodangan/2015/0930/47853.html',headers=headers)
resp=urllib.request.urlopen(req)
xml=BeautifulSoup(resp,'html.parser')
x=xml.findAll('dd')

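# Append to 4.txt; the with block closes the file even if a write fails
with open("4.txt", 'a') as file:
    for item in x:
        # Concatenate every text fragment inside this <dd> element
        s = ""
        for string in item.strings:
            s += string
        s += "\n"
        file.write(s)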

That is the complete code.

Answer 2

First, as I said, don't use sys.stdout.buffer.write here; just use f.write(str(item)).
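This also explains the stray numbers in the question's output: a binary stream's write() returns the number of bytes it wrote, so print(sys.stdout.buffer.write(item)) prints that count after the text. A minimal sketch, using io.BytesIO as an assumed stand-in for sys.stdout.buffer (both are binary streams) and io.StringIO as a stand-in for a text-mode file:

```python
import io

# io.BytesIO stands in for sys.stdout.buffer: both are binary streams,
# and their write() returns the number of bytes written.
binary_out = io.BytesIO()
n_bytes = binary_out.write("你好".encode("utf-8"))
print(n_bytes)  # 6: each of these CJK characters is 3 bytes in UTF-8

# A text stream (like a file opened with open(..., 'a')) accepts str directly,
# and its write() returns the number of characters written.
text_out = io.StringIO()
n_chars = text_out.write("你好")
print(n_chars)  # 2
```

A text-mode file does the encoding itself, which is why f.write(str(item)) is enough and no manual .encode() call is needed.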

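Second, the default file encoding on the Chinese edition of Microsoft Windows is GBK, while the text appears to be UTF-8 and may contain characters that GBK cannot encode. So you need to open the file with UTF-8 encoding, like this: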

open('4.txt', 'a', encoding="utf-8")

Then try running your code again.
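A minimal standard-library sketch of the failure and the fix. It assumes the error is a UnicodeEncodeError caused by a character such as '\xa0' (a non-breaking space, common in scraped HTML) that GBK cannot represent; the sample text and temporary path are hypothetical, and encoding="gbk" is passed explicitly to imitate the Chinese-Windows default:

```python
import os
import tempfile

# Hypothetical sample text: '\xa0' (non-breaking space) has no GBK mapping.
text = "船舶\xa0档案\n"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "4.txt")

    # Imitate the Chinese-Windows default by passing encoding="gbk" explicitly.
    try:
        with open(path, "w", encoding="gbk") as f:
            f.write(text)
        gbk_ok = True
    except UnicodeEncodeError as exc:
        gbk_ok = False
        print("GBK write failed:", exc)

    # UTF-8 can encode every str, so the same write succeeds.
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)
    # Read it back with the same encoding to confirm nothing was lost.
    with open(path, encoding="utf-8") as f:
        read_back = f.read()

print(gbk_ok, read_back == text)
```

The sketch uses mode "w" so each attempt starts from an empty file; the answer's 'a' mode behaves the same way with respect to encoding.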

Python BS4 print/write error
