Question

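I am running into an encoding problem when the response is passed to BeautifulSoup. The readable output of the response is formatted correctly, e.g. Artikelstandort: Österreich, but after running it through BeautifulSoup it turns into Artikelstandort: Ã–sterreich. Here is the modified code: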

def formTest (browser, formUrl, cardName, edition):
   browser.open (formUrl)

   data = browser.response().read()
   with open ('analyze.txt', 'wb') as textFile:
      print 'writing file'
      textFile.write (data)

   #BS4 -> need from_encoding
   soup = BeautifulSoup (data, from_encoding = 'latin-1')
   soup = soup.encode ('latin-1').decode('utf-8')
   table = soup.find('table', { "class" : "MKMTable specimenTable"})

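data contains the correct content, but the soup has the wrong encoding. I tried various encode/decode combinations on the soup, but none of them produced a working result.

The page I am extracting data from is: https://www.magickartenmarkt.de/Mutilate_Magic_2013.c1p256992.prod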

EDIT: I changed the encoding with prettify as suggested, but now I am facing the following error:

TypeError: slice indices must be integers or None or have an __index__ method

What did prettify change? I printed the new output, and the table is still in the "soup" (<table class="MKMTable specimenTable">).

EDIT2:

The new error is:

at: soup.encode ('latin-1').decode('utf-8')

Error: UnicodeDecodeError: 'utf8' codec can't decode byte 0xfc in position 518: invalid start byte

If I use encode and decode, the decoding error occurs on other bytes.
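
Both symptoms described above can be reproduced in isolation. The following is only a minimal sketch, written for Python 3 (the code in the question is Python 2), and it assumes the page is served as UTF-8 and that the garbling comes from reading those bytes with a single-byte codec such as cp1252. Neither assumption is confirmed by the post.

```python
# Assumption: the server sends UTF-8 bytes.
original = "Artikelstandort: Österreich"
utf8_bytes = original.encode("utf-8")

# Reading UTF-8 bytes with a single-byte codec splits 'Ö' (0xC3 0x96)
# into two unrelated characters, which is the garbling seen in the soup.
garbled = utf8_bytes.decode("cp1252")

# Decoding the bytes with the codec they were written in restores the text.
restored = utf8_bytes.decode("utf-8")

# 0xfc is 'ü' in Latin-1, but it can never start a UTF-8 sequence,
# so a Latin-1 -> UTF-8 round trip fails on any text containing 'ü'.
try:
    "ü".encode("latin-1").decode("utf-8")
    message = ""
except UnicodeDecodeError as exc:
    message = str(exc)
```

Under these assumptions, garbled contains the same "Ã–" sequence as the soup output, and message reports an invalid start byte for 0xfc, as in the traceback above.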

Answer 1

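You probably don't need a solution anymore, but in case anyone else stops by here, this is what you should do:
Apply the encoding step to data, not to soup. I usually use the requests library to fetch the raw response, force the encoding with response.encoding = 'utf-8', and then read the text content with response.text. The encoding has to be set before response.text is read, because that is when the bytes are decoded.
At the very least, I pass response.text to BeautifulSoup().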
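
The approach described in this answer could look roughly like the sketch below. It is written for Python 3 and assumes the page is UTF-8. The helper names decode_page and fetch_soup are hypothetical and only serve the illustration; requests and beautifulsoup4 are third-party packages that need to be installed.

```python
def decode_page(data, encoding="utf-8"):
    """Decode the raw response bytes before any parsing happens."""
    return data.decode(encoding)


def fetch_soup(url, encoding="utf-8"):
    """Fetch url with requests and parse the explicitly decoded text."""
    # Third-party packages; imported inside the function so that
    # decode_page stays usable without them.
    import requests
    from bs4 import BeautifulSoup

    response = requests.get(url)
    # Set the encoding BEFORE reading response.text: the bytes are
    # decoded with response.encoding when .text is accessed.
    response.encoding = encoding  # assumption: the page is UTF-8
    return BeautifulSoup(response.text, "html.parser")
```

With the browser object from the question, the equivalent would be to pass decode_page(data) to BeautifulSoup() instead of the raw data, and to drop the encode/decode calls on soup.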
