Question

I'm trying to scrape some text from a page using Selenium, but part of that text contains non-ASCII characters, so I get this error:

    f.write(database_text.text)
UnicodeEncodeError: 'ascii' codec can't encode character u'\xed' in position 1462: ordinal not in range(128)

Is there any way to keep just the plain ASCII?

Thanks!

print("â")

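I don't intend to write that character to my text file; I just want to ignore it. Note: it's not only "â", there are other similar characters as well.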

window_before = driver.window_handles[0]

nmber_one = 1
f = open(str(unique_filename) + ".txt", 'w')
for i in range(5, 37):
    time.sleep(3)
    driver.find_element_by_xpath("""/html/body/center/table[2]/tbody/tr[2]/td/table/tbody/tr""" + "[" + str(i) + "]" + """/td[2]/a""").click()
    time.sleep(3)
    driver.switch_to.window(driver.window_handles[nmber_one])
    nmber_one = nmber_one + 1
    database_text = driver.find_element_by_xpath("/html/body/pre")
    f = open(str(unique_filename) + ".txt", 'w',)
    f.write(database_text.text)
    driver.switch_to.window(window_before)



import uuid
import io
unique_filename = uuid.uuid4()

This generates a new filename; in any case, that part was working before.

Answer 1

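The problem is that some of the text is not ASCII. database_text.text is probably unicode text (you can verify with print type(database_text.text)) and contains non-English characters. If you are on Windows, it may instead be "code page" text, depending on how your user account is configured.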
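To see which characters are causing trouble, you can list every code point above 127 along with its position. This is only a minimal sketch: the sample string is made up and stands in for database_text.text.

```python
# -*- coding: utf-8 -*-
# Hypothetical sample standing in for database_text.text
text = u"Garc\xeda and \xe2"

# Collect (index, character) for every character outside the ASCII range
non_ascii = [(i, ch) for i, ch in enumerate(text) if ord(ch) > 127]

for i, ch in non_ascii:
    print("position %d: %r" % (i, ch))
```

The positions reported here correspond to the "position 1462" style index in the UnicodeEncodeError message.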

Usually you want to store such text as UTF-8, so open the output file accordingly:

import io
text = u"â"
with io.open('somefile.txt', 'w', encoding='utf-8') as f:
    f.write(text)
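Applied to a loop like the one in the question, the same idea might look like the sketch below. Note that open(..., 'w') truncates the file, so reopening it on every iteration would discard what was written earlier; here the file is opened once. The pages list and path are hypothetical stand-ins for the Selenium results and the generated filename.

```python
import io
import os
import tempfile
import uuid

# Hypothetical stand-ins for successive database_text.text values
pages = [u"first page: \xed", u"second page: \xe2"]

# Hypothetical output location; the question uses str(unique_filename) + ".txt"
path = os.path.join(tempfile.gettempdir(), str(uuid.uuid4()) + ".txt")

# Open once, outside the loop, so earlier pages are not truncated away
with io.open(path, "w", encoding="utf-8") as f:
    for page_text in pages:
        f.write(page_text + u"\n")

# Read it back to confirm the non-ASCII characters survived
with io.open(path, encoding="utf-8") as f:
    written = f.read()
```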

If you really do want to drop the non-ASCII characters from the file entirely, you can set an error policy:

text = u"ignore funky â character"
with io.open('somefile.txt', 'w', encoding='ascii', errors='ignore') as f:
    f.write(text)

Ultimately, you need to choose which representation to use for non-ASCII (roughly speaking, non-English) text.
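To make that choice more concrete, here is a small sketch comparing a few of the standard error handlers on the same made-up string. Which one is appropriate depends on what you plan to do with the file afterwards.

```python
# -*- coding: utf-8 -*-
# Hypothetical sample text containing three non-ASCII characters
text = u"caf\xe9 \xe2 na\xefve"

# 'ignore' silently drops anything that cannot be encoded
ignored = text.encode("ascii", errors="ignore").decode("ascii")

# 'replace' substitutes a question mark for each such character
replaced = text.encode("ascii", errors="replace").decode("ascii")

# 'backslashreplace' keeps the information as an escape sequence
escaped = text.encode("ascii", errors="backslashreplace").decode("ascii")

print(ignored)
print(replaced)
print(escaped)
```

With 'ignore' the information is gone for good, while 'backslashreplace' keeps the file pure ASCII but still lets you recover the original characters later.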

Answer 2

A try/except block would work:

try:
    f.write(database_text.text)
except UnicodeEncodeError: 
    pass
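One thing to be aware of with this approach: when the write fails, the whole string is skipped, not just the offending character. The sketch below illustrates that with made-up data; the file is opened with encoding='ascii' through io.open so the behaviour is the same on Python 2 and 3, and pages and path are hypothetical names.

```python
import io
import os
import tempfile

# Hypothetical stand-ins for two scraped pages
pages = [u"plain ascii page\n", u"page with \xed in it\n"]

# Hypothetical output file in a fresh temporary directory
path = os.path.join(tempfile.mkdtemp(), "out.txt")

skipped = 0
with io.open(path, "w", encoding="ascii") as f:
    for page_text in pages:
        try:
            f.write(page_text)
        except UnicodeEncodeError:
            # The entire page is lost here, not only the bad character
            skipped += 1

with io.open(path, encoding="ascii") as f:
    written = f.read()
```

If you only want to lose the individual characters, an error policy such as errors='ignore' is probably closer to what you are after.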

How can Python "ignore" non-ASCII text?
