Question

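I'm not entirely sure what I need to do about this error. I assume it has to do with needing to add .encode('utf-8'), but I'm not sure that is what I need to do, nor where I should apply it.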

The error is:

line 40, in <module>
    writer.writerows(list_of_rows)
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2013' in position 17: ordinal not in range(128)

This is the basis of my Python script.

Answer 1

The Python 2.x csv library is broken when it comes to Unicode. You have three options, in order of complexity:

1. Edit: see below. ~~Use the fixed library https://github.com/jdunck/python-unicodecsv (pip install unicodecsv). Use it as a drop-in replacement. Example:~~

~~with open("myfile.csv", 'rb') as my_file: r = unicodecsv.DictReader(my_file, encoding='utf-8')~~

2. Read the csv manual's section on Unicode: https://docs.python.org/2/library/csv.html (see the examples at the bottom).

3. Manually encode each item as UTF-8:

for cell in row.findAll('td'):
    text = cell.text.replace('[', '').replace(']', '')
    list_of_cells.append(text.encode("utf-8"))
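As a minimal sketch of the manual-encoding option, the helper below turns every cell of a row into UTF-8 bytes. The name encode_cells and the sample row are hypothetical, not part of the original script. On Python 2 the resulting byte strings can be handed to csv.writer; on Python 3 this step is unnecessary, since the csv module works with text directly.

```python
def encode_cells(cells, encoding='utf-8'):
    """Return each text cell encoded as a byte string (hypothetical helper)."""
    return [cell.encode(encoding) for cell in cells]


# Sample row containing the en dash (u'\u2013') from the traceback.
row = [u'Example', u'2010\u20132014']
encoded_row = encode_cells(row)
```

Encoding at the point where cells are collected keeps the rest of the script unchanged, which is why it is the smallest fix for a Python 2 script.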

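Edit: I found that python-unicodecsv is also broken when reading UTF-16. It complains about any 0x00 bytes.

Instead, use https://github.com/ryanhiebert/backports.csv, which more closely resembles the Python 3 implementation and uses the io module.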

Install:

pip install backports.csv

Usage:

from backports import csv
import io

with io.open(filename, encoding='utf-8') as f:
    r = csv.reader(f)
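Since the question is about writing rather than reading, here is a hedged sketch of the same idea for output. It assumes backports.csv is installed on Python 2 and falls back to the standard csv module on Python 3. The sample rows, the helper names and the temporary file are illustrative assumptions.

```python
import io
import os
import tempfile

try:
    from backports import csv  # Python 2 with backports.csv installed
except ImportError:
    import csv  # Python 3: the standard module already works with text

# Hypothetical sample data; u'\u2013' is the en dash from the traceback.
list_of_rows = [[u'name', u'years'], [u'Example', u'2010\u20132014']]


def write_rows(path, rows):
    # Open in text mode with an explicit encoding; newline='' leaves
    # line endings under the csv writer's control.
    with io.open(path, 'w', encoding='utf-8', newline='') as f:
        csv.writer(f).writerows(rows)


def read_rows(path):
    with io.open(path, encoding='utf-8', newline='') as f:
        return list(csv.reader(f))


# Write to a temporary file and read it back to check the round trip.
fd, csv_path = tempfile.mkstemp(suffix='.csv')
os.close(fd)
write_rows(csv_path, list_of_rows)
round_trip = read_rows(csv_path)
os.remove(csv_path)
```

With this approach the rows stay as text throughout, and the encoding is applied once by io.open instead of cell by cell.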

Answer 2

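In addition to Alastair's excellent suggestions, I found the simplest option is to use Python 3 instead of Python 2. All my script needed was changing the wb in the open statement to simply w, in accordance with Python 3's syntax.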
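A small sketch of that change, assuming Python 3 and hypothetical sample rows: opening the file with 'wb' fails because the csv writer produces text, while 'w' with an explicit encoding writes the en dash without trouble. The newline='' and encoding='utf-8' arguments are additions beyond what this answer describes. The csv documentation recommends the former, and the latter avoids depending on the platform default.

```python
import csv
import os
import tempfile

# Hypothetical sample data containing the en dash from the traceback.
list_of_rows = [['name', 'years'], ['Example', '2010\u20132014']]

fd, csv_path = tempfile.mkstemp(suffix='.csv')
os.close(fd)

# Python 2 style: binary mode. On Python 3 the writer emits str,
# which a binary file rejects with a TypeError.
try:
    with open(csv_path, 'wb') as f:
        csv.writer(f).writerows(list_of_rows)
    binary_mode_error = None
except TypeError as exc:
    binary_mode_error = exc

# Python 3 style: text mode with an explicit encoding.
with open(csv_path, 'w', newline='', encoding='utf-8') as f:
    csv.writer(f).writerows(list_of_rows)

# Read the raw bytes back to confirm the en dash was stored as UTF-8.
with open(csv_path, 'rb') as f:
    raw_bytes = f.read()
os.remove(csv_path)
```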

Answer 3

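The problem is in the csv library in Python 2. From the unicodecsv project page:

Python 2's csv module doesn't easily deal with unicode strings, leading to the dreaded "'ascii' codec can't encode character in position ..." exception.

If you can, just install unicodecsv: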
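The exception itself can be reproduced without csv at all. The sketch below, using a hypothetical sample string, shows that the en dash (u'\u2013') cannot be represented in ASCII but encodes fine as UTF-8. Python 2's csv writer runs into the same failure, roughly speaking, because it converts unicode values to byte strings with the default ASCII codec.

```python
text = u'2010\u20132014'  # hypothetical cell value with an en dash

# Encoding to ASCII fails because U+2013 is outside the 0-127 range.
try:
    text.encode('ascii')
    error_message = None
except UnicodeEncodeError as exc:
    error_message = str(exc)

# Encoding to UTF-8 succeeds and yields a three-byte sequence for the dash.
utf8_bytes = text.encode('utf-8')
```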

pip install unicodecsv

Python ASCII codec can't encode character error during write to CSV

3 answers: