Text content / decoding problem

Time: 2018-02-03 03:21:50

Tags: python character-encoding python-3.6

I hope someone can cure my ignorance here: I am currently using Python 3.6.4, and I am trying to convert strings to plain alphanumerics.

I've got it mostly sorted, until I hit characters with diacritics. This is for football team names, so I want to convert 1. FC Köln to 1fckoln. So:

import requests

def simpname(who):
    punct = r"' .-/\°()"
    the_o = 'òóôõöÖøØ'

    for p in punct:
        if p in who:
            who = who.replace(p, '')

    # note: this tests for the whole string 'òóôõöÖøØ', not for each character
    if the_o in who:
        who = who.replace(the_o, 'o')

    who = who.lower()

    return who

c = requests.get(the_url)
content = c.text

#code here to extract team name into variable 'ht'

ht = simpname(ht)

(Note: the code has been cut down for the example; I handle a, e, etc. the same way.)
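As an alternative to listing every accented variant by hand, the standard library's unicodedata module can strip most accents: NFKD normalization splits a letter such as ö into o plus a combining diaeresis, and the combining marks can then be dropped. This is a minimal sketch, assuming ASCII letters and digits are all that should survive; the name simplify_name and the small table of extra mappings are hypothetical. Letters such as ø or ß have no decomposition, so they still need explicit entries.

```python
import unicodedata

# Hypothetical extra mappings: these letters do not decompose under NFKD,
# so without an entry they would simply be dropped below.
_EXTRA = str.maketrans({'ø': 'o', 'Ø': 'O', 'ß': 'ss'})

def simplify_name(who):
    """Reduce a team name to lowercase ASCII letters and digits."""
    who = who.translate(_EXTRA)
    # NFKD splits e.g. 'ö' into 'o' + U+0308 COMBINING DIAERESIS
    decomposed = unicodedata.normalize('NFKD', who)
    # Drop the combining marks, keeping the base letters
    base = ''.join(c for c in decomposed if not unicodedata.combining(c))
    # Discard anything still outside ASCII, then keep only letters and digits
    ascii_only = base.encode('ascii', 'ignore').decode('ascii')
    return ''.join(c for c in ascii_only if c.isalnum()).lower()

print(simplify_name('1. FC Köln'))  # 1fckoln
```

Because punctuation and spaces are removed by the isalnum() filter, no separate punct list is needed.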

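The only problem is that, in my example, the text arrives as 1. FC KÃ¶ln. I know I have a character-encoding problem, but I can't seem to get it into the right state. Can anyone advise?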
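The garbling can be reproduced with the standard library alone: encoding 1. FC Köln as UTF-8 and decoding those bytes as Latin-1 (ISO-8859-1) turns the two bytes of ö into the two characters Ã¶. One plausible source, assuming the page is UTF-8 and its Content-Type header names no charset, is requests itself: for text/* responses without a charset, Response.text falls back to ISO-8859-1. Setting c.encoding = 'utf-8' before reading c.text, or decoding c.content yourself, avoids the mismatch. The sketch below shows only the byte-level behaviour, not the network request:

```python
name = '1. FC Köln'

# Assumption: the page is UTF-8, so these are the bytes on the wire
raw = name.encode('utf-8')
print(raw)                   # b'1. FC K\xc3\xb6ln'

# Latin-1 maps every byte to exactly one character, so 'ö' becomes two
garbled = raw.decode('latin-1')
print(garbled)               # 1. FC KÃ¶ln

# Decoding once with the right codec gives the intended text directly
print(raw.decode('utf-8'))   # 1. FC Köln
```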

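Solved! Thanks to @Idlehands and the commenters below for their suggestions. Below is the same code, updated so that future readers can see the difference.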

import requests

incoming = requests.get(the_url)
cinput = incoming.content             # raw bytes of the response body
cinput = cinput.decode('iso-8859-1')  # bytes -> str (already a str, no str() needed)

# more code, eventually extracts a team name under 'ht'

ht = simpname(ht)

...

def simpname(who):
    punct = r"' .-/\°()"
    the_o = 'òóôõöÖøØ'

    # who is currently 1. FC KÃ¶ln (UTF-8 bytes that were decoded as Latin-1)

    who = who.encode('latin-1') # who becomes b'1. FC K\xc3\xb6ln'
    who = who.decode('utf-8')   # who becomes '1. FC Köln'

    for p in punct:
        if p in who:
            who = who.replace(p, '')

    for an_o in the_o:
        if an_o in who:
            who = who.replace(an_o, 'o')

    who = who.lower()

    return who
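
The encode('latin-1')/decode('utf-8') round trip in simpname assumes every name arrives garbled. A name that was decoded correctly, such as 1. FC Köln, makes decode('utf-8') raise UnicodeDecodeError, and one containing a character outside Latin-1 (for example Ł) makes encode('latin-1') raise UnicodeEncodeError. A minimal sketch of a guarded repair follows; fix_mojibake is a hypothetical helper, and it relies on the heuristic that text surviving the round trip was mis-decoded UTF-8, which can misfire on genuine Latin-1 text that happens to form valid UTF-8.

```python
def fix_mojibake(text):
    """Undo UTF-8 text that was mis-decoded as Latin-1; otherwise return it unchanged."""
    try:
        return text.encode('latin-1').decode('utf-8')
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Either a character is outside Latin-1, or the Latin-1 bytes are
        # not valid UTF-8: assume the text was not garbled in the first place.
        return text

print(fix_mojibake('1. FC KÃ¶ln'))  # 1. FC Köln
print(fix_mojibake('1. FC Köln'))   # 1. FC Köln (left unchanged)
```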
