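I'm hoping someone here can cure my ignorance: I'm currently using Python 3.6.4 and I'm trying to reduce strings to plain alphanumerics.
What I have mostly works, until I hit characters with diacritics. It involves football team names, so I'd like to convert 1. FC Köln to 1fckoln. So: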
import requests
c = requests.get(the_url)
content = c.text
#code here to extract team name into variable 'ht'
ht = simpname(ht)
def simpname(who):
    punct = "' .-/\°()"
    the_o = 'òóôõöÖøØ'
    for p in punct:
        if p in who:
            who = who.replace(p, '')
    if the_o in who:
        who = who.replace(the_o, 'o')
    who = who.lower()
    return who
(Note: the code is cut down for this example; I handle a, e, etc. the same way.)
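The only problem is that, in my example, the text arrives as 1. FC KÃ¶ln. I know I have a character-encoding problem, but I can't seem to get it into the right state. Can anyone advise?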
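A small sketch of how this kind of garbling typically arises, assuming the page is served as UTF-8 but ends up decoded as ISO-8859-1 (Latin-1). The variable names are illustrative only and no network request is involved:

```python
# Assumption: the server sends UTF-8 bytes, but they are decoded as Latin-1.
original = '1. FC Köln'
raw = original.encode('utf-8')       # 'ö' becomes the two bytes \xc3 \xb6
garbled = raw.decode('latin-1')      # each byte is turned into one character

# Undoing the wrong decode and decoding as UTF-8 recovers the text.
repaired = garbled.encode('latin-1').decode('utf-8')
```

Here garbled holds the two-character sequence Ã¶ where the ö should be, and repaired is equal to original again.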
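Solved! Thanks to @Idlehands and the commenters below for the suggestions. Here is the same code, updated so that future readers can see the difference.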
import requests
incoming = requests.get(the_url)
cinput = incoming.content
cinput = cinput.decode('iso-8859-1')
cinput = str(cinput)
# more code, eventually extracts a team name under 'ht'
ht = simpname(ht)
...
def simpname(who):
    punct = "' .-/\°()"
    the_o = 'òóôõöÖøØ'
    # who is currently '1. FC KÃ¶ln'
    who = who.encode('latin-1')  # who becomes b'1. FC K\xc3\xb6ln'
    who = who.decode('utf-8')  # who becomes '1. FC Köln'
    for p in punct:
        if p in who:
            who = who.replace(p, '')
    # test each accented character on its own, not the whole string at once
    for an_o in the_o:
        if an_o in who:
            who = who.replace(an_o, 'o')
    who = who.lower()
    return who
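For future readers, a possible alternative to listing every accented letter by hand is to let the standard library's unicodedata module split off the diacritics. This is only a sketch of a different technique, not the code above: simpname_nfkd, EXTRA and ALLOWED are made-up names, the input is assumed to be correctly decoded already, and letters such as ø that have no Unicode decomposition still need a manual mapping.

```python
import string
import unicodedata

# Hypothetical manual mapping for letters that NFKD does not decompose.
EXTRA = str.maketrans({'ø': 'o', 'Ø': 'o'})
# Characters allowed to survive: ASCII letters and digits only.
ALLOWED = set(string.ascii_letters + string.digits)

def simpname_nfkd(who):
    """Reduce a correctly decoded team name to lowercase ASCII alphanumerics."""
    who = who.translate(EXTRA)
    # NFKD splits 'ö' into a plain 'o' followed by a combining diaeresis.
    decomposed = unicodedata.normalize('NFKD', who)
    # Dropping everything outside ALLOWED removes the combining marks,
    # spaces and punctuation in a single pass.
    return ''.join(ch for ch in decomposed if ch in ALLOWED).lower()
```

With this sketch, simpname_nfkd('1. FC Köln') returns '1fckoln', and the a, e, etc. cases need no separate lists.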