I can't replace Portuguese special characters (for example é) with their UTF-8 percent codes in a column. For example:
Before: https://www.linkedin.com/in/andré-pieroni-mesa..
After: https://www.linkedin.com/in/andr%c3%a9-pieroni-mesa..
import numpy as np
import pandas as pd
import json
dfgetprospect=pd.read_excel(r'C:\Users\PICHAU\Desktop\Cargo Sapiens\Inteligencia Comercial\Upload empresas\Get Prospect.xlsx')
df= pd.read_csv(r'C:\Users\PICHAU\Desktop\Curso Python\Encoding links.csv', delimiter=';')
df=df[['Character','UTF-8']]
df.set_index(keys=['Character'], inplace=True)
lista = df.to_dict()
lista=lista['UTF-8']
lista
#lista = json.dumps(lista)
#lista=str(lista).replace("{","").replace("}","")
dfgetprospect['Linkedin Url'].str.replace({lista})
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-41-fd99a3f16c6e> in <module>
----> 1 dfgetprospect['Linkedin Url'].str.replace({lista})
TypeError: unhashable type: 'dict'
Sample from lista: {'space': '%20', '!': '%21', '"': '%22', '#': '%23', '$': '%24', '%': '%25', '&': '%26', "'": '%27', '(': '%28', ')': '%29', '*': '%2a',
Sample from df: û %c3%bb ü %c3%bc ý %c3%bd þ %c3%be ÿ %c3%bf
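For reference, the `TypeError` in the traceback comes from the expression `{lista}`: the braces ask Python to build a set containing the dict, and dicts are not hashable. A minimal sketch of applying a character-to-code mapping in a single pass, assuming every key is exactly one character (a spelled-out key, such as the one for the space character, would first have to be changed to the character itself). The `mapping` dict and `url` string are hypothetical stand-ins for `lista` and one cell of the `Linkedin Url` column:

```python
# Hypothetical stand-in for `lista`: single characters mapped to percent codes
mapping = {'é': '%c3%a9', 'û': '%c3%bb', ' ': '%20'}

# str.maketrans accepts a dict whose keys are single characters;
# the values may be strings of any length
table = str.maketrans(mapping)

url = 'https://www.linkedin.com/in/andré-pieroni-mesa'
encoded = url.translate(table)
print(encoded)
```

On a pandas column the same table can be passed to `Series.str.translate(table)`. Because translation is a single pass over the original characters, a `'%'` entry in the mapping does not re-encode the percent signs produced by other entries.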
Answer 0 (score: 0)
You can use the quote function from the urllib.parse module:
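import urllib.parse
import pandas as pd

df = pd.DataFrame({'urls': ['http://example.org/andré-pieroni-mesa',
                            'http://example.org/!"#$%&\'()*ûüýþÿ']})
# Percent-encode each URL as UTF-8; by default only '/' is left unescaped
df['urls_quoted'] = df['urls'].apply(urllib.parse.quote)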
Input:
urls
0 http://example.org/andré-pieroni-mesa
1 http://example.org/!"#$%&'()*ûüýþÿ
Output:
urls urls_quoted
0 http://example.org/andré-pieroni-mesa http%3A//example.org/andr%C3%A9-pieroni-mesa
1 http://example.org/!"#$%&'()*ûüýþÿ http%3A//example.org/%21%22%23%24%25%26%27%28%29%2A%C3%BB%C3%BC%C3%BD%C3%BE%C3%BF
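Note that the output above also escapes the colon after the scheme (`http%3A`) and uses uppercase hex digits, while the question asks for `https://...andr%c3%a9...`. A minimal sketch of one way to get closer to that form, assuming the input URLs are not already percent-encoded (an existing `%` would be escaped again as `%25`). The helper name `quote_url` is hypothetical; `safe` is a real parameter of `urllib.parse.quote`:

```python
import re
import urllib.parse

def quote_url(url: str) -> str:
    """Percent-encode a URL while leaving ':' and '/' untouched (hypothetical helper)."""
    quoted = urllib.parse.quote(url, safe=':/')
    # Optional: lowercase the hex digits to match the '%c3%a9' style in the question
    return re.sub(r'%[0-9A-F]{2}', lambda m: m.group(0).lower(), quoted)

print(quote_url('https://www.linkedin.com/in/andré-pieroni-mesa'))
# https://www.linkedin.com/in/andr%c3%a9-pieroni-mesa
```

With a DataFrame, the helper could be applied the same way as in the answer, for example `df['urls'].apply(quote_url)`. Percent-encoded hex digits are case-insensitive, so the lowercasing step only matters if the result must match the question's format exactly.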