How to remove non-UTF-8 characters from a pandas column

Time: 2019-06-24 22:35:49

Tags: python pandas

This is a follow-up to the question

Removing Non ASCII characters and replacing with spaces from Pandas data frame

which explains how to replace non-ASCII characters in a pandas column with spaces:

 df['DB_user'] = df["DB_user"].apply(lambda x: ''.join([" " if ord(i) < 32 or ord(i) > 126 else i for i in x]))

The Wikipedia article on UTF-8 describes

  the first 128 characters of Unicode

https://en.wikipedia.org/wiki/UTF-8
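
A small sketch of what that phrase refers to: UTF-8 can encode every Unicode character, and the first 128 code points (0 to 127, the ASCII range) are simply the ones it encodes as a single byte. The helper name `utf8_byte_length` is hypothetical and only used for illustration.

```python
def utf8_byte_length(ch: str) -> int:
    """Return how many bytes UTF-8 uses to encode a single character."""
    return len(ch.encode("utf-8"))


# An ASCII letter, an accented Latin letter, and a CJK character.
for ch in ["A", "\u00e9", "\u5f20"]:
    print(repr(ch), ord(ch), utf8_byte_length(ch))
```

Characters above code point 127 are therefore still valid UTF-8; they just take more than one byte.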

So my guess is that the solution is:

 df['DB_user'] = df["DB_user"].apply(lambda x: ''.join([" " if ord(i) > 127 else i for i in x]))
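
One caveat about the guess above: a check such as `ord(i) > 127` replaces every non-ASCII character, including ones that are perfectly valid UTF-8. Invalid UTF-8 can only occur in raw bytes, not in an already decoded Python `str`. The following is a minimal sketch of both cases, assuming the column holds either `str` or `bytes` values; the sample data and the helper names `replace_non_ascii` and `drop_invalid_utf8` are made up for illustration.

```python
import pandas as pd


def replace_non_ascii(text: str) -> str:
    """Replace every character above code point 127 with a space."""
    return "".join(" " if ord(ch) > 127 else ch for ch in text)


def drop_invalid_utf8(raw: bytes) -> str:
    """Decode bytes as UTF-8, dropping byte sequences that are not valid UTF-8."""
    return raw.decode("utf-8", errors="ignore")


# Case 1: the column already contains str values (hypothetical sample data).
df = pd.DataFrame({"DB_user": ["alice", "jos\u00e9", "\u5f20\u4f1f"]})
ascii_only = df["DB_user"].apply(replace_non_ascii)

# Case 2: the values are raw bytes that may contain invalid UTF-8.
raw = pd.Series([b"alice", b"jos\xc3\xa9", b"bad\xffbyte"])
decoded = raw.apply(drop_invalid_utf8)
```

In the first case the accented and CJK characters are turned into spaces even though they are valid UTF-8. In the second case they survive decoding, and only the stray `0xff` byte is dropped. Passing `errors="replace"` instead of `errors="ignore"` would substitute U+FFFD for the bad bytes rather than removing them.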

0 answers:

No answers