Question

I have a CSV file, shown below, with a single column (cust_code). The header is wrapped in quotes, and so is every row.

“CUST_CODE”
“CST001001”
“CST000235”
“CST010231”
“CST010235”
“CST010231”
“CST010235”
“CST010231”
“CST040015”

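I tried to read the file with pandas, but I got this error message:

'utf-8' codec can't decode byte 0x93 in position 0: invalid start byte

I also tried passing the encoding as ascii and as utf-8, but neither worked.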

Answer 1

Try passing encoding='cp1252'. Make sure to replace 'Documents\Book1.csv' in the code below with the path to your own file:

import pandas as pd

# Raw string so the backslash in the Windows path is not read as an escape
df = pd.read_csv(r'Documents\Book1.csv', encoding='cp1252')
df

    “CUST_CODE”
0   “CST001001”
1   “CST000235”
2   “CST010231”
3   “CST010235”
4   “CST010231”
5   “CST010235”
6   “CST010231”
7   “CST040015”
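
Note that the curly quotes are still part of the header and the values, because pandas only treats the straight double quote as a quote character by default. If they are not wanted, they can be stripped after loading. This is a minimal sketch under assumptions: the in-memory bytes stand in for the real file, every column is assumed to hold text, and the helper name strip_curly_quotes is hypothetical.

```python
import io

import pandas as pd

# Stand-in for the CSV on disk: in cp1252 the curly quotes are bytes 0x93 and 0x94.
raw = "“CUST_CODE”\n“CST001001”\n“CST000235”\n".encode("cp1252")


def strip_curly_quotes(df):
    """Return a copy with curly double quotes stripped from headers and cells.

    Hypothetical helper; it converts every column to str, so it assumes
    all columns hold text.
    """
    out = df.copy()
    out.columns = [str(c).strip("“”") for c in out.columns]
    for col in out.columns:
        out[col] = out[col].astype(str).str.strip("“”")
    return out


df = strip_curly_quotes(pd.read_csv(io.BytesIO(raw), encoding="cp1252"))
print(df)
```

With a real file, the io.BytesIO(raw) argument would be replaced by the file path.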

The Wikipedia article on this encoding has more information: https://en.wikipedia.org/wiki/Windows-1252. Quoting from the article:

"...common result was that all the quotes and apostrophes (produced by "smart quotes" in word-processing software) were replaced with question marks or boxes on non-Windows operating systems, making text difficult to read."
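
To see why the original error mentions byte 0x93, it may help to decode the bytes by hand. The sketch below assumes the file starts with a Windows-1252 left curly quote; the sample bytes are made up to mirror the header row.

```python
# Bytes as they would appear at the start of the file: 0x93 and 0x94 are
# the left and right curly double quotes in Windows-1252 (cp1252).
raw = b"\x93CUST_CODE\x94"

utf8_error = None
try:
    raw.decode("utf-8")
except UnicodeDecodeError as exc:
    # 0x93 is a continuation byte in UTF-8, so it cannot start a character.
    utf8_error = str(exc)
    print(utf8_error)

# The same bytes decode cleanly as cp1252.
text = raw.decode("cp1252")
print(text)
```

Decoding as ascii fails for a similar reason: 0x93 is outside the 7-bit ASCII range.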
