Question

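I am trying to load a CSV file into a pandas DataFrame. The CSV is semicolon-delimited, and values in the text columns are enclosed in double quotes.

The file in question: https://www.dropbox.com/s/1xv391gebjzmmco/file_01.csv?dl=0

In one of the text columns ('TYTUL'), I have the following value:

"00 307 1457 212"

I specified the column as str, but when I print the result or export it to Excel, I get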

003071457212

instead of

00 307 1457 212

How can I prevent pandas from removing the spaces?

Here is my code:

import pandas

df = pandas.read_csv(r'file_01.csv'
                     ,sep = ';'
                     ,quotechar = '"'
                     ,names = ['DATA_OPERACJI'
                               ,'DATA_KSIEGOWANIA'
                               ,'OPIS_OPERACJI'
                               ,'TYTUL'
                               ,'NADAWCA_ODBIORCA'
                               ,'NUMER_KONTA'
                               ,'KWOTA'
                               ,'SALDO_PO_OPERACJI'
                               ,'KOLUMNA_9']
                     ,usecols = [0,1,2,3,4,5,6,7]
                     ,skiprows = 38
                     ,skipfooter = 3
                     ,encoding = 'cp1250'
                     ,thousands = ' '
                     ,decimal = ','
                     ,parse_dates = [0,1]
                     ,converters = {'OPIS_OPERACJI': str
                                    ,'TYTUL': str
                                    ,'NADAWCA_ODBIORCA': str
                                    ,'NUMER_KONTA': str}
                     ,engine = 'python'
                     )

df.TYTUL.replace([' +', '^ +', ' +$'], [' ', '', ''],regex=True,inplace=True) #this only removes excessive spaces

print(df.TYTUL)

I also came up with a workaround (the lines marked with a #workaround comment), but I would like to ask whether there is a better way.

import pandas

df = pandas.read_csv(r'file_01.csv'
                     ,sep = ';'
                     ,quotechar = '?' #workaround
                     ,names = ['DATA_OPERACJI'
                               ,'DATA_KSIEGOWANIA'
                               ,'OPIS_OPERACJI'
                               ,'TYTUL'
                               ,'NADAWCA_ODBIORCA'
                               ,'NUMER_KONTA'
                               ,'KWOTA'
                               ,'SALDO_PO_OPERACJI'
                               ,'KOLUMNA_9']
                     ,usecols = [0,1,2,3,4,5,6,7]
                     ,skiprows = 38
                     ,skipfooter = 3
                     ,encoding = 'cp1250'
                     ,thousands = ' '
                     ,decimal = ','
                     ,parse_dates = [0,1]
                     ,converters = {'OPIS_OPERACJI': str
                                    ,'TYTUL': str
                                    ,'NADAWCA_ODBIORCA': str
                                    ,'NUMER_KONTA': str}
                     ,engine = 'python'
                     )

df.TYTUL.replace([' +', '^ +', ' +$'], [' ', '', ''],regex=True,inplace=True) #this only removes excessive spaces

df.TYTUL.replace(['^"', '"$'], ['', ''],regex=True,inplace=True) #workaround

print(df.TYTUL)

Answer 1

Remove this line from your read_csv call:

,thousands = ' '

I tested it; without this option the output is correct:

'00 307 1457 212'
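
Dropping thousands = ' ' means the amount columns no longer get their space separators stripped automatically, so they may need to be converted by hand. Below is a minimal sketch of one way to do that. The inline sample data, the reduced column list, and the to_number helper are all assumptions made for illustration; they are not taken from the original file.

```python
import io

import pandas as pd

# Hypothetical two-row sample that mimics the layout described in the question
# (semicolon-separated, quoted text columns, amounts like "1 234,56").
SAMPLE = (
    '2016-01-04;"PRZELEW";"00 307 1457 212";1 234,56\n'
    '2016-01-05;"PRZELEW";"FAKTURA 12";-99,00\n'
)


def to_number(column: pd.Series) -> pd.Series:
    """Convert strings such as '1 234,56' to floats (illustrative helper)."""
    cleaned = column.str.replace(" ", "", regex=False)
    cleaned = cleaned.str.replace(",", ".", regex=False)
    return cleaned.astype(float)


df = pd.read_csv(
    io.StringIO(SAMPLE),
    sep=";",
    quotechar='"',
    names=["DATA_OPERACJI", "OPIS_OPERACJI", "TYTUL", "KWOTA"],
    dtype=str,  # read every column as text; note there is no `thousands` option
)

# Only the amount column gets its spaces removed, and only after loading.
df["KWOTA"] = to_number(df["KWOTA"])

print(df["TYTUL"].tolist())
print(df["KWOTA"].tolist())
```

With this approach the spaces in TYTUL are never touched by the parser, and the numeric cleanup is limited to the columns you explicitly pass to the helper.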
