pd.read_csv with the index_col= parameter truncates leading zeros

Time: 2018-11-02 18:35:11

Tags: pandas

SO has several answers on how to avoid truncating leading zeros with pd.read_csv, for example this one.

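My question is how to avoid truncating leading zeros when using the index_col= parameter of pd.read_csv. In this example, the leading zeros are in the ID column. Without index_col=, reading ID with dtype=object preserves them: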

>>> import pandas as pd
>>> miss = {'Amount' : [' ', 'NA']}
>>> url = "https://raw.githubusercontent.com/RandyBetancourt/PythonForSASUsers/master/data/messy_input.csv"

>>> d1 = pd.read_csv(url, skiprows=2, na_values=miss, dtype={'ID' : object})
>>> print(d1)
 ID       Date   Amount  Quantity   Status Unnamed: 5
0  0042  16-Oct-17  $23.99      123.0   Closed     Jansen
1  7731  15-Jan-17  $49.99        NaN  Pending        Rho
2  8843   9-Mar-17      129      45.0      NaN      Gupta
3  3013  12-Feb-17      NaN      15.0  Pending   Harrison
4  4431   1-Jul-17  $99.99        1.0   Closed       Yang
>>> print(d1.dtypes)
ID             object
Date           object
Amount         object
Quantity      float64
Status         object
Unnamed: 5     object
dtype: object

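In a subsequent read that uses the index_col= parameter, the leading zeros are stripped from the index, even though a str converter is specified for ID.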

>>> miss = {'Amount' : [' ', 'NA']}
>>> url = "https://raw.githubusercontent.com/RandyBetancourt/PythonForSASUsers/master/data/messy_input.csv"
>>> d1 = pd.read_csv(url, skiprows=2, na_values=miss, converters={'ID' : str}, index_col='ID')
>>> print(d1)
       Date   Amount  Quantity   Status Unnamed: 5
ID
42    16-Oct-17  $23.99      123.0   Closed     Jansen
7731  15-Jan-17  $49.99        NaN  Pending        Rho
8843   9-Mar-17      129      45.0      NaN      Gupta
3013  12-Feb-17      NaN      15.0  Pending   Harrison
4431   1-Jul-17  $99.99        1.0   Closed       Yang
>>> print(d1.dtypes)
Date           object
Amount         object
Quantity      float64
Status         object
Unnamed: 5     object
dtype: object
>>> d1.index
Int64Index([42, 7731, 8843, 3013, 4431], dtype='int64', name='ID')

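How can I preserve the leading zeros using only the pd.read_csv method? I know I can read the file without the index_col= parameter, set the index afterwards, and get the desired result.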

1 answer:

Answer 0 (score: 1)

Your only option is to set the index after parsing.

d1 = pd.read_csv(url, skiprows=2, na_values=miss, converters={'ID' : str}).set_index('ID')

This is already an open issue in pandas, and it has not been resolved yet.
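
For anyone who wants to try the workaround without fetching the remote file, here is a minimal self-contained sketch. The inline CSV text and the names csv_text and df are made up for illustration and are not the question's data; the sketch only assumes that reading ID as a string and calling set_index afterwards keeps the zeros, as described above.

```python
import io

import pandas as pd

# Hypothetical inline sample standing in for the remote CSV in the question.
csv_text = "ID,Amount\n0042,23.99\n7731,49.99\n"

# Read ID as a string first, then promote it to the index after parsing.
df = pd.read_csv(io.StringIO(csv_text), dtype={"ID": str}).set_index("ID")

print(df.index.tolist())  # ['0042', '7731']
```

Because the column is parsed as text before it becomes the index, no integer conversion is applied to the labels.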