Trying to convert an object column to int

Time: 2021-02-07 08:01:31

Tags: python pandas

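I have a dataframe in which a few columns have dtype object. I want to convert one of them to an int column so that I can use it in some calculations, but when I try, I get the error shown below.

Here is my code. The second line is the one that raises the error.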

df['Amount in USD']=df['Amount in USD'].str.replace(',', '') #this worked fine

df['Amount in USD']=df['Amount in USD'].astype(int) #but this doesn't

Error

    ---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-21-b9d8d4e75b08> in <module>
----> 1 df['Amount in USD']=df['Amount in USD'].astype(int)

/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/pandas/core/generic.py in astype(self, dtype, copy, errors)
   5870         else:
   5871             # else, only a single dtype is given
-> 5872             new_data = self._mgr.astype(dtype=dtype, copy=copy, errors=errors)
   5873             return self._constructor(new_data).__finalize__(self, method="astype")
   5874 

/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/pandas/core/internals/managers.py in astype(self, dtype, copy, errors)
    629         self, dtype, copy: bool = False, errors: str = "raise"
    630     ) -> "BlockManager":
--> 631         return self.apply("astype", dtype=dtype, copy=copy, errors=errors)
    632 
    633     def convert(

/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/pandas/core/internals/managers.py in apply(self, f, align_keys, ignore_failures, **kwargs)
    425                     applied = b.apply(f, **kwargs)
    426                 else:
--> 427                     applied = getattr(b, f)(**kwargs)
    428             except (TypeError, NotImplementedError):
    429                 if not ignore_failures:

/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/pandas/core/internals/blocks.py in astype(self, dtype, copy, errors)
    671             vals1d = values.ravel()
    672             try:
--> 673                 values = astype_nansafe(vals1d, dtype, copy=True)
    674             except (ValueError, TypeError):
    675                 # e.g. astype_nansafe can fail on object-dtype of strings

/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/pandas/core/dtypes/cast.py in astype_nansafe(arr, dtype, copy, skipna)
   1072         # work around NumPy brokenness, #1987
   1073         if np.issubdtype(dtype.type, np.integer):
-> 1074             return lib.astype_intsafe(arr.ravel(), dtype).reshape(arr.shape)
   1075 
   1076         # if we have a datetime/timedelta array of objects

pandas/_libs/lib.pyx in pandas._libs.lib.astype_intsafe()

ValueError: invalid literal for int() with base 10: 'undisclosed'

Info about the dataframe:

 0   Sr No              3044 non-null   int64 
 1   Date dd/mm/yyyy    3044 non-null   object
 2   Startup Name       3044 non-null   object
 3   Industry Vertical  2873 non-null   object
 4   SubVertical        2108 non-null   object
 5   City  Location     2864 non-null   object
 6   Investors Name     3020 non-null   object
 7   InvestmentnType    3040 non-null   object
 8   Amount in USD      2084 non-null   object
 9   Remarks            419 non-null    object

Here is a sample of my dataframe:

Sr No   Date dd/mm/yyyy Startup Name    Industry Vertical   SubVertical City Location   Investors Name  InvestmentnType Amount in USD   Remarks
0   1   09/01/2020  BYJU’S  E-Tech  E-learning  Bengaluru   Tiger Global Management Private Equity Round    20,00,00,000    NaN
1   2   13/01/2020  Shuttl  Transportation  App based shuttle service   Gurgaon Susquehanna Growth Equity   Series C    80,48,394   NaN
2   3   09/01/2020  Mamaearth   E-commerce  Retailer of baby and toddler products   Bengaluru   Sequoia Capital India   Series B    1,83,58,860 NaN
3   4   02/01/2020  https://www.wealthbucket.in/    FinTech Online Investment   New Delhi   Vinod Khatumal  Pre-series A    30,00,000   NaN

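1 answer:

Answer 0 (score: 1)

Your df['Amount in USD'] column contains the categorical value 'undisclosed', which cannot be converted to int as it stands.

You need to map the non-numeric values yourself while the column is still of string type, i.e.: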

df['Amount in USD'] = df['Amount in USD'].replace('undisclosed', '-1')
df['Amount in USD'] = df['Amount in USD'].astype(int)

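I'm assuming here that your df['Amount in USD'] column does not already contain the value '-1'. You can check the unique values of the column like this:

df['Amount in USD'].unique()

Feel free to add these to your question so that I can help you further.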
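
If the column has many unique values, it may be easier to list only those that fail to parse as numbers. The following is a minimal sketch on a made-up Series (the values are hypothetical and the commas are assumed to be stripped already):

```python
import pandas as pd

# Hypothetical sample; commas are assumed to be removed already.
s = pd.Series(["200000000", "8048394", "undisclosed", None, "unknown"])

# Entries that fail numeric parsing but were not missing to begin with.
bad_mask = pd.to_numeric(s, errors="coerce").isna() & s.notna()

# Distinct offending strings, in order of first appearance.
bad_values = list(s[bad_mask].unique())
print(bad_values)
```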
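
As an alternative to replacing 'undisclosed' by hand, one possible approach (a sketch, not part of the original answer) is to let pd.to_numeric turn every non-numeric string into NaN and then cast to the nullable Int64 dtype. This may be relevant here because the info output in the question shows only 2084 non-null values out of 3044 in this column, and a plain astype(int) cannot represent missing values. The sample frame below is made up for illustration.

```python
import pandas as pd

# Hypothetical sample that mimics the column in the question.
df = pd.DataFrame(
    {"Amount in USD": ["20,00,00,000", "80,48,394", "undisclosed", None]}
)

# Strip the thousands separators; missing values stay missing.
cleaned = df["Amount in USD"].str.replace(",", "", regex=False)

# Non-numeric strings such as 'undisclosed' become NaN instead of raising.
numeric = pd.to_numeric(cleaned, errors="coerce")

# Nullable integer dtype: whole numbers plus <NA> for missing entries.
df["Amount in USD"] = numeric.astype("Int64")

print(df["Amount in USD"])
```

With this approach no placeholder such as -1 is needed, since unknown amounts are kept as missing values.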


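Bonus edit:

Depending on the calculations you want to perform on the column, you need to choose the placeholder integer carefully. There are several good guides online.

Make sure it also suits your domain, which looks like finance to me.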
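
To illustrate why the choice of placeholder matters, here is a small sketch with made-up amounts. It compares the mean when unknown values are kept as missing with the mean when they are replaced by -1:

```python
import pandas as pd

# Hypothetical amounts; None stands for an undisclosed deal.
amounts = pd.Series([100, 300, None], dtype="Int64")

# Missing values are skipped by default, so only known deals are averaged.
mean_with_na = amounts.mean()

# Replacing the missing value with -1 includes it in the average.
mean_with_sentinel = amounts.fillna(-1).mean()

print(mean_with_na, mean_with_sentinel)
```

The placeholder silently changes aggregates such as the mean or the sum, so any calculation on the column has to filter it out first.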