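I have a dataframe in which most columns have the object dtype. I want to convert one of them to an int column so that I can do some calculations with it, but I get the error below when I try.
Here is the code that gives me the error: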
df['Amount in USD']=df['Amount in USD'].str.replace(',', '') #this worked fine
df['Amount in USD']=df['Amount in USD'].astype(int) #but this doesn't
The error:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-21-b9d8d4e75b08> in <module>
----> 1 df['Amount in USD']=df['Amount in USD'].astype(int)
/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/pandas/core/generic.py in astype(self, dtype, copy, errors)
5870 else:
5871 # else, only a single dtype is given
-> 5872 new_data = self._mgr.astype(dtype=dtype, copy=copy, errors=errors)
5873 return self._constructor(new_data).__finalize__(self, method="astype")
5874
/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/pandas/core/internals/managers.py in astype(self, dtype, copy, errors)
629 self, dtype, copy: bool = False, errors: str = "raise"
630 ) -> "BlockManager":
--> 631 return self.apply("astype", dtype=dtype, copy=copy, errors=errors)
632
633 def convert(
/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/pandas/core/internals/managers.py in apply(self, f, align_keys, ignore_failures, **kwargs)
425 applied = b.apply(f, **kwargs)
426 else:
--> 427 applied = getattr(b, f)(**kwargs)
428 except (TypeError, NotImplementedError):
429 if not ignore_failures:
/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/pandas/core/internals/blocks.py in astype(self, dtype, copy, errors)
671 vals1d = values.ravel()
672 try:
--> 673 values = astype_nansafe(vals1d, dtype, copy=True)
674 except (ValueError, TypeError):
675 # e.g. astype_nansafe can fail on object-dtype of strings
/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/pandas/core/dtypes/cast.py in astype_nansafe(arr, dtype, copy, skipna)
1072 # work around NumPy brokenness, #1987
1073 if np.issubdtype(dtype.type, np.integer):
-> 1074 return lib.astype_intsafe(arr.ravel(), dtype).reshape(arr.shape)
1075
1076 # if we have a datetime/timedelta array of objects
pandas/_libs/lib.pyx in pandas._libs.lib.astype_intsafe()
ValueError: invalid literal for int() with base 10: 'undisclosed'
Information about the dataframe (output of df.info()):
0 Sr No 3044 non-null int64
1 Date dd/mm/yyyy 3044 non-null object
2 Startup Name 3044 non-null object
3 Industry Vertical 2873 non-null object
4 SubVertical 2108 non-null object
5 City Location 2864 non-null object
6 Investors Name 3020 non-null object
7 InvestmentnType 3040 non-null object
8 Amount in USD 2084 non-null object
9 Remarks 419 non-null object
Here is a sample of my dataframe:
   Sr No  Date dd/mm/yyyy  Startup Name                  Industry Vertical  SubVertical                            City Location  Investors Name             InvestmentnType       Amount in USD  Remarks
0  1      09/01/2020       BYJU’S                        E-Tech             E-learning                             Bengaluru      Tiger Global Management    Private Equity Round  20,00,00,000   NaN
1  2      13/01/2020       Shuttl                        Transportation     App based shuttle service              Gurgaon        Susquehanna Growth Equity  Series C              80,48,394      NaN
2  3      09/01/2020       Mamaearth                     E-commerce         Retailer of baby and toddler products  Bengaluru      Sequoia Capital India      Series B              1,83,58,860    NaN
3  4      02/01/2020       https://www.wealthbucket.in/  FinTech            Online Investment                      New Delhi      Vinod Khatumal             Pre-series A          30,00,000      NaN
Answer 0 (score: 1)
Your df['Amount in USD'] column contains the non-numeric string 'undisclosed', which cannot be converted to int as it stands.
You need to map the non-numeric values to numeric strings yourself, for example:
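df['Amount in USD'] = df['Amount in USD'].replace('undisclosed', '-1')  # map the non-numeric marker to a sentinel
df['Amount in USD'] = df['Amount in USD'].fillna('-1').astype(int)  # the column also has missing values (2084 of 3044 non-null), which int cannot hold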
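I am assuming here that your df['Amount in USD'] column does not already contain '-1' as a value. You can check the unique values of the column like this:
df['Amount in USD'].unique()
Feel free to add the output to your question so that I can help you further.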
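If the list of unique values is long, one way to narrow it down to the entries that actually block the conversion is to let pd.to_numeric flag them. This is only a minimal sketch on a small made-up sample (the `amounts` series below is illustrative, not your real data), assuming the commas have already been removed:

```python
import pandas as pd

# Hypothetical sample mimicking the column after the commas were stripped.
amounts = pd.Series(["200000000", "8048394", "undisclosed", None, "3000000"])

# errors="coerce" turns every value that cannot be parsed as a number into NaN.
parsed = pd.to_numeric(amounts, errors="coerce")

# Values that were present in the original but failed to parse are the culprits.
bad_values = amounts[parsed.isna() & amounts.notna()].unique()
print(bad_values)
```

On your own dataframe you would pass df['Amount in USD'] instead of `amounts`; anything that shows up in `bad_values` needs to be mapped or cleaned before astype(int) can succeed.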
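Edit / bonus:
Depending on the calculations you want to run on this column, you need to choose the placeholder integer carefully, because a value such as -1 will be included in sums and averages. Several good guides on this are available online.
Make sure the choice also suits your domain, which looks like finance to me.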
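If no placeholder integer feels right for your calculations, an alternative (this is a different approach from the -1 replacement above, not a refinement of it) is to keep the unknown amounts as missing values and use pandas' nullable integer dtype "Int64". A minimal sketch, assuming pandas 1.0 or later and a made-up three-column-free sample in place of your data:

```python
import pandas as pd

# Hypothetical sample: two real amounts, one 'undisclosed', one missing value.
df = pd.DataFrame(
    {"Amount in USD": ["20,00,00,000", "80,48,394", "undisclosed", None]}
)

# Strip the thousands separators as plain text (no regex needed).
cleaned = df["Amount in USD"].str.replace(",", "", regex=False)

# Unparseable entries become NaN, then the nullable Int64 dtype stores them as <NA>.
df["Amount in USD"] = pd.to_numeric(cleaned, errors="coerce").astype("Int64")

total = df["Amount in USD"].sum()  # missing entries are skipped by default
print(df["Amount in USD"].dtype, total)
```

With this approach sums and means skip the undisclosed rows instead of being pulled down by a sentinel, at the cost of having to handle <NA> wherever a plain int is required later.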