self.df['Regular Price'] = self.df['Regular Price'].apply(
    lambda x: int(round(x)) if isinstance(x, (int, float)) else None
)
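The code above assigns None to every value in the Regular Price column when the DataFrame contains non-numeric values. I want to assign None only to the cells that hold non-numeric values.
Thanks.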
Answer 0 (score: 1)
First, it is not possible to return integers together with NaN, because NaN is a float by design.
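To illustrate the point about NaN being a float, here is a minimal sketch. The Series names are made up for the example and are not part of the question's code.

```python
import numpy as np
import pandas as pd

# NaN itself is a Python float.
print(type(np.nan))

# A Series holding only whole numbers gets an integer dtype...
only_ints = pd.Series([1, 2, 7])
print(pd.api.types.is_integer_dtype(only_ints))

# ...but a single NaN forces the whole Series to float64.
with_nan = pd.Series([1, np.nan, 7])
print(with_nan.dtype)
```

This is why the integers returned by the lambda show up as 1.0, 2.0 and 7.0 once a missing value sits in the same column.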
Your solution works correctly if the column has mixed types, that is, numbers together with strings:
import pandas as pd

df = pd.DataFrame({
    'Regular Price': ['a', 1, 2.3, 'a', 7],
    'B': list(range(5))
})
print (df)
B Regular Price
0 0 a
1 1 1
2 2 2.3
3 3 a
4 4 7
df['Regular Price'] = df['Regular Price'].apply(
    lambda x: int(round(x)) if isinstance(x, (int, float)) else None
)
print (df)
B Regular Price
0 0 NaN
1 1 1.0
2 2 2.0
3 3 NaN
4 4 7.0
However, if all of the data are strings, then to_numeric with errors='coerce' is needed, which converts the non-numeric values to NaN:
df = pd.DataFrame({
    'Regular Price': ['a', '1', '2.3', 'a', '7'],
    'B': list(range(5))
})
print (df)
B Regular Price
0 0 a
1 1 1
2 2 2.3
3 3 a
4 4 7
df['Regular Price'] = pd.to_numeric(df['Regular Price'], errors='coerce').round()
print (df)
B Regular Price
0 0 NaN
1 1 1.0
2 2 2.0
3 3 NaN
4 4 7.0
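The difference between the two cases can be sketched as follows: when every cell is a string, the isinstance check in the lambda is False for all of them, including '1' and '7', so every row becomes None. The variable names below are illustrative only.

```python
import pandas as pd

prices = pd.Series(['a', '1', '2.3', 'a', '7'])

# Every element is a str, so none of them passes the isinstance check.
is_number = prices.apply(lambda x: isinstance(x, (int, float)))
print(is_number.any())

# to_numeric parses the numeric strings and turns the rest into NaN.
converted = pd.to_numeric(prices, errors='coerce').round()
print(converted)
```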
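EDIT:
"I also need to remove the floating point and use int only."
This is possible by converting NaN to None and casting the remaining values to int: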
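import numpy as np

df['Regular Price'] = pd.to_numeric(df['Regular Price'],
                                    errors='coerce').round()
# fillna(0) only makes the cast to int possible; those rows are replaced by None.
df['Regular Price'] = np.where(df['Regular Price'].isnull(),
                               None,
                               df['Regular Price'].fillna(0).astype(int))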
print (df)
B Regular Price
0 0 None
1 1 1
2 2 2
3 3 None
4 4 7
print (df['Regular Price'].apply(type))
0 <class 'NoneType'>
1 <class 'int'>
2 <class 'int'>
3 <class 'NoneType'>
4 <class 'int'>
Name: Regular Price, dtype: object
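As a possible alternative, not part of the original answer, newer pandas releases offer a nullable integer dtype named 'Int64' that stores integers next to missing values without falling back to object. The sketch below assumes a pandas version that supports it (0.24 or later); the variable names are illustrative.

```python
import pandas as pd

prices = pd.Series(['a', '1', '2.3', 'a', '7'])

# Coerce to float first, round, then cast to the nullable integer dtype.
rounded = pd.to_numeric(prices, errors='coerce').round()
nullable = rounded.astype('Int64')
print(nullable)

# Arithmetic keeps the missing entries missing.
print(nullable * 1000)
```

The cast works here because the values were rounded first; the missing entries are shown as a pandas missing-value marker rather than None.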
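But storing None next to ints in an object column decreases performance, so it is best not to use it. There is also another problem: some functions fail on such a column, so it is better to keep floats with NaN.
Testing some functions, such as diff, on a 50k-row DataFrame: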
df = pd.DataFrame({
    'Regular Price': ['a', '1', '2.3', 'a', '7'],
    'B': list(range(5))
})
df = pd.concat([df]*10000).reset_index(drop=True)
print (df)
df['Regular Price'] = pd.to_numeric(df['Regular Price'], errors='coerce').round()
df['Regular Price1'] = np.where(df['Regular Price'].isnull(),
                                None,
                                df['Regular Price'].fillna(0).astype(int))
In [252]: %timeit df['Regular Price2'] = df['Regular Price1'].diff()
TypeError: unsupported operand type(s) for -: 'int' and 'NoneType'
In [274]: %timeit df['Regular Price3'] = df['Regular Price'].diff()
1000 loops, best of 3: 301 µs per loop
In [272]: %timeit df['Regular Price2'] = df['Regular Price1'] * 1000
100 loops, best of 3: 4.48 ms per loop
In [273]: %timeit df['Regular Price3'] = df['Regular Price'] * 1000
1000 loops, best of 3: 469 µs per loop
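The %timeit lines above only run inside IPython. A rough way to repeat the comparison in a plain script is sketched below; time_op is a hypothetical helper written for this example, and the timings will differ by machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

raw = pd.Series(['a', '1', '2.3', 'a', '7'] * 10000)

# Float column: non-numeric values become NaN.
as_float = pd.to_numeric(raw, errors='coerce').round()

# Object column: Python ints mixed with None, built as in the answer.
as_object = pd.Series(
    np.where(as_float.isnull(), None, as_float.fillna(0).astype(int)),
    index=as_float.index,
)


def time_op(label, func, repeats=20):
    """Hypothetical helper: print the average run time, or the TypeError raised."""
    try:
        seconds = timeit.timeit(func, number=repeats) / repeats
        print(f'{label}: {seconds * 1e3:.3f} ms per call')
    except TypeError as exc:
        print(f'{label}: failed with TypeError: {exc}')


time_op('float  * 1000', lambda: as_float * 1000)
time_op('object * 1000', lambda: as_object * 1000)
time_op('float  diff()', lambda: as_float.diff())
time_op('object diff()', lambda: as_object.diff())
```

The helper catches TypeError so that an operation that fails on the object column is reported instead of stopping the script.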
EDIT:
df = pd.DataFrame({
    'Regular Price': ['a', '1', '2.3', 'a', '7'],
    'B': list(range(5))
})
print (df)
B Regular Price
0 0 a
1 1 1
2 2 2.3
3 3 a
4 4 7
df['Regular Price'] = pd.to_numeric(df['Regular Price'], errors='coerce').round()
print (df)
B Regular Price
0 0 NaN
1 1 1.0
2 2 2.0
3 3 NaN
4 4 7.0
First, remove the rows that have NaN in the Regular Price column, then convert the column to int.
df1 = df.dropna(subset=['Regular Price']).copy()
df1['Regular Price'] = df1['Regular Price'].astype(int)
print (df1)
B Regular Price
1 1 1
2 2 2
4 4 7
Then do whatever processing you need, but do not change the index.
#e.g. some process
df1['Regular Price'] = df1['Regular Price'] * 100
Finally, use combine_first, which adds the NaN rows back to the Regular Price column.
df2 = df1.combine_first(df)
print (df2)
B Regular Price
0 0.0 NaN
1 1.0 100.0
2 2.0 200.0
3 3.0 NaN
4 4.0 700.0
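The three steps above (drop the NaN rows, process the integers, restore the rows with combine_first) can be wrapped into one function. This is only a sketch: process_valid_rows and its parameters are hypothetical names, not pandas API.

```python
import pandas as pd


def process_valid_rows(df, column, func):
    """Hypothetical helper: apply func to the int values of column, keep NaN rows."""
    # Work on a copy of the rows that have a value; the index is left untouched.
    valid = df.dropna(subset=[column]).copy()
    valid[column] = func(valid[column].astype(int))
    # Rows missing from valid are filled in from df, so NaN rows come back.
    return valid.combine_first(df)


df = pd.DataFrame({
    'Regular Price': ['a', '1', '2.3', 'a', '7'],
    'B': list(range(5))
})
df['Regular Price'] = pd.to_numeric(df['Regular Price'], errors='coerce').round()

out = process_valid_rows(df, 'Regular Price', lambda s: s * 100)
print(out)
```

Because combine_first aligns on the index, the processing step must not reset or reorder it, which matches the warning in the answer.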