Pandas: how to avoid applying a change to the entire column

Time: 2017-11-20 11:33:19

Tags: python pandas

self.df['Regular Price'] = self.df['Regular Price'].apply(
            lambda x: int(round(x)) if isinstance(
                x, (int, float)) else None
        )

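When the code above encounters a non-numeric value in the DataFrame, it assigns None to every value of the Regular Price field. I want None assigned only to the cells that hold non-numeric values.

Thanks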

1 answer:

Answer 0: (score: 1)

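First, it is not possible to return integers together with NaN, because NaN is a float by design, so a column that holds NaN is upcast to float.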
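As a small illustration of that upcast, the sketch below builds the same numbers with and without a missing value. The variable names are illustrative and not part of the original answer.

```python
import numpy as np
import pandas as pd

# A column of whole numbers gets an integer dtype.
ints = pd.Series([1, 2, 7])
print(ints.dtype)

# Adding a single NaN forces the whole column to float.
with_nan = pd.Series([1, np.nan, 7])
print(with_nan.dtype)

# The same happens when reindexing introduces a missing row.
reindexed = ints.reindex([0, 1, 2, 3])
print(reindexed.dtype)
```

This is why the outputs in this answer show 1.0, 2.0 and 7.0 rather than 1, 2 and 7.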

Your solution works correctly if the column holds mixed types - numbers together with strings:

import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Regular Price': ['a',1,2.3,'a',7],
    'B': list(range(5))
})
print (df)
   B Regular Price
0  0             a
1  1             1
2  2           2.3
3  3             a
4  4             7

df['Regular Price'] = df['Regular Price'].apply(
            lambda x: int(round(x)) if isinstance(
                x, (int, float)) else None
        )

print (df)
   B  Regular Price
0  0            NaN
1  1            1.0
2  2            2.0
3  3            NaN
4  4            7.0

But if all the data are strings, you need to_numeric with errors='coerce' to convert the non-numeric values to NaN:

df = pd.DataFrame({
    'Regular Price': ['a','1','2.3','a','7'],
    'B': list(range(5))
})
print (df)
   B Regular Price
0  0             a
1  1             1
2  2           2.3
3  3             a
4  4             7

df['Regular Price'] = pd.to_numeric(df['Regular Price'], errors='coerce').round()
print (df)
   B  Regular Price
0  0            NaN
1  1            1.0
2  2            2.0
3  3            NaN
4  4            7.0
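To check that only the non-numeric cells were replaced, a minimal sketch can compare the original strings with the converted column. The names raw, converted and was_coerced are illustrative, not from the original answer.

```python
import pandas as pd

raw = pd.Series(['a', '1', '2.3', 'a', '7'], name='Regular Price')

# Unparseable strings become NaN; everything else becomes a float.
converted = pd.to_numeric(raw, errors='coerce').round()

# True only where a value existed but could not be parsed as a number.
was_coerced = converted.isna() & raw.notna()

print(raw[was_coerced].tolist())
print(converted.tolist())
```

Only the two 'a' cells are flagged; the numeric strings keep their rounded values.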

EDIT:

"I also need to remove the floats and use int only"

It is possible to convert NaN to None and cast the remaining values to int:

df['Regular Price'] = pd.to_numeric(df['Regular Price'],
                                    errors='coerce').round()

df['Regular Price'] = np.where(df['Regular Price'].isnull(), 
                               None,
                               df['Regular Price'].fillna(0).astype(int))

print (df)
   B Regular Price
0  0          None
1  1             1
2  2             2
3  3          None
4  4             7


print (df['Regular Price'].apply(type))
0    <class 'NoneType'>
1         <class 'int'>
2         <class 'int'>
3    <class 'NoneType'>
4         <class 'int'>
Name: Regular Price, dtype: object

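But this slows performance down, so it is best not to use it. There is another problem as well - some functions fail on such a column, so it is best to work with floats and NaN.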
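If whole numbers with missing values are really needed, one alternative that is not part of the original answer is the nullable 'Int64' extension dtype, available from pandas 0.24 onward. The sketch below assumes such a version is installed; the variable names are illustrative.

```python
import pandas as pd

raw = pd.Series(['a', '1', '2.3', 'a', '7'], name='Regular Price')

# Coerce to float, round, then cast to the nullable integer dtype.
# Rounding first matters: the cast rejects floats with a fractional part.
prices = pd.to_numeric(raw, errors='coerce').round().astype('Int64')
print(prices)

# Arithmetic keeps working and missing values stay missing.
scaled = prices * 1000
print(scaled)
```

The column stays numeric, so it avoids the object dtype produced by mixing int and None.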

Testing some functions, such as diff, on a DataFrame with 50k rows:

df = pd.DataFrame({
    'Regular Price': ['a','1','2.3','a','7'],
    'B': list(range(5))
})
df = pd.concat([df]*10000).reset_index(drop=True)
print (df)

df['Regular Price'] = pd.to_numeric(df['Regular Price'], errors='coerce').round()

df['Regular Price1'] = np.where(df['Regular Price'].isnull(), 
                               None,
                               df['Regular Price'].fillna(0).astype(int))
In [252]: %timeit df['Regular Price2'] = df['Regular Price1'].diff()

TypeError: unsupported operand type(s) for -: 'int' and 'NoneType'

In [274]: %timeit df['Regular Price3'] = df['Regular Price'].diff()
1000 loops, best of 3: 301 µs per loop
In [272]: %timeit df['Regular Price2'] = df['Regular Price1'] * 1000
100 loops, best of 3: 4.48 ms per loop

In [273]: %timeit df['Regular Price3'] = df['Regular Price'] * 1000
1000 loops, best of 3: 469 µs per loop
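The comparison can also be run outside IPython with the standard timeit module. This is a rough sketch with illustrative names (as_float, as_object); absolute timings depend on the machine and the pandas version, so no expected numbers are given.

```python
import timeit

import numpy as np
import pandas as pd

raw = pd.Series(['a', '1', '2.3', 'a', '7'] * 10000)

# float64 column with NaN for the non-numeric cells
as_float = pd.to_numeric(raw, errors='coerce').round()

# object column mixing Python ints and None
as_object = pd.Series(
    np.where(as_float.isnull(), None, as_float.fillna(0).astype(int)),
    index=as_float.index,
)

t_float = timeit.timeit(lambda: as_float * 1000, number=20)
t_object = timeit.timeit(lambda: as_object.fillna(0) * 1000, number=20)
print('float64: {:.4f}s  object: {:.4f}s'.format(t_float, t_object))

# diff() on the object column may raise TypeError, as reported in the answer.
try:
    as_object.diff()
except TypeError as exc:
    print('diff failed:', exc)
```

The object column is multiplied after fillna(0) here because None cannot be multiplied directly; that is a choice made for this sketch, not something taken from the answer.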

EDIT:

df = pd.DataFrame({
    'Regular Price': ['a','1','2.3','a','7'],
    'B': list(range(5))
})
print (df)
   B Regular Price
0  0             a
1  1             1
2  2           2.3
3  3             a
4  4             7

df['Regular Price'] = pd.to_numeric(df['Regular Price'], errors='coerce').round()
print (df)
   B  Regular Price
0  0            NaN
1  1            1.0
2  2            2.0
3  3            NaN
4  4            7.0

First it is possible to remove the rows that have NaN in the Regular Price column, and then convert to int:

df1 = df.dropna(subset=['Regular Price']).copy()
df1['Regular Price']  = df1['Regular Price'].astype(int)
print (df1)
   B  Regular Price
1  1              1
2  2              2
4  4              7

Do whatever processing you need, but do not change the index.

#e.g. some process 
df1['Regular Price']  = df1['Regular Price'] * 100

Last, use combine_first - it adds the NaN rows back to the Regular Price column.

df2 = df1.combine_first(df)
print (df2)
     B  Regular Price
0  0.0            NaN
1  1.0          100.0
2  2.0          200.0
3  3.0            NaN
4  4.0          700.0
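In the output above, column B also came back as float. If that matters, one possible variation, not from the original answer, is to write the processed values back by index label instead of combining whole frames. The name result is illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Regular Price': [np.nan, 1.0, 2.0, np.nan, 7.0],
    'B': list(range(5)),
})

# Work on the rows that have a price, keeping their original index.
df1 = df.dropna(subset=['Regular Price']).copy()
df1['Regular Price'] = df1['Regular Price'].astype(int) * 100

# Write the processed values back by index label; other columns are untouched.
result = df.copy()
result.loc[df1.index, 'Regular Price'] = df1['Regular Price']
print(result)
```

Regular Price is still float here because it holds NaN, but B keeps its integer dtype. This relies on the index being left unchanged during processing, as the answer recommends.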