Julia | DataFrame | Replacing missing values

Time: 2016-01-05 11:53:01

Tags: dataframe julia

How can I replace the missing values in a DataFrame column with 0.0?

5 answers:

Answer 0 (score: 3)

using DataFrames
a = @data [1.0,2.0, NA, 4.0] #Make a DataArray with an NA value
df = DataFrame(a=a) #Make a DataFrame from it
df[isna(df[:a]),:a] = 0.0 #Replace NAs in column a with 0.0

Result:

4x1 DataFrames.DataFrame
| Row | a   |
|-----|-----|
| 1   | 1.0 |
| 2   | 2.0 |
| 3   | 0.0 |
| 4   | 4.0 |

Answer 1 (score: 2)

Create a df with some NAs:

using DataFrames
df = DataFrame(A = 1.0:10.0, B = 2.0:2.0:20.0)
df[ df[:B] %2 .== 0, :A ] = NA

You will see some NAs in df. We now convert them to 0.0:

df[ isna(df[:A]), :A] = 0

EDIT: changed NaN to NA. Thanks, @Reza.

Answer 2 (score: 1)

The other answers are pretty good. If you are a real speed junkie, the following might suit you:

# prepare example
using DataFrames
df = DataFrame(A = 1.0:10.0, B = 2.0:2.0:20.0)
df[ df[:A] %2 .== 0, :B ] = NA


df[:B].data[df[:B].na] = 0.0 # put the 0.0 into NAs
df[:B] = df[:B].data         # with no NAs might as well use array

Answer 3 (score: 0)

Here is a shorter, more up-to-date answer, since Julia recently introduced the missing value.

using DataFrames
# Five random integers in 1:50 per column; column C gets one missing value
df = DataFrame(A=rand(1:50, 5), B=rand(1:50, 5), C=vcat(rand(1:50,3), missing, rand(1:50)))
# Convert to a Matrix first, since replacing values directly within a DataFrame is not allowed
df = DataFrame(replace!(convert(Matrix, df), missing=>0))
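
The replace! call used above is part of Base Julia and also works on a single vector, which may be enough when only one column needs fixing. The following is a minimal sketch on a plain vector standing in for column C (the variable names are made up for illustration, and no package is loaded):

```julia
# A plain vector standing in for one DataFrame column (illustrative data)
c = [10, 20, 30, missing, 50]

# replace returns a new vector and leaves c untouched
filled = replace(c, missing => 0)

# replace! overwrites the missing entries of c itself
replace!(c, missing => 0)

println(filled)
println(c)
```

Whether the copy or the in-place form fits better depends on whether the original data is still needed afterwards.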

Answer 4 (score: 0)

As of Julia 1.1, there are several ways to approach this problem. The basic approach is:

julia> using DataFrames

julia> df = DataFrame(a = [1, missing, missing, 4], b = 5:8)
4×2 DataFrame
│ Row │ a       │ b     │
│     │ Int64⍰  │ Int64 │
├─────┼─────────┼───────┤
│ 1   │ 1       │ 5     │
│ 2   │ missing │ 6     │
│ 3   │ missing │ 7     │
│ 4   │ 4       │ 8     │

julia> df.a[ismissing.(df.a)] .= 0
2-element view(::Array{Union{Missing, Int64},1}, [2, 3]) with eltype Union{Missing, Int64}:
 0
 0

julia> df
4×2 DataFrame
│ Row │ a      │ b     │
│     │ Int64⍰ │ Int64 │
├─────┼────────┼───────┤
│ 1   │ 1      │ 5     │
│ 2   │ 0      │ 6     │
│ 3   │ 0      │ 7     │
│ 4   │ 4      │ 8     │

Note, however, that at this point the type of column a still allows missing values:

julia> typeof(df.a)
Array{Union{Missing, Int64},1}

This is also indicated by the question mark after Int64 for column a when the data frame is printed. You can use disallowmissing! to change this:

julia> disallowmissing!(df, :a)
4×2 DataFrame
│ Row │ a     │ b     │
│     │ Int64 │ Int64 │
├─────┼───────┼───────┤
│ 1   │ 1     │ 5     │
│ 2   │ 0     │ 6     │
│ 3   │ 0     │ 7     │
│ 4   │ 4     │ 8     │
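
The same effect can be reproduced on a plain vector without any package, which may help show what is going on. This sketch uses Base convert as a stand-in for disallowmissing! (an assumption made for illustration, not what DataFrames does internally):

```julia
# A plain vector standing in for column a
a = [1, missing, missing, 4]

# In-place replacement changes the values but not the element type
a[ismissing.(a)] .= 0
println(eltype(a))          # the element type still includes Missing

# convert narrows the element type; it throws an error if a missing remains
narrowed = convert(Vector{Int}, a)
println(eltype(narrowed))   # the element type no longer includes Missing
```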

Another approach is to use coalesce:

julia> df = DataFrame(a = [1, missing, missing, 4], b = 5:8);

julia> df.a = coalesce.(df.a, 0)
4-element Array{Int64,1}:
 1
 0
 0
 4

julia> df
4×2 DataFrame
│ Row │ a     │ b     │
│     │ Int64 │ Int64 │
├─────┼───────┼───────┤
│ 1   │ 1     │ 5     │
│ 2   │ 0     │ 6     │
│ 3   │ 0     │ 7     │
│ 4   │ 4     │ 8     │
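
Since the original question asks for 0.0 rather than 0, here is a small sketch of the same coalesce idea on a Float64 vector. It uses a plain vector instead of a DataFrame column so that it runs without any package; the data is made up for illustration:

```julia
# A plain vector standing in for a Float64 column with one missing entry
a = [1.0, 2.0, missing, 4.0]

# coalesce returns its first non-missing argument; the dot broadcasts it
# over the vector, so each missing entry becomes 0.0
filled = coalesce.(a, 0.0)

println(filled)
println(eltype(filled))
```

Because the broadcast allocates a new vector, the result has a plain Float64 element type and no separate narrowing step is needed.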

A third option is to use Missings.replace from the Missings package:

julia> using Missings

julia> df = DataFrame(a = [1, missing, missing, 4], b = 5:8);

julia> df.a .= Missings.replace(df.a, 0)
4-element Array{Union{Missing, Int64},1}:
 1
 0
 0
 4

For an alternative syntax, you can try the DataFramesMeta package:

julia> using DataFramesMeta

julia> df = DataFrame(a = [1, missing, missing, 4], b = 5:8);

julia> @transform(df, a .= coalesce.(:a, 0))
4×2 DataFrame
│ Row │ a     │ b     │
│     │ Int64 │ Int64 │
├─────┼───────┼───────┤
│ 1   │ 1     │ 5     │
│ 2   │ 0     │ 6     │
│ 3   │ 0     │ 7     │
│ 4   │ 4     │ 8     │

For more documentation, see here and here.