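Getting computed NA column values in a Julia DataFrame that respond to dropna

Time: 2015-09-18 05:02:31

Tags: dataframe julia

I am trying to use NA as a result to indicate that the computed value for a given DataFrame "row" is meaningless (or perhaps cannot be computed). How can I get a column of computed values, including NAs, that still responds to dropna?

Example: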

using DataFrames

df = DataFrame(A = 1:4, B = [1, 0, 2, 3], C = [5, 4, 3, 3])

# A value of 0 in column B should yield a foo of NA
function foo(d)
  if d[:B] == 0
    return NA
  end
  return d[:B] ./ d[:C] # vectorized to work with `by`
end

# What I'm looking for is something equivalent to this list
# comprehension, but that returns a DataFrame or DataArray
# since normal Arrays don't respond to `dropna`

comprehension = [foo(frame) for frame in eachrow(df)]  

2 answers:

Answer 0: (score: 2)

One option is to extend Base.convert so that DataArrays.dropna can handle a normal Vector:

using DataFrames

function Base.convert{T}(::Type{DataArray}, v::Vector{T})
  da = DataArray(T[],Bool[])
  for val in v
    push!(da, val)
  end
  return da
end

function DataArrays.dropna(v::Vector)
  return dropna(convert(DataArray,v))
end

Now the example should work as expected:

df = DataFrame(A = 1:4, B = [1, 0, 2, 3], C = [5, 4, 3, 3])

# A value of 0 in column B should yield a foo of NA
function foo(d)
  if d[:B] == 0
    return NA
  end
  return d[:B] / d[:C]
end

comprehension = [foo(frame) for frame in eachrow(df)]

dropna(comprehension) # => Array{Any,1}: [0.2, 0.667, 1.]

Even without the extended dropna, the extended convert also allows the comprehension to be inserted into the DataFrame as a new DataArray column, preserving the NAs and their proper dropna behavior.
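For readers on current Julia, where `missing` from Base has taken over the role of `NA`, the same idea can be sketched without any packages. This is an assumption about the reader's setup, not part of the answer above: the column data are copied from the question, and the two-argument `foo` is a hypothetical adaptation of the original function.

```julia
# Sketch for current Julia (1.x), Base only:
# `missing` stands in for `NA`, `skipmissing` stands in for `dropna`.

# Column data copied from the question's DataFrame.
B = [1, 0, 2, 3]
C = [5, 4, 3, 3]

# Hypothetical two-argument variant of `foo`: a 0 in column B yields `missing`.
foo(b, c) = b == 0 ? missing : b / c

# The comprehension holds Float64 values and `missing` side by side.
computed = [foo(b, c) for (b, c) in zip(B, C)]

# Drop the missing entries and materialize the rest as an array.
kept = collect(skipmissing(computed))
println(kept)
```

Because `skipmissing` works on any iterable, no `convert` or `dropna` extension is needed in this setting.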

Answer 1: (score: 1)

This is a bit tricky because DataFrame rows are awkward objects. For example, I would think this is perfectly reasonable:

using DataFrames
df = DataFrame(A = 1:4, B = [1, 0, 2, 3], C = [5, 4, 3, 3])

# A value of 0 in column B should yield a foo of NA
function foo(d)
  if d[:B] == 0
    return NA
  end
  return d[:B] / d[:C] # scalar division on a single row
end
comp = DataArray(Float64,4)
map!(r->foo(r), eachrow(df))

But this results in

`map!` has no method matching map!(::Function, ::DFRowIterator{DataFrame})
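The call above passes only a function and the row iterator, so `map!` has nowhere to write its results. A minimal sketch of the destination form, `map!(f, dest, collection)`, in current Julia using only Base: the rows are modelled as NamedTuples (an assumption standing in for DataFrame rows), and `missing` stands in for `NA`.

```julia
# Rows modelled as NamedTuples; a hypothetical stand-in for DataFrame rows.
rows = [(A = 1, B = 1, C = 5), (A = 2, B = 0, C = 4),
        (A = 3, B = 2, C = 3), (A = 4, B = 3, C = 3)]

# A value of 0 in field B yields `missing` instead of a ratio.
foo(r) = r.B == 0 ? missing : r.B / r.C

# Destination that can hold either a Float64 or `missing`.
comp = Vector{Union{Missing, Float64}}(undef, length(rows))

# map! writes foo(row) for each row into `comp`, in order.
map!(foo, comp, rows)
println(comp)
```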

However, if you just want to do a by that does not always return a row, you can do this:

using DataFrames
df = DataFrame(A = 1:4, B = [1, 0, 2, 3], C = [5, 4, 3, 3])

# A value of 0 in column B returns an empty array
function foo(d)
    if d[1,:B] == 0
        return []
    end
    return d[1,:B] / d[1,:C] # Plan on only getting a single row in the by
end

by(df, [:A,:B,:C]) do d
    foo(d)
end

which results in

3x4 DataFrame
| Row | A | B | C | x1       |
|-----|---|---|---|----------|
| 1   | 1 | 1 | 5 | 0.2      |
| 2   | 3 | 2 | 3 | 0.666667 |
| 3   | 4 | 3 | 3 | 1.0      |
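
The net effect is that rows with B equal to 0 drop out of the result entirely. As a rough illustration of that filtering in current Julia, using only Base and not the `by` API, the rows can be modelled as NamedTuples (an assumption) and filtered inside a comprehension; the field name `x1` simply mirrors the output column above.

```julia
# Rows modelled as NamedTuples; a hypothetical stand-in for the DataFrame.
rows = [(A = 1, B = 1, C = 5), (A = 2, B = 0, C = 4),
        (A = 3, B = 2, C = 3), (A = 4, B = 3, C = 3)]

# Keep only rows where B is nonzero and attach the ratio as a new field `x1`.
result = [merge(r, (x1 = r.B / r.C,)) for r in rows if r.B != 0]

for r in result
    println(r)
end
```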