How can I make the aggregate function "ignore" columns?

Asked: 2017-02-14 09:24:53

Tags: dataframe julia

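Suppose I have a data frame with several categorical dimensions and a "value" dimension, and I want to aggregate by some of the categorical dimensions while ignoring the others.

Julia DataFrames has the function aggregate, but if I leave out some of the categorical columns I get an error, because it tries to apply the function (here, a sum) to those columns as well instead of ignoring them: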

In:

using DataArrays, DataFrames
df = DataFrame(
  colour = ["green","blue","white","green","green"],
  shape  = ["circle", "triangle", "square","square","circle"],
  border = ["dotted", "line", "line", "line", "dotted"],
  area   = [1.1, 2.3, 3.1, 4.2, 5.2])

Out:

    colour  shape       border  area
1   green   circle      dotted  1.1
2   blue    triangle    line    2.3
3   white   square      line    3.1
4   green   square      line    4.2
5   green   circle      dotted  5.2

In:

aggregate(df,[:colour,:shape, :border],sum) # Ok
aggregate(df,[:colour,:shape],sum) # what I would like, ignoring border column

Out:

LoadError: MethodError: no method matching +(::String, ::String)

Obviously I could just drop the extra columns before aggregating, but perhaps there is a way to do it in a single step?

1 Answer:

Answer 0: (score: 3)

From https://juliastats.github.io/DataFrames.jl/split_apply_combine/

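# Group by colour and shape, then sum only the area column in each group;
# border is never touched, so no sum is attempted on strings.
by(df, [:colour,:shape]) do subdf
    DataFrame(m = sum(subdf[:area]))
end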
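
For readers on a recent DataFrames.jl: by and aggregate were deprecated and later removed, so the snippet above will not run there. A minimal sketch of the same grouping with groupby and combine, assuming a DataFrames.jl 1.x installation (the output column name m simply mirrors the answer above):

```julia
using DataFrames

df = DataFrame(
    colour = ["green", "blue", "white", "green", "green"],
    shape  = ["circle", "triangle", "square", "square", "circle"],
    border = ["dotted", "line", "line", "line", "dotted"],
    area   = [1.1, 2.3, 3.1, 4.2, 5.2])

# Group by colour and shape only. The sum is applied to the area column
# alone, so the border column is ignored rather than summed.
result = combine(groupby(df, [:colour, :shape]), :area => sum => :m)
```

Because the function is attached to a specific column (:area => sum), the other non-grouping columns never need to be dropped beforehand.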