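Suppose I have a data frame with several categorical dimensions and a "value" dimension, and I want to aggregate by some of the categorical dimensions while ignoring the others.
Julia's DataFrames package has an aggregate function, but if I leave out some of the categorical columns I get an error, because it tries to apply the function (here, sum) to those columns as well instead of ignoring them:
In: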
using DataArrays, DataFrames
df = DataFrame(
colour = ["green","blue","white","green","green"],
shape = ["circle", "triangle", "square","square","circle"],
border = ["dotted", "line", "line", "line", "dotted"],
area = [1.1, 2.3, 3.1, 4.2, 5.2])
Out:
    colour  shape     border  area
1   green   circle    dotted  1.1
2   blue    triangle  line    2.3
3   white   square    line    3.1
4   green   square    line    4.2
5   green   circle    dotted  5.2
In:
aggregate(df,[:colour,:shape, :border],sum) # Ok
aggregate(df,[:colour,:shape],sum) # what I would like, ignoring border column
Out:
LoadError: MethodError: no method matching +(::String, ::String)
Obviously I could just drop the extra columns before aggregating, but is there a way to do it in a single step?
Answer 0 (score: 3)
From https://juliastats.github.io/DataFrames.jl/split_apply_combine/:
by(df, [:colour,:shape]) do df
    DataFrame(m = sum(df[:area]))
end
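For readers on newer releases: as far as I know, both aggregate and by were deprecated and then removed by DataFrames.jl 1.0, so the snippets above may not run there. A minimal sketch of the same grouping with groupby and combine follows. It assumes a 1.x version of DataFrames.jl, and the output column name area_sum is my own choice, not something from the question.

```julia
using DataFrames

# Same example data as in the question.
df = DataFrame(
    colour = ["green", "blue", "white", "green", "green"],
    shape  = ["circle", "triangle", "square", "square", "circle"],
    border = ["dotted", "line", "line", "line", "dotted"],
    area   = [1.1, 2.3, 3.1, 4.2, 5.2])

# Group by the two columns of interest and sum only `area`.
# `border` is never referenced, so no function is applied to it.
# `area_sum` is an illustrative name for the output column.
result = combine(groupby(df, [:colour, :shape]), :area => sum => :area_sum)
```

Because the function is attached to a named column (:area => sum), the string columns that are not grouping keys are simply left out of the result, which is what the question asks for.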