DataFrames: storing and querying arrays of keywords

Time: 2017-10-31 11:31:56

Tags: dataframe julia

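I would like to know whether a DataFrame is suitable for a task that involves storing a varying number of keywords for each record. Here is a minimal working example: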

using DataFrames, Query

df = DataFrame()

df[:Name]  =  ["Alice", "Arthur", "Bob", "Charlie"]
df[:Diet]  =  [["apple", "orange", "onion"], 
               [], 
               ["banana", "onion", "cake"], 
               ["olives", "peanut butter", "avocado"]]
df[:Weight] = [70, 90, 80, 60]

Using Query.jl:

julia> q1 = @from i in df begin
            @where startswith(get(i.Name), "A")
            @select {i.Name, i.Diet, i.Weight}
            @collect DataFrame
       end
2×3 DataFrames.DataFrame
│ Row │ Name     │ Diet                            │ Weight │
├─────┼──────────┼─────────────────────────────────┼────────┤
│ 1   │ "Alice"  │ Any["apple", "orange", "onion"] │ 70     │
│ 2   │ "Arthur" │ Any[]                           │ 90     │

But how do I write a query that involves the keywords themselves? For example, who eats onions?

julia> q2 = @from i in df begin
            # @where ??? a keyword in i.Diet starting with "on"?
            @select {i.Name, i.Diet, i.Weight}
            @collect DataFrame
       end

I know that Query.jl can also work with databases.

1 Answer:

Answer 0: (score: 2)

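The @where clause is an ordinary Julia expression, so you can use functions such as any together with dot (broadcast) notation. Specifically: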

julia> q2 = @from i in df begin
            @where any(startswith.(get(i.Diet), "on"))
            @select {i.Name, i.Diet, i.Weight}
            @collect DataFrame
       end
2×3 DataFrames.DataFrame
│ Row │ Name    │ Diet                            │ Weight │
├─────┼─────────┼─────────────────────────────────┼────────┤
│ 1   │ "Alice" │ Any["apple", "orange", "onion"] │ 70     │
│ 2   │ "Bob"   │ Any["banana", "onion", "cake"]  │ 80     │
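
To try the same predicate without Query.jl or any particular DataFrames version, here is a minimal sketch in plain Julia. The names people, diets and has_prefix are hypothetical and used only for illustration. The sketch assumes the keywords are stored as vectors of strings, as in the question.

```julia
# Plain-Julia sketch of the keyword predicate; uses Base only.
# `people`, `diets` and `has_prefix` are illustrative names, not part of any library.
people = ["Alice", "Arthur", "Bob", "Charlie"]
diets  = [["apple", "orange", "onion"],
          String[],                        # Arthur has no keywords
          ["banana", "onion", "cake"],
          ["olives", "peanut butter", "avocado"]]

# Broadcast form, as in the @where clause above:
# startswith.(keywords, prefix) yields one Bool per keyword.
has_prefix(keywords, prefix) = any(startswith.(keywords, prefix))

# Equivalent form that stops at the first match and allocates no temporary array.
has_prefix_lazy(keywords, prefix) = any(k -> startswith(k, prefix), keywords)

# Keep the people whose diet contains a keyword starting with "on".
onion_eaters = [p for (p, d) in zip(people, diets) if has_prefix(d, "on")]

println(onion_eaters)
```

Both forms return false for an empty keyword list, so Arthur is simply filtered out. For an exact match instead of a prefix match, `"onion" in d` would work in place of has_prefix.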