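I'd like to know whether DataFrames are suitable for tasks that involve storing a different number of keywords for each record. Here is a minimal working example: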
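using DataFrames, Query
df = DataFrame()
# df[:col] indexing is older DataFrames.jl syntax; newer versions use df.Name = ... instead
df[:Name] = ["Alice", "Arthur", "Bob", "Charlie"]
# Each record stores a variable number of keywords, so Diet is a column of vectors
df[:Diet] = [["apple", "orange", "onion"],
             [],
             ["banana", "onion", "cake"],
             ["olives", "peanut butter", "avocado"]]
df[:Weight] = [70, 90, 80, 60]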
Using Query.jl:
julia> q1 = @from i in df begin
           @where startswith(get(i.Name), "A")
           @select {i.Name, i.Diet, i.Weight}
           @collect DataFrame
       end
2×3 DataFrames.DataFrame
│ Row │ Name     │ Diet                            │ Weight │
├─────┼──────────┼─────────────────────────────────┼────────┤
│ 1   │ "Alice"  │ Any["apple", "orange", "onion"] │ 70     │
│ 2   │ "Arthur" │ Any[]                           │ 90     │
But how do I write a query that involves the keywords? For example, who eats onions?
julia> q2 = @from i in df begin
           # @where ??? a keyword in i.Diet starting with "on"?
           @select {i.Name, i.Diet, i.Weight}
           @collect DataFrame
       end
I know that Query.jl can also work with databases.
Answer 0 (score: 2)
The @where clause is an ordinary Julia expression, so you can use functions such as any together with dot (broadcasting) syntax. Specifically:
julia> q2 = @from i in df begin
           # Broadcast startswith over every keyword, then keep the row if any of them matched
           @where any(startswith.(get(i.Diet), "on"))
           @select {i.Name, i.Diet, i.Weight}
           @collect DataFrame
       end
2×3 DataFrames.DataFrame
│ Row │ Name    │ Diet                            │ Weight │
├─────┼─────────┼─────────────────────────────────┼────────┤
│ 1   │ "Alice" │ Any["apple", "orange", "onion"] │ 70     │
│ 2   │ "Bob"   │ Any["banana", "onion", "cake"]  │ 80     │
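Because the @where predicate is plain Julia, it can be tried outside Query.jl. The following is a minimal sketch that uses only Base Julia. It assumes the Name and Diet columns have been copied into ordinary vectors; people, diets, prefix_match, prefix_match_lazy and exact_match are hypothetical names introduced here for illustration. Alongside the broadcast form from the answer, it shows the lazy any(f, itr) form and an exact-match test with in:

```julia
# Illustrative plain vectors standing in for the question's Name and Diet columns
# (no DataFrames or Query.jl needed; `people` and `diets` are hypothetical names).
people = ["Alice", "Arthur", "Bob", "Charlie"]
diets = [["apple", "orange", "onion"],
         String[],                      # Arthur has no keywords
         ["banana", "onion", "cake"],
         ["olives", "peanut butter", "avocado"]]

# The answer's form: startswith.(diet, "on") gives one Bool per keyword,
# and any(...) is true if at least one is true (false for an empty list).
prefix_match(diet) = any(startswith.(diet, "on"))

# Same test with a predicate: stops at the first hit and builds no temporary array.
prefix_match_lazy(diet) = any(k -> startswith(k, "on"), diet)

# Exact keyword match instead of a prefix match.
exact_match(diet) = "onion" in diet

onion_eaters = [p for (p, d) in zip(people, diets) if prefix_match(d)]
println(onion_eaters)  # ["Alice", "Bob"]
```

Any of these predicate bodies should be able to replace the expression after @where, applied to get(i.Diet) as in the answer. This substitution has not been checked against a particular Query.jl version.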