Question

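I am trying out the Julia DataFrames module. I am interested in it because I want to use it to plot simple simulations in Gadfly. I want to be able to add rows to a DataFrame iteratively, and I want to initialize it as empty.

Tutorials and documentation on how to do this are sparse (most of the documentation describes how to analyze imported data).

Appending to a non-empty DataFrame is straightforward: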

df = DataFrame(A = [1, 2], B = [4, 5])
push!(df, [3 6])

which returns:

3x2 DataFrame
| Row | A | B |
|-----|---|---|
| 1   | 1 | 4 |
| 2   | 2 | 5 |
| 3   | 3 | 6 |

But with an empty initialization I get an error.

df = DataFrame(A = [], B = [])
push!(df, [3, 6])

Error message:

ArgumentError("Error adding 3 to column :A. Possible type mis-match.")
while loading In[220], in expression starting on line 2

What is the best way to initialize an empty Julia DataFrame so that you can add items to it iteratively later, in a for loop?

Answer 1

A zero-length array defined with only [] lacks sufficient type information.

julia> typeof([])
Array{None,1}

So to avoid the problem, simply specify the element type.

julia> typeof(Int64[])
Array{Int64,1}
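
To illustrate what the element type buys you, here is a minimal sketch in plain Julia (no packages needed). The variable names are only illustrative, and the exact error type is what I would expect from Base, so treat it as an assumption if your version differs.

```julia
# Base Julia only; no packages required.
v = Int64[]                 # empty, but the element type is fixed to Int64
push!(v, 3)                 # fine: 3 is an integer
push!(v, 6.0)               # also fine: 6.0 converts exactly to 6

# A value that cannot be converted exactly is rejected with an error,
# and the vector is left unchanged.
rejected = try
    push!(v, 2.5)
    false
catch err
    err isa InexactError
end
```

The same idea carries over to DataFrame columns: each column is a typed vector, so it needs an element type before rows can be pushed into it.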

You can apply this to your DataFrame problem:

julia> df = DataFrame(A = Int64[], B = Int64[])
0x2 DataFrame

julia> push!(df, [3, 6])

julia> df
1x2 DataFrame
| Row | A | B |
|-----|---|---|
| 1   | 3 | 6 |
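
Since the question asks about adding rows in a for loop, here is a small sketch of that pattern, assuming a reasonably recent DataFrames.jl. The column names `t` and `x` and the formula inside the loop are made up for illustration and stand in for a real simulation step.

```julia
using DataFrames

# Empty frame with typed columns; the column names are hypothetical.
results = DataFrame(t = Int64[], x = Float64[])

for t in 1:5
    x = t^2 / 2              # placeholder for one simulation step
    push!(results, (t, x))   # one row per iteration, values in column order
end
```

Each `push!` mutates `results` in place, so after the loop it holds one row per iteration and can be passed on for plotting.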

Answer 2

using Pkg, CSV, DataFrames

iris = CSV.read(joinpath(Pkg.dir("DataFrames"), "test/data/iris.csv"))

new_iris = similar(iris, nrow(iris))

head(new_iris, 2)
# 2×5 DataFrame
# │ Row │ SepalLength │ SepalWidth │ PetalLength │ PetalWidth │ Species │
# ├─────┼─────────────┼────────────┼─────────────┼────────────┼─────────┤
# │ 1   │ missing     │ missing    │ missing     │ missing    │ missing │
# │ 2   │ missing     │ missing    │ missing     │ missing    │ missing │

for (i, row) in enumerate(eachrow(iris))
    new_iris[i, :] = row[:]
end

head(new_iris, 2)

# 2×5 DataFrame
# │ Row │ SepalLength │ SepalWidth │ PetalLength │ PetalWidth │ Species │
# ├─────┼─────────────┼────────────┼─────────────┼────────────┼─────────┤
# │ 1   │ 5.1         │ 3.5        │ 1.4         │ 0.2        │ setosa  │
# │ 2   │ 4.9         │ 3.0        │ 1.4         │ 0.2        │ setosa  │

Answer 3

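@waTeim's answer already addresses the original question. But what if I want to create an empty DataFrame dynamically and append rows to it? For example, what if I don't want hard-coded column names?

In that case, df = DataFrame(A = Int64[], B = Int64[]) is not enough. The NamedTuple (A = Int64[], B = Int64[]) needs to be created dynamically.

Suppose we have a vector of column names, col_names, and a vector of column types, col_types, from which to create an empty DataFrame.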

col_names = [:A, :B] # must be a vector of Symbols
col_types = [Int64, Float64]
# Create a NamedTuple (A=Int64[], ....) by doing
named_tuple = (; zip(col_names, type[] for type in col_types )...)

df = DataFrame(named_tuple) # 0×2 DataFrame

Alternatively, the NamedTuple can be created as

# or by doing
named_tuple = NamedTuple{Tuple(col_names)}(type[] for type in col_types )
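
Putting the pieces together, a minimal end-to-end sketch might look like the following, assuming a recent DataFrames.jl. The row values are invented for illustration, and the generator is wrapped in an explicit `Tuple(...)` only to keep the construction unambiguous.

```julia
using DataFrames

col_names = [:A, :B]
col_types = [Int64, Float64]

# One empty, typed vector per column, keyed by the column name.
named_tuple = NamedTuple{Tuple(col_names)}(Tuple(T[] for T in col_types))
df = DataFrame(named_tuple)

# Append rows; a NamedTuple row is matched to the columns by name.
push!(df, (A = 1, B = 2.5))
push!(df, (A = 2, B = 3.5))
```

Because the column types are fixed when the frame is created, later `push!` calls need values that convert to `Int64` and `Float64` respectively.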
