Question

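Suppose my DataFrame has a Float64 column, and I want to group the data frame by binning that column. I hear the cut function might help, but it's not defined on data frames. Some work has been done (https://gist.github.com/tautologico/3925372), but I'd rather use a library function than copy-paste code from the Internet. Pointers?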

EDIT: Bonus karma for finding a way to do this by month from UNIX timestamps :)

Answer 1

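You can bin a data frame on a Float64 column like this. Here my bins run from 0.0 to 1.0 in increments of 0.1, and the data frame is binned on a column of 100 random numbers between 0.0 and 1.0.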

using DataFrames # load DataFrames
df = DataFrame(index = rand(Float64, 100)) # 100 random Float64 values in [0.0, 1.0)
# Pair each lower bin edge with its upper edge; each tuple x is (lower, upper)
bins = zip(0.0:0.1:0.9, 0.1:0.1:1.0)
# For every tuple, keep the rows where lower <= index < upper
df_array = map(x -> df[(df.index .>= x[1]) .& (df.index .< x[2]), :], bins)

This produces an array of 10 data frames, each holding the rows of a different bin of width 0.1.
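Since the question asks for a library function, here is a minimal sketch of the same binning done with `cut` and `groupby`. It assumes the CategoricalArrays.jl package is installed (that is where `cut` lives in current Julia), and it uses a small fixed sample instead of random numbers so the result is easy to check by hand. The column name `bin` and the variable names are just illustrative.

```julia
using DataFrames
using CategoricalArrays  # assumed to be installed; provides `cut`

# Small fixed sample so the grouping can be verified by eye
df = DataFrame(index = [0.05, 0.12, 0.18, 0.55, 0.93])

# Bin edges 0.0, 0.1, ..., 1.0 define ten bins of width 0.1
edges = collect(0.0:0.1:1.0)

# `cut` returns an ordered categorical vector labelling the bin of each value
df.bin = cut(df.index, edges)

# Group on the bin label; only bins that actually contain rows become groups
gdf = groupby(df, :bin)

# Count the rows in each non-empty bin
counts = combine(gdf, nrow => :count)
```

Unlike the array of data frames above, `groupby` returns a single grouped object, so follow-up aggregation can go through `combine` rather than a manual loop. Empty bins simply do not appear as groups.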

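As for the UNIX timestamp question, I'm not very familiar with that side of things, but after playing around a bit, something like this might work: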

using DataFrames
using Dates

df = DataFrame(unixtime = rand(1E9:1:1.1E9, 100)) # floats standing in for UNIX timestamps
df.date = Dates.unix2datetime.(df.unixtime) # convert the timestamps to DateTime
df.year_month = map(date -> string(Dates.Year(date)) * " " * string(Dates.Month(date)), df.date) # one label string per year and month
df_array = map(ym -> df[df.year_month .== ym, :], unique(df.year_month)) # one DataFrame per unique year_month label
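The same monthly split can also be expressed with `groupby` instead of filtering once per label. The following is only a sketch under the assumption that numeric `year` and `month` columns are acceptable as group keys; the three timestamps are made-up values chosen so that two fall in one month and one in the next.

```julia
using DataFrames
using Dates

# Three hypothetical UNIX timestamps (seconds since 1970-01-01 UTC):
# a base time, one day later, and forty days later
df = DataFrame(unixtime = [1.0e9, 1.0e9 + 86_400, 1.0e9 + 40 * 86_400])

# Convert to DateTime, then pull out numeric year and month columns
df.date = unix2datetime.(df.unixtime)
df.year = year.(df.date)
df.month = month.(df.date)

# Group on the (year, month) pair and count the rows in each month
monthly = combine(groupby(df, [:year, :month]), nrow => :count)
```

Grouping on two integer columns avoids building label strings, and the resulting table can be sorted chronologically with `sort(monthly, [:year, :month])`.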

Grouping a DataFrame by binning a column::Float64 in Julia
