Balancing a dataset for binary logistic regression

Time: 2018-04-15 01:03:53

Tags: sql sql-server logistic-regression balance

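I need to balance a dataset in SQL Server for a binary logistic regression project. The data is imbalanced, with the two classes split roughly 10% to 90%. How would you suggest balancing the data in SQL Server?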

1 answer:

Answer 0: (score: 0)

Here is one approach:

select t.*
from (select t.*,
             row_number() over (partition by target order by newid()) as seqnum,
             sum(case when target = 0 then 1 else 0 end) over () as num_0,
             sum(case when target = 1 then 1 else 0 end) over () as num_1
      from t
     ) t
where (num_0 <= num_1 and seqnum <= num_0) or
      (num_1 < num_0 and seqnum <= num_1);

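This numbers the rows in random order within each value of target. The where clause then keeps every row of the rarer target value and a random sample of the same size from the more common one, so the result has equal counts of 0 and 1.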
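To try the query on toy data without a SQL Server instance, here is a minimal sketch that runs it against an in-memory SQLite database from Python. Several things are assumptions for illustration only: the table t has just the columns id and target, the data is 10 rows with target = 1 and 90 rows with target = 0, and SQLite's random() stands in for SQL Server's newid(), which SQLite does not have. It needs SQLite 3.25 or later for window functions.

```python
import sqlite3

# Same logic as the T-SQL query, with random() replacing newid() (SQLite has no newid()).
BALANCE_QUERY = """
    select id, target
    from (select t.*,
                 row_number() over (partition by target order by random()) as seqnum,
                 sum(case when target = 0 then 1 else 0 end) over () as num_0,
                 sum(case when target = 1 then 1 else 0 end) over () as num_1
          from t
         ) s
    where (num_0 <= num_1 and seqnum <= num_0) or
          (num_1 < num_0 and seqnum <= num_1)
"""


def balanced_sample(conn):
    """Return (id, target) rows with equal counts of target = 0 and target = 1."""
    return conn.execute(BALANCE_QUERY).fetchall()


# Hypothetical toy data: 10 positives and 90 negatives, mirroring the 10:90 split.
conn = sqlite3.connect(":memory:")
conn.execute("create table t (id integer primary key, target integer)")
conn.executemany("insert into t (target) values (?)", [(1,)] * 10 + [(0,)] * 90)

rows = balanced_sample(conn)

# Count how many rows of each class survived the downsampling.
counts = {0: 0, 1: 0}
for _id, target in rows:
    counts[target] += 1

print(counts)  # {0: 10, 1: 10}
```

Which 10 of the 90 majority rows are kept changes from run to run, because the ordering is random; only the per-class counts are fixed.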