Question

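This is a very basic example, but I am doing some data analysis and keep finding myself writing very similar SQL count queries to build probability tables.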

My table is defined so that a value of 0 means the event did not occur and a value of 1 means it did.

> sqldf("select count(distinct Date) from joinedData where C_O_Above_prevHigh = 0 and C_O_Below_prevLow = 0")
  count(distinct Date)
1                 1081

> sqldf("select count(distinct Date) from joinedData where C_O_Above_prevHigh = 0 and C_O_Below_prevLow = 0 and E_halfGap = 1")
  count(distinct Date)
1                  956

> sqldf("select count(distinct Date) from joinedData where C_O_Above_prevHigh = 1 OR C_O_Below_prevLow = 1 and E_halfGap = 1")
  count(distinct Date)
1                  504

In the example above, my predictor variables are C_O_Above_prevHigh and C_O_Below_prevLow, and my outcome variable is E_halfGap. In some cases there may be more predictors, such as Time.

Rather than doing the above and manually typing every query for each permutation, is there anything available in R or another application that would:

1) Output the potential probability paths based on my predictors?
2) Let me choose how to split the paths?

Thanks for your input.

Answer 1

If you want all the totals and subtotals, you can use CUBE BY in SQL (written GROUP BY CUBE in standard SQL, but not available in SQLite) or addmargins in R:

addmargins( Titanic )
# More readable:
ftable( addmargins( Titanic ) )
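
The same idea can be applied to data shaped like the question's. This is only a minimal sketch: the `joinedData` below is a made-up stand-in with one row per date (so plain row counts play the role of `count(distinct Date)`), and the values are invented for illustration. `table`, `addmargins`, `prop.table`, and `ftable` are all base R.

```r
# Hypothetical stand-in for the asker's joinedData (values are made up,
# one row per date is assumed).
joinedData <- data.frame(
  C_O_Above_prevHigh = c(0, 0, 0, 1, 1, 0, 0, 1),
  C_O_Below_prevLow  = c(0, 0, 1, 0, 0, 0, 1, 0),
  E_halfGap          = c(1, 0, 1, 1, 0, 1, 1, 1)
)

# Cross-tabulate both predictors against the outcome in one call.
counts <- table(joinedData)

# Counts with every total and subtotal added as a "Sum" level.
with_margins <- addmargins(counts)
print(ftable(with_margins))

# Conditional probability of E_halfGap given each predictor combination:
# proportions are taken within each (Above, Below) cell.
# Combinations that never occur in the data come out as NaN.
cond_prob <- prop.table(counts, margin = c(1, 2))
print(ftable(cond_prob))
```

Adding another predictor such as Time would just mean one more column in the data frame and one more index in `margin`.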

If you want to build a decision tree, you can use the rpart package or check the Machine Learning or Graphical Models CRAN task views.
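
As a rough illustration of the rpart route, here is a small sketch on simulated data. Everything about the data is an assumption: the column names are borrowed from the question, and the rule linking predictors to outcome is invented so that the tree has something to find. Only `rpart()` and its `predict()` method come from the rpart package.

```r
library(rpart)

set.seed(1)
n <- 500

# Simulated predictors: the two events are made mutually exclusive here.
above <- rbinom(n, 1, 0.3)
below <- ifelse(above == 1, 0, rbinom(n, 1, 0.3))

# Invented rule: the outcome is likely when neither predictor fired.
p <- ifelse(above == 0 & below == 0, 0.9, 0.2)

joinedData <- data.frame(
  C_O_Above_prevHigh = factor(above, levels = c(0, 1)),
  C_O_Below_prevLow  = factor(below, levels = c(0, 1)),
  E_halfGap          = factor(rbinom(n, 1, p), levels = c(0, 1))
)

# Fit a classification tree; rpart chooses the splits itself.
fit <- rpart(E_halfGap ~ C_O_Above_prevHigh + C_O_Below_prevLow,
             data = joinedData, method = "class")
print(fit)

# Estimated class probabilities along two "paths" of the tree.
newdata <- data.frame(
  C_O_Above_prevHigh = factor(c(0, 1), levels = c(0, 1)),
  C_O_Below_prevLow  = factor(c(0, 0), levels = c(0, 1))
)
probs <- predict(fit, newdata = newdata, type = "prob")
print(probs)
```

The printed tree lists each split with the class proportions at every node, which is close to the "probability paths" asked about. How the tree splits can be steered through `rpart.control` (for example `cp` and `minsplit`).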
