Calculating a confidence interval in Spark Scala

Time: 2017-07-06 08:53:23

Tags: scala apache-spark confidence-interval

I have the following DataFrame:

+---------------+-----------+-------------+-----+----+----+--------------------+-------------------+------+--------------------+-------+-------+--------------------+
|   time_stamp_0|sender_ip_1|receiver_ip_2|count|rank|  xi|                  pi|                  r|attack|             myvalue|max_int|min_int|                 int|
+---------------+-----------+-------------+-----+----+----+--------------------+-------------------+------+--------------------+-------+-------+--------------------+
|12:18:52.702936|   10.0.0.1|     10.0.0.4|11139|   1|  15| 0.00134661998384056|0.49609480204686235|     0|0.008901370242045487|11139.0|11139.0|[11139.000, 11139...|
|12:18:53.702976|   10.0.0.1|     10.0.0.4|11139|   2|  15| 0.00134661998384056|0.49609480204686235|     0|0.008901370242045487|11139.0|11139.0|[11139.000, 11139...|
|12:18:54.702873|   10.0.0.1|     10.0.0.4|11139|   3|  15| 0.00134661998384056|0.49609480204686235|     0|0.008901370242045487|11139.0|11139.0|[11139.000, 11139...|
|12:18:55.702825|   10.0.0.1|     10.0.0.4|11139|   4|  15| 0.00134661998384056|0.49609480204686235|     0|0.008901370242045487|11139.0|11139.0|[11139.000, 11139...|
|12:18:56.703021|   10.0.0.1|     10.0.0.4|11139|   5|  15| 0.00134661998384056|0.49609480204686235|     0|0.008901370242045487|11139.0|11139.0|[11139.000, 11139...|
|12:18:57.703786|   10.0.0.1|     10.0.0.4|11139|   6|  15| 0.00134661998384056|0.49609480204686235|     0|0.008901370242045487|11139.0|11139.0|[11139.000, 11139...|
|12:18:58.706354|   10.0.0.1|     10.0.0.4|11139|   7|  15| 0.00134661998384056|0.49609480204686235|     0|0.008901370242045487|11139.0|11139.0|[11139.000, 11139...|
|12:18:59.705885|   10.0.0.1|     10.0.0.4|11139|   8|  15| 0.00134661998384056|0.49609480204686235|     0|0.008901370242045487|11139.0|11139.0|[11139.000, 11139...|
|12:20:14.703371|   10.0.0.1|     10.0.0.4|11139|   9|  15| 0.00134661998384056|0.49609480204686235|     0|0.008901370242045487|11139.0|11139.0|[11139.000, 11139...|
|12:20:15.702891|   10.0.0.1|     10.0.0.4|11139|  10|  15| 0.00134661998384056|0.49609480204686235|     0|0.008901370242045487|11139.0|11139.0|[11139.000, 11139...|
|12:20:16.703450|   10.0.0.1|     10.0.0.4|11139|  11|  15| 0.00134661998384056|0.49609480204686235|     0|0.008901370242045487|11139.0|11139.0|[11139.000, 11139...|
|12:20:17.703087|   10.0.0.1|     10.0.0.4|11139|  12|  15| 0.00134661998384056|0.49609480204686235|     0|0.008901370242045487|11139.0|11139.0|[11139.000, 11139...|
|12:20:18.704467|   10.0.0.1|     10.0.0.4|11139|  13|  15| 0.00134661998384056|0.49609480204686235|     0|0.008901370242045487|11139.0|11139.0|[11139.000, 11139...|
|12:20:19.703472|   10.0.0.1|     10.0.0.4|11139|  14|  15| 0.00134661998384056|0.49609480204686235|     0|0.008901370242045487|11139.0|11139.0|[11139.000, 11139...|
|12:20:20.703268|   10.0.0.1|     10.0.0.4|11139|  15|  15| 0.00134661998384056|0.49609480204686235|     0|0.008901370242045487|11139.0|11139.0|[11139.000, 11139...|
|12:18:52.995718|   10.0.0.5|     10.0.0.1|11139|   1|  14|0.001256845318251...|0.49609480204686235|     0|0.008394658926763537|11139.0|11139.0|[11139.000, 11139...|
|12:18:53.995478|   10.0.0.5|     10.0.0.1|11139|   2|  14|0.001256845318251...|0.49609480204686235|     0|0.008394658926763537|11139.0|11139.0|[11139.000, 11139...|
|12:18:54.995653|   10.0.0.5|     10.0.0.1|11139|   3|  14|0.001256845318251...|0.49609480204686235|     0|0.008394658926763537|11139.0|11139.0|[11139.000, 11139...|
|12:18:55.995978|   10.0.0.5|     10.0.0.1|11139|   4|  14|0.001256845318251...|0.49609480204686235|     0|0.008394658926763537|11139.0|11139.0|[11139.000, 11139...|
|12:18:56.994984|   10.0.0.5|     10.0.0.1|11139|   5|  14|0.001256845318251...|0.49609480204686235|     0|0.008394658926763537|11139.0|11139.0|[11139.000, 11139...|
|12:18:57.995190|   10.0.0.5|     10.0.0.1|11139|   6|  14|0.001256845318251...|0.49609480204686235|     0|0.008394658926763537|11139.0|11139.0|[11139.000, 11139...|
|12:18:58.994970|   10.0.0.5|     10.0.0.1|11139|   7|  14|0.001256845318251...|0.49609480204686235|     0|0.008394658926763537|11139.0|11139.0|[11139.000, 11139...|
|12:20:14.995142|   10.0.0.5|     10.0.0.1|11139|   8|  14|0.001256845318251...|0.49609480204686235|     0|0.008394658926763537|11139.0|11139.0|[11139.000, 11139...|
|12:20:15.995244|   10.0.0.5|     10.0.0.1|11139|   9|  14|0.001256845318251...|0.49609480204686235|     0|0.008394658926763537|11139.0|11139.0|[11139.000, 11139...|
|12:20:16.995481|   10.0.0.5|     10.0.0.1|11139|  10|  14|0.001256845318251...|0.49609480204686235|     0|0.008394658926763537|11139.0|11139.0|[11139.000, 11139...|
|12:20:17.995213|   10.0.0.5|     10.0.0.1|11139|  11|  14|0.001256845318251...|0.49609480204686235|     0|0.008394658926763537|11139.0|11139.0|[11139.000, 11139...|
|12:20:18.994985|   10.0.0.5|     10.0.0.1|11139|  12|  14|0.001256845318251...|0.49609480204686235|     0|0.008394658926763537|11139.0|11139.0|[11139.000, 11139...|
|12:20:19.994872|   10.0.0.5|     10.0.0.1|11139|  13|  14|0.001256845318251...|0.49609480204686235|     0|0.008394658926763537|11139.0|11139.0|[11139.000, 11139...|
|12:20:20.994932|   10.0.0.5|     10.0.0.1|11139|  14|  14|0.001256845318251...|0.49609480204686235|     0|0.008394658926763537|11139.0|11139.0|[11139.000, 11139...|
|12:18:52.995744|   10.0.0.1|     10.0.0.5|11139|   1|  14|0.001256845318251...|0.49609480204686235|     0|0.008394658926763537|11139.0|11139.0|[11139.000, 11139...|
|12:18:53.995496|   10.0.0.1|     10.0.0.5|11139|   2|  14|0.001256845318251...|0.49609480204686235|     0|0.008394658926763537|11139.0|11139.0|[11139.000, 11139...|
|12:18:54.995665|   10.0.0.1|     10.0.0.5|11139|   3|  14|0.001256845318251...|0.49609480204686235|     0|0.008394658926763537|11139.0|11139.0|[11139.000, 11139...|
|12:18:55.995986|   10.0.0.1|     10.0.0.5|11139|   4|  14|0.001256845318251...|0.49609480204686235|     0|0.008394658926763537|11139.0|11139.0|[11139.000, 11139...|
|12:18:56.994999|   10.0.0.1|     10.0.0.5|11139|   5|  14|0.001256845318251...|0.49609480204686235|     0|0.008394658926763537|11139.0|11139.0|[11139.000, 11139...|
|12:18:57.995204|   10.0.0.1|     10.0.0.5|11139|   6|  14|0.001256845318251...|0.49609480204686235|     0|0.008394658926763537|11139.0|11139.0|[11139.000, 11139...|
|12:18:58.995057|   10.0.0.1|     10.0.0.5|11139|   7|  14|0.001256845318251...|0.49609480204686235|     0|0.008394658926763537|11139.0|11139.0|[11139.000, 11139...|
|12:20:14.995169|   10.0.0.1|     10.0.0.5|11139|   8|  14|0.001256845318251...|0.49609480204686235|     0|0.008394658926763537|11139.0|11139.0|[11139.000, 11139...|
|12:20:15.995261|   10.0.0.1|     10.0.0.5|11139|   9|  14|0.001256845318251...|0.49609480204686235|     0|0.008394658926763537|11139.0|11139.0|[11139.000, 11139...|
|12:20:16.995499|   10.0.0.1|     10.0.0.5|11139|  10|  14|0.001256845318251...|0.49609480204686235|     0|0.008394658926763537|11139.0|11139.0|[11139.000, 11139...|
|12:20:17.995220|   10.0.0.1|     10.0.0.5|11139|  11|  14|0.001256845318251...|0.49609480204686235|     0|0.008394658926763537|11139.0|11139.0|[11139.000, 11139...|
|12:20:18.994997|   10.0.0.1|     10.0.0.5|11139|  12|  14|0.001256845318251...|0.49609480204686235|     0|0.008394658926763537|11139.0|11139.0|[11139.000, 11139...|
|12:20:19.994891|   10.0.0.1|     10.0.0.5|11139|  13|  14|0.001256845318251...|0.49609480204686235|     0|0.008394658926763537|11139.0|11139.0|[11139.000, 11139...|
|12:20:20.994951|   10.0.0.1|     10.0.0.5|11139|  14|  14|0.001256845318251...|0.49609480204686235|     0|0.008394658926763537|11139.0|11139.0|[11139.000, 11139...|
|12:18:52.811535|   10.0.0.1|     10.0.0.2|11139|   1|5526| 0.49609480204686235|0.49609480204686235|     0|   0.347756620851195|11139.0|11139.0|[11139.000, 11139...|
|12:18:53.812029|   10.0.0.1|     10.0.0.2|11139|   2|5526| 0.49609480204686235|0.49609480204686235|     0|   0.347756620851195|11139.0|11139.0|[11139.000, 11139...|
|12:18:54.480070|   10.0.0.1|     10.0.0.2|11139|   3|5526| 0.49609480204686235|0.49609480204686235|     0|   0.347756620851195|11139.0|11139.0|[11139.000, 11139...|
|12:18:54.481196|   10.0.0.1|     10.0.0.2|11139|   4|5526| 0.49609480204686235|0.49609480204686235|     0|   0.347756620851195|11139.0|11139.0|[11139.000, 11139...|
|12:18:54.483532|   10.0.0.1|     10.0.0.2|11139|   5|5526| 0.49609480204686235|0.49609480204686235|     0|   0.347756620851195|11139.0|11139.0|[11139.000, 11139...|
|12:18:54.485713|   10.0.0.1|     10.0.0.2|11139|   6|5526| 0.49609480204686235|0.49609480204686235|     0|   0.347756620851195|11139.0|11139.0|[11139.000, 11139...|
|12:18:54.487091|   10.0.0.1|     10.0.0.2|11139|   7|5526| 0.49609480204686235|0.49609480204686235|     0|   0.347756620851195|11139.0|11139.0|[11139.000, 11139...|
|12:18:54.488272|   10.0.0.1|     10.0.0.2|11139|   8|5526| 0.49609480204686235|0.49609480204686235|     0|   0.347756620851195|11139.0|11139.0|[11139.000, 11139...|

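I need to calculate the confidence interval for the "myvalue" column, along with its lower and upper bounds (on confidence interval calculation: http://www.statisticshowto.com/how-to-find-a-confidence-interval/). I used the following code: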

    val cntInterval = final_add_count_rank_xi_pi_r_attack_antropy.select("myvalue").rdd.countApprox(timeout = 1000L, confidence = 0.95)
    val (lowCnt, highCnt) = (cntInterval.getFinalValue().low, cntInterval.getFinalValue().high)

    // Add the confidence interval to the DataFrame
    val final_integration_df = final_add_count_rank_xi_pi_r_attack_antropy.withColumn("max_int", lit(highCnt))
      .withColumn("min_int", lit(lowCnt))
      .withColumn("int", lit(cntInterval.getFinalValue().toString()))

    // Data becomes clean
    final_integration_df.show(100)

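The problem is that all three values in my DataFrame (the confidence interval, its minimum, and its maximum) come out as 11139.0, which equals the number of connections between "10.0.0.1" and "10.0.0.2" (the count column in the DataFrame). Can you help me fix this? Thanks.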

1 Answer:

Answer 0: (score: 1)

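As I understand it, you want to calculate the confidence interval for each row of the DataFrame. To do that, use a UDF rather than lit. The lit function inserts the same constant value into every row.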
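As background on why all three columns show 11139.0: `countApprox` estimates the number of elements in the RDD, not a statistic of the values it holds. Once the job finishes within the timeout, `low` and `high` both collapse to the exact row count. If what is wanted is a single interval for the mean of `myvalue`, one option is to compute it from aggregates. The sketch below is an assumption about the goal, not part of the original answer: it uses a normal approximation with the critical value 1.96 for a 95% level, and the object name, helper name, and sample data are hypothetical.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{avg, count, stddev_samp}

object MeanConfidenceInterval {
  // Normal-approximation interval for a mean: mean ± z * sd / sqrt(n).
  // z = 1.96 corresponds to a two-sided 95% confidence level.
  def normalInterval(mean: Double, sd: Double, n: Long, z: Double = 1.96): (Double, Double) = {
    val margin = z * sd / math.sqrt(n.toDouble)
    (mean - margin, mean + margin)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("ci-sketch").getOrCreate()
    import spark.implicits._

    // Hypothetical stand-in for the `myvalue` column of the question's DataFrame.
    val df = Seq(0.0089, 0.0084, 0.3478, 0.0089, 0.0084).toDF("myvalue")

    // One pass over the data: sample size, sample mean, sample standard deviation.
    val stats = df.agg(count($"myvalue"), avg($"myvalue"), stddev_samp($"myvalue")).head()
    val (low, high) = normalInterval(stats.getDouble(1), stats.getDouble(2), stats.getLong(0))

    println(f"95%% CI for the mean of myvalue: [$low%.6f, $high%.6f]")
    spark.stop()
  }
}
```

For small samples, a Student's t critical value is usually a better choice than 1.96; the page linked in the question covers both cases.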

Here is an example of a UDF:

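A minimal sketch of the idea, not the answerer's original code: a UDF is evaluated once per row, so its result can differ between rows, whereas `lit` repeats one constant. The per-row formula here is an assumption chosen for illustration: a Wald interval for a proportion, p ± 1.96 * sqrt(p(1 - p) / n), applied to the `pi` and `count` columns. The object name, helper name, and sample rows are hypothetical.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, lit, udf}

object PerRowUdfSketch {
  // Margin of a Wald interval for a proportion p observed over n trials.
  // z = 1.96 corresponds to a two-sided 95% confidence level.
  def waldMargin(p: Double, n: Long, z: Double = 1.96): Double =
    z * math.sqrt(p * (1.0 - p) / n)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("udf-sketch").getOrCreate()
    import spark.implicits._

    // Hypothetical rows shaped like the question's data: (receiver, pi, count).
    val df = Seq(
      ("10.0.0.4", 0.00134661998384056, 11139L),
      ("10.0.0.2", 0.49609480204686235, 11139L)
    ).toDF("receiver_ip_2", "pi", "count")

    // Each UDF is evaluated per row, using that row's own column values.
    val lowerUdf = udf((p: Double, n: Long) => p - waldMargin(p, n))
    val upperUdf = udf((p: Double, n: Long) => p + waldMargin(p, n))

    val result = df
      .withColumn("constant", lit(1.96))                        // lit: same value in every row
      .withColumn("min_int", lowerUdf(col("pi"), col("count"))) // udf: value depends on the row
      .withColumn("max_int", upperUdf(col("pi"), col("count")))

    result.show(false)
    spark.stop()
  }
}
```

Keeping the arithmetic in a plain function such as `waldMargin` means it can be unit-tested without starting Spark. The same pattern applies to any other per-row formula: replace the body of the helper and pass the columns it needs to the UDF.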