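I am new to machine learning and have just started a new POC project.

The dataset for my model is a CSV file in the following format.

My goal is to detect an anomaly for any src/destination when its in/out value spikes, respectively.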

timestamp src destination in  out  type
 00       p1    p2        2    3    abc
 00       p2    p3        1    4    abc
 00       p3    pn        3    5    abc
 05       p1    pn        4    3    abc
 05       p2    p1        2    2    abc
 10       p1    p3       91    6    abc <- src anomaly detection
 10       px    py       30   92    abc <- dest anomaly detection

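Here src maps to the in value, and destination maps to the out value.

I want to use pyspark to prepare a KMeans model in Spark for each src/destination. I don't know how to go about it. How do I prepare the model? Can you give me some pointers?

Thanks.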
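
For illustration only, here is a minimal sketch of one way the per-src setup described above might be approached. It rests on assumptions that are not in the question: the CSV is loaded with numeric `in`/`out` columns, the training rows are mostly normal traffic, and a row counts as anomalous when its value is farther than some `threshold` from every cluster center of its src. The helper names (`nearest_center_distance`, `is_anomaly`, `fit_kmeans_per_src`), `k=2`, and the threshold are hypothetical choices. Only `KMeans` and `VectorAssembler` come from pyspark itself.

```python
import math


def nearest_center_distance(point, centers):
    """Euclidean distance from `point` to the closest of `centers`."""
    return min(math.dist(point, center) for center in centers)


def is_anomaly(value, centers, threshold):
    """Flag `value` when it lies farther than `threshold` from every center.

    `threshold` is an assumed tuning knob, not something KMeans provides.
    """
    return nearest_center_distance([float(value)], centers) > threshold


def fit_kmeans_per_src(df, value_col="in", key_col="src", k=2, seed=1):
    """Fit one KMeans model per distinct key; return {key: cluster centers}.

    `df` is a Spark DataFrame with the columns shown in the question.
    pyspark is imported here so the two helpers above can be used
    without a Spark installation.
    """
    from pyspark.ml.clustering import KMeans
    from pyspark.ml.feature import VectorAssembler
    from pyspark.sql import functions as F

    # KMeans expects a single vector column, so wrap the numeric column.
    assembler = VectorAssembler(inputCols=[value_col], outputCol="features")

    centers = {}
    keys = [row[0] for row in df.select(key_col).distinct().collect()]
    for key in keys:
        subset = assembler.transform(df.filter(F.col(key_col) == key))
        # Assumption: skip keys with too few rows to form k clusters.
        if subset.count() < k:
            continue
        model = KMeans(k=k, seed=seed, featuresCol="features").fit(subset)
        centers[key] = [c.tolist() for c in model.clusterCenters()]
    return centers
```

Assuming the file is a real comma-separated CSV read with `spark.read.csv(path, header=True, inferSchema=True)`, calling `fit_kmeans_per_src(df)` would give the centers per src, and `fit_kmeans_per_src(df, value_col="out", key_col="destination")` would do the same for the destination side. A new row could then be checked with `is_anomaly(row_value, centers[key], threshold)`. Fitting one model per key in a driver-side loop is simple but may be slow when there are many distinct keys.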

Machine learning KMeans model in Spark

0 answers: