How to group records by IP and aggregate specific fields in Scala?

Time: 2018-02-09 06:41:02

Tags: scala

I'm trying to group records by IP and, using Scala, aggregate the upload and download traffic for each IP.

ip=10.22.3.88 upload =470 download =308
ip=10.22.3.89 upload =526 download =603
ip=10.22.3.88 upload =542 download =603
ip=10.22.3.90 upload =292 download =235
ip=10.22.3.90 upload =210 download =653
ip=10.22.3.88 upload =210 download =653

Can anyone help me with this?

2 answers:

Answer 0 (score: 0)

As far as I understand, you want to group the records above by IP address.

To make the implementation easier, I'll store the data in a structure like this:

case class IpAddress(ip: String,
                     upload: String,
                     download: String)

val ipAddress1 = IpAddress("10.22.3.88", "470", "308")
val ipAddress2 = IpAddress("10.22.3.89", "526", "603")
val ipAddress3 = IpAddress("10.22.3.88", "542", "603")
val ipAddress4 = IpAddress("10.22.3.90", "292", "235")
val ipAddress5 = IpAddress("10.22.3.90", "210", "653")
val ipAddress6 = IpAddress("10.22.3.88", "210", "653")

val listIpAddress = Seq(ipAddress1, ipAddress2, ipAddress3, ipAddress4, ipAddress5, ipAddress6)

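I used a case class named IpAddress and created 6 instances of it as input values.

Then I created a Seq to hold all 6 instances. If you don't know what case classes and Seq are, I suggest reading up on them.

The implementation below gives you a key -> value map, where the key is the IP address and the value is a sequence of the IpAddress instances that share that IP address.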

// groupBy already returns Map[String, Seq[IpAddress]] keyed by IP address,
// so no extra map step is needed
val ipMap = listIpAddress.groupBy(_.ip)
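This answer stops at grouping and does not sum the traffic. A minimal sketch of the remaining aggregation step, assuming the `IpAddress` case class above (its numeric fields are `String`s, so they are converted with `toInt`). The `GroupedTotals` object and the `totals` name are illustrative, not part of the original answer.

```scala
object GroupedTotals {
  // Same shape as the answer's case class: numeric fields are kept as Strings
  case class IpAddress(ip: String, upload: String, download: String)

  val listIpAddress: Seq[IpAddress] = Seq(
    IpAddress("10.22.3.88", "470", "308"),
    IpAddress("10.22.3.89", "526", "603"),
    IpAddress("10.22.3.88", "542", "603"),
    IpAddress("10.22.3.90", "292", "235"),
    IpAddress("10.22.3.90", "210", "653"),
    IpAddress("10.22.3.88", "210", "653")
  )

  // Step 1 (as in the answer): group the records by IP address
  val ipMap: Map[String, Seq[IpAddress]] = listIpAddress.groupBy(_.ip)

  // Step 2: for each IP, convert the String fields to Int and sum them
  val totals: Map[String, (Int, Int)] = ipMap.map { case (ip, records) =>
    val up = records.map(_.upload.toInt).sum
    val down = records.map(_.download.toInt).sum
    (ip, (up, down))
  }

  def main(args: Array[String]): Unit =
    // Sort by IP only so that the printed order is stable
    totals.toSeq.sortBy(_._1).foreach { case (ip, (up, down)) =>
      println(s"$ip upload=$up download=$down")
    }
}
```

Storing `upload` and `download` as `Int` in the case class itself would remove the `toInt` calls; the sketch keeps `String` only to match the answer's structure.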

If you have any questions, leave them in the comments below.

Hope this helps!

Answer 1 (score: 0)

As far as I understand, you need the total upload and download traffic for each IP after grouping. Assuming your data above is stored in a String, you can first map each line to a case class and then group by IP. After that, add up the upload and download traffic for each grouped IP. Please find the code below.

//Create a case class which replicates your incoming data
case class Tracker(IP: String, upload: Double, download: Double)

//assuming you have data as a string and each record in a new line
val data = """ip=10.22.3.88 upload =470 download =308
             |ip=10.22.3.89 upload =526 download =603
             |ip=10.22.3.88 upload =542 download =603
             |ip=10.22.3.90 upload =292 download =235
             |ip=10.22.3.90 upload =210 download =653
             |ip=10.22.3.88 upload =210 download =653""".stripMargin

//process data line by line and map to Tracker class
val lst = data.split("\n").map(x => {
  //"ip=A upload =B download =C".split("=") gives Array("ip", "A upload ", "B download ", "C")
  val array = x.split("=")
  Tracker(array(1).split(" ")(0).trim,
          array(2).split(" ")(0).trim.toDouble,
          array(3).split(" ")(0).trim.toDouble)
})

//finally group by IP and then add upload and download traffic for grouped IP
val result = lst.groupBy(_.IP)
  .map(x => (x._1, x._2.map(_.upload).sum, x._2.map(_.download).sum))

print(result)

//output (the order of the IPs may vary)
//List((10.22.3.90,502.0,888.0), (10.22.3.89,526.0,603.0), (10.22.3.88,1222.0,1564.0))
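Splitting on `=` and indexing into the resulting array throws if a line is malformed. For comparison, a minimal sketch that parses each line with a regular expression and aggregates in one pass, assuming Scala 2.13 or later (for `groupMapReduce`). The `Traffic` case class, the `TrafficByIp` object and the regex are illustrative names, not part of the original answer; lines that do not match are skipped rather than raising an exception.

```scala
object TrafficByIp {
  // One parsed record
  final case class Traffic(ip: String, upload: Long, download: Long)

  // Allows optional spaces around '=' as in the sample ("upload =470")
  private val LinePattern =
    """ip\s*=\s*(\S+)\s+upload\s*=\s*(\d+)\s+download\s*=\s*(\d+)""".r

  // Regex extractors match the whole string; non-matching lines become None
  def parse(line: String): Option[Traffic] = line.trim match {
    case LinePattern(ip, up, down) => Some(Traffic(ip, up.toLong, down.toLong))
    case _                         => None
  }

  // Group by IP, map each record to (upload, download), then sum pairwise
  def totals(lines: Seq[String]): Map[String, (Long, Long)] =
    lines.flatMap(parse).groupMapReduce(_.ip)(t => (t.upload, t.download)) {
      (a, b) => (a._1 + b._1, a._2 + b._2)
    }

  val sample: Seq[String] = Seq(
    "ip=10.22.3.88 upload =470 download =308",
    "ip=10.22.3.89 upload =526 download =603",
    "ip=10.22.3.88 upload =542 download =603",
    "ip=10.22.3.90 upload =292 download =235",
    "ip=10.22.3.90 upload =210 download =653",
    "ip=10.22.3.88 upload =210 download =653"
  )

  def main(args: Array[String]): Unit =
    // Sort by IP only so that the printed order is stable
    totals(sample).toSeq.sortBy(_._1).foreach { case (ip, (up, down)) =>
      println(s"$ip upload=$up download=$down")
    }
}
```

`Long` is used instead of `Double` because the sample values are whole numbers; switch the types back if fractional traffic values are possible.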
