I am trying to group records by IP and aggregate the upload and download traffic for each IP using Scala.
ip=10.22.3.88 upload =470 download =308
ip=10.22.3.89 upload =526 download =603
ip=10.22.3.88 upload =542 download =603
ip=10.22.3.90 upload =292 download =235
ip=10.22.3.90 upload =210 download =653
ip=10.22.3.88 upload =210 download =653
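Can anyone help me with this?
Answer 0 (score: 0)
As I understand it, you want to group the records above by IP address.
To make the implementation easier, I will store the data in a structure like this: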
case class IpAddress(ip:String,
upload: String,
download: String)
val ipAddress1 = IpAddress("10.22.3.88", "470", "308")
val ipAddress2 = IpAddress("10.22.3.89", "526", "603")
val ipAddress3 = IpAddress("10.22.3.88", "542", "603")
val ipAddress4 = IpAddress("10.22.3.90", "292", "235")
val ipAddress5 = IpAddress("10.22.3.90", "210", "653")
val ipAddress6 = IpAddress("10.22.3.88", "210", "653")
val listIpAddress = Seq(ipAddress1, ipAddress2, ipAddress3, ipAddress4, ipAddress5, ipAddress6)
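I defined a case class named IpAddress and created six instances of it as input values.
I then created a sequence holding all six instances. If you are not familiar with case classes and Seq, I suggest reading up on them.
The implementation below gives you a key -> value map, where the key is the IP address and the value is the sequence of IpAddress instances that share that IP address.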
// groupBy already returns a Map from each IP to its records
val ipMap = listIpAddress.groupBy(_.ip)
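The grouping above stops short of the totals the question asks for. Here is a minimal sketch of that missing step, assuming the upload and download strings always hold whole numbers. The Totals class and the totalsByIp helper are names introduced here for illustration and are not part of the answer.

```scala
object TrafficByIp {
  // Same shape as the case class used in the answer.
  final case class IpAddress(ip: String, upload: String, download: String)

  // Hypothetical holder for the summed traffic of one IP.
  final case class Totals(upload: Long, download: Long)

  // Group the records by IP, then sum both counters within each group.
  // Assumption: every upload/download string parses as a whole number;
  // toLong throws NumberFormatException otherwise.
  def totalsByIp(records: Seq[IpAddress]): Map[String, Totals] =
    records.groupBy(_.ip).map { case (ip, group) =>
      ip -> Totals(group.map(_.upload.toLong).sum, group.map(_.download.toLong).sum)
    }

  def main(args: Array[String]): Unit = {
    val records = Seq(
      IpAddress("10.22.3.88", "470", "308"),
      IpAddress("10.22.3.89", "526", "603"),
      IpAddress("10.22.3.88", "542", "603"),
      IpAddress("10.22.3.90", "292", "235"),
      IpAddress("10.22.3.90", "210", "653"),
      IpAddress("10.22.3.88", "210", "653")
    )
    // Sort by IP so the printed order is stable.
    totalsByIp(records).toSeq.sortBy(_._1).foreach { case (ip, t) =>
      println(s"$ip upload=${t.upload} download=${t.download}")
    }
  }
}
```

Storing the counters as Long in the case class from the start would avoid the repeated toLong conversions.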
If you have any questions, please leave them in the comments below.
Hope this helps!
Answer 1 (score: 0)
As I understand it, you need the total upload and download traffic for each IP after grouping. I am assuming the data above is stored in a String. You can first map each line to a case class and then group by IP. After that, add up the upload and download traffic for each grouped IP. Please find the code below.
//Create a case class which replicates your incoming data
case class Tracker(IP: String, upload: Double, download: Double)
//assuming you have data as a string and each record in a new line
val data =
  """ip=10.22.3.88 upload =470 download =308
    |ip=10.22.3.89 upload =526 download =603
    |ip=10.22.3.88 upload =542 download =603
    |ip=10.22.3.90 upload =292 download =235
    |ip=10.22.3.90 upload =210 download =653
    |ip=10.22.3.88 upload =210 download =653""".stripMargin
//process data line by line and map to Tracker class
val lst = data.split("\n").map(x => {
  //splitting on "=" leaves each value at the start of array(1), array(2), array(3)
  val array = x.split("=")
  Tracker(array(1).split(" ")(0).trim,
    array(2).split(" ")(0).trim.toDouble,
    array(3).split(" ")(0).trim.toDouble)
})
//finally group by IP and then add upload and download traffic for grouped IP
val result = lst.groupBy(_.IP)
  .map(x =>
    (x._1,
      x._2.map(_.upload).sum,
      x._2.map(_.download).sum))
print(result)
//output (the order of the groups may vary)
//List((10.22.3.90,502.0,888.0), (10.22.3.89,526.0,603.0), (10.22.3.88,1222.0,1564.0))
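The split("=") parsing above depends on every line having exactly the expected fields in the expected order, and it throws on anything else. As an alternative sketch, not the answerer's method, a regular expression can extract the three fields and skip lines that do not match. The names TrafficLogParser, parseLine and aggregate are introduced here for illustration, and the counters are assumed to be whole numbers.

```scala
object TrafficLogParser {
  // Matches lines such as "ip=10.22.3.88 upload =470 download =308".
  private val LinePattern =
    """ip=(\S+)\s+upload\s*=\s*(\d+)\s+download\s*=\s*(\d+)""".r

  // Returns None for any line that does not match the expected layout.
  def parseLine(line: String): Option[(String, Long, Long)] =
    line.trim match {
      case LinePattern(ip, up, down) => Some((ip, up.toLong, down.toLong))
      case _                         => None
    }

  // Single pass over the lines, keeping running (upload, download) sums per IP.
  def aggregate(lines: Seq[String]): Map[String, (Long, Long)] =
    lines
      .flatMap(line => parseLine(line).toList)
      .foldLeft(Map.empty[String, (Long, Long)]) { case (acc, (ip, up, down)) =>
        val (u, d) = acc.getOrElse(ip, (0L, 0L))
        acc.updated(ip, (u + up, d + down))
      }

  def main(args: Array[String]): Unit = {
    val data =
      """ip=10.22.3.88 upload =470 download =308
        |ip=10.22.3.89 upload =526 download =603
        |ip=10.22.3.88 upload =542 download =603
        |ip=10.22.3.90 upload =292 download =235
        |ip=10.22.3.90 upload =210 download =653
        |ip=10.22.3.88 upload =210 download =653""".stripMargin
    // Sort by IP so the printed order is stable.
    aggregate(data.split("\n").toSeq).toSeq.sortBy(_._1).foreach {
      case (ip, (up, down)) => println(s"$ip upload=$up download=$down")
    }
  }
}
```

Silently dropping malformed lines is a design choice made for this sketch. Collecting them for logging may be preferable when the input is not trusted.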