Question

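I have a file containing JSON-formatted data. I read it line by line, and each line holds exactly one JSON record, so the format itself is not the problem. Here is a sample line: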

{"url": "http://ldrlongdistancerider.com/bikers_rights_motorcycle/rightsriders0163.php", "timestamp": 1257072412, "tags": ["nscensorship", "cloudmark", "network", "solutions", "content", "based", "spam", "signatures"]}

What I need to do is count how many times each URL occurs and print the result:

 http://ldrlongdistancerider.com/bikers_rights_motorcycle/rightsriders0163.php  1

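How can I do this with streams? By the way, I also need to filter the records by timestamp: if someone passes in a date range, I have to count only the URLs whose timestamps fall within that range. I have most of it working, but the counting part confuses me.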

Here is what I have so far:

for (Path filePath : files) {
    try {
        Files.lines(Paths.get(filePath.toUri()))
             .filter(s -> Link.parse(s).timestamp() > startSeconds)
             .filter(s -> Link.parse(s).timestamp() < stopSeconds)
             .forEach(s -> countMap.put(Link.parse(s).url(), 1));
    } catch (IOException e) {
        e.printStackTrace();
    }
}

countMap is a HashMap<String, Integer>.

Answer 1

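You are parsing each line several times, and you are mutating an external map instead of letting the stream build the map for you, which is an anti-pattern (and makes the stream hard to parallelize).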

You can use

Files.lines(Paths.get(filePath.toUri()))
     .map(Link::parse)
     .filter(link -> link.timestamp() > startSeconds && link.timestamp() < stopSeconds)
     .collect(Collectors.groupingBy(Link::url, Collectors.counting()));
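To try the pipeline above without the original data, here is a minimal runnable sketch. The Link class is not shown in the question, so the one below is a hypothetical stand-in that pulls "url" and "timestamp" out of a line with regular expressions; a real implementation would more likely use a JSON library. The class name UrlCounter, the method countUrls, and the sample lines are also made up for illustration. Note that Collectors.counting() yields Long values, so the result is a Map<String, Long> rather than the Map<String, Integer> from the question.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class UrlCounter {

    // Hypothetical stand-in for the asker's Link class (assumption: regex parsing).
    static final class Link {
        private static final Pattern URL = Pattern.compile("\"url\"\\s*:\\s*\"([^\"]*)\"");
        private static final Pattern TS = Pattern.compile("\"timestamp\"\\s*:\\s*(\\d+)");
        private final String url;
        private final long timestamp;

        private Link(String url, long timestamp) {
            this.url = url;
            this.timestamp = timestamp;
        }

        static Link parse(String line) {
            Matcher u = URL.matcher(line);
            Matcher t = TS.matcher(line);
            if (!u.find() || !t.find()) {
                throw new IllegalArgumentException("Unparseable line: " + line);
            }
            return new Link(u.group(1), Long.parseLong(t.group(1)));
        }

        String url() { return url; }
        long timestamp() { return timestamp; }
    }

    // Parses each line once, keeps records strictly inside the range, counts per URL.
    static Map<String, Long> countUrls(Stream<String> lines, long startSeconds, long stopSeconds) {
        return lines.map(Link::parse)
                    .filter(link -> link.timestamp() > startSeconds && link.timestamp() < stopSeconds)
                    .collect(Collectors.groupingBy(Link::url, Collectors.counting()));
    }

    public static void main(String[] args) {
        // Made-up sample records in the same shape as the line in the question.
        Stream<String> sample = Stream.of(
            "{\"url\": \"http://a.example/x\", \"timestamp\": 200, \"tags\": [\"spam\"]}",
            "{\"url\": \"http://a.example/x\", \"timestamp\": 300, \"tags\": []}",
            "{\"url\": \"http://b.example/y\", \"timestamp\": 900, \"tags\": []}");
        countUrls(sample, 100L, 500L)
            .forEach((url, count) -> System.out.println(url + "  " + count));
    }
}
```

With a real file, the stream returned by Files.lines should be opened in a try-with-resources statement so the underlying file handle is closed once the counts have been collected.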

Answer 2

countMap = Files.lines(Paths.get(filePath.toUri()))
                 .filter(s -> Link.parse(s).timestamp() > startSeconds)
                 .filter(s -> Link.parse(s).timestamp() < stopSeconds)
                 .collect(Collectors.groupingBy(x -> Link.parse(x).url()))
                 .entrySet()
                 .stream()
                 .collect(Collectors.toMap(entry -> entry.getKey(), entry -> entry.getValue().size()));

This is what I ended up doing, and it works. And yes, I still need to take care of the repeated parsing, @JB Nizet.
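One possible way to deal with the repeated parsing while keeping Integer counts is sketched below: each line is parsed once, and Collectors.toMap with Integer::sum as the merge function adds 1 per occurrence. The sketch also folds the loop over files into the pipeline with flatMap, which closes each per-file stream after its contents have been consumed. As before, Link is a hypothetical regex-based stand-in for the class not shown in the question, and the names MultiFileUrlCounter, linesOf, countUrls, and countUrlsInFiles are assumptions made for this example.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class MultiFileUrlCounter {

    // Hypothetical stand-in for the asker's Link class (assumption: regex parsing).
    static final class Link {
        private static final Pattern URL = Pattern.compile("\"url\"\\s*:\\s*\"([^\"]*)\"");
        private static final Pattern TS = Pattern.compile("\"timestamp\"\\s*:\\s*(\\d+)");
        private final String url;
        private final long timestamp;

        private Link(String url, long timestamp) {
            this.url = url;
            this.timestamp = timestamp;
        }

        static Link parse(String line) {
            Matcher u = URL.matcher(line);
            Matcher t = TS.matcher(line);
            if (!u.find() || !t.find()) {
                throw new IllegalArgumentException("Unparseable line: " + line);
            }
            return new Link(u.group(1), Long.parseLong(t.group(1)));
        }

        String url() { return url; }
        long timestamp() { return timestamp; }
    }

    // Opens one file as a stream of lines, wrapping the checked IOException.
    static Stream<String> linesOf(Path file) {
        try {
            return Files.lines(file);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Parses each line once; toMap's merge function sums the 1s for equal URLs.
    static Map<String, Integer> countUrls(Stream<String> lines, long startSeconds, long stopSeconds) {
        return lines.map(Link::parse)
                    .filter(link -> link.timestamp() > startSeconds && link.timestamp() < stopSeconds)
                    .collect(Collectors.toMap(Link::url, link -> 1, Integer::sum));
    }

    // Counts across several files in one pipeline instead of an outer for loop.
    static Map<String, Integer> countUrlsInFiles(List<Path> files, long startSeconds, long stopSeconds) {
        return countUrls(files.stream().flatMap(MultiFileUrlCounter::linesOf), startSeconds, stopSeconds);
    }

    public static void main(String[] args) {
        // Made-up sample records; real input would come from countUrlsInFiles.
        Stream<String> sample = Stream.of(
            "{\"url\": \"http://a.example/x\", \"timestamp\": 200, \"tags\": []}",
            "{\"url\": \"http://a.example/x\", \"timestamp\": 300, \"tags\": []}");
        countUrls(sample, 100L, 500L)
            .forEach((url, count) -> System.out.println(url + "  " + count));
    }
}
```

Compared with grouping into lists and then taking size(), this avoids building an intermediate list per URL, which may matter if the files are large.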

Java 8 Streams: count all keys