Question

I run into this situation a lot. I have a batch of data (stored as CSV, XML, it doesn't matter) in some version of this format:

key1|value1
key1|value2
key1|value3
key2|value4
key2|value5
etc.

and I need to be able to work with it in this form:

data[key1] => [value1, value2, value3]
data[key2] => [value4, value5]
etc.

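What is the best way to convert from A to B? I usually loop over the list like this (pseudocode), but I don't like having to repeat my array-building code.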

data = []
values = []
currentKey = ""
foreach (line in inputData) {
    key, value = split(line)
    if ((currentKey != "") and (currentKey != key)) {
        data[currentKey] = values
        values = []
    }
    currentKey = key
    values.add(value)
}
// this is the part I don't like, but it's necessary to capture the last group
data[currentKey] = values

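I deliberately haven't named a language, because I have to do this in at least Javascript, C#, Perl, and PHP. Language-specific solutions would be great, but I'm really looking for the most efficient general algorithmic approach.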

Answer 1

You can change your code to:

data = {}
currentKey = ""
foreach (line in inputData) {
    key, value = split(line)
    // start a fresh list whenever the key changes (assumes input is grouped by key)
    if (currentKey != key) {
        data[key] = [] // like data.put(key,new ArrayList<String>()) in java
    }
    data[key].add(value) // like data.get(key).add(value) in java
    currentKey = key
}
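
For anyone who wants to run it, here is a rough Python rendering of the pseudocode above. Python is only chosen for illustration, and the function name `group_consecutive` and the `sep` parameter are made up for this sketch. Like the pseudocode, it assumes all lines for a given key are adjacent in the input.

```python
def group_consecutive(lines, sep="|"):
    """Group values by key, starting a new list whenever the key changes."""
    data = {}
    current_key = None
    for line in lines:
        key, value = line.strip().split(sep, 1)
        if key != current_key:
            # New run of lines for this key: start with an empty list.
            # If the same key shows up again later, its earlier values are replaced.
            data[key] = []
        data[key].append(value)
        current_key = key
    return data


if __name__ == "__main__":
    sample = ["key1|value1", "key1|value2", "key1|value3",
              "key2|value4", "key2|value5"]
    print(group_consecutive(sample))
```

The trailing "capture the last group" step disappears because each value is appended to the map as soon as it is read. The trade-off is that unsorted input silently loses data, so this only fits input that is already grouped by key.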

Answer 2

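Here is one solution. First, create a map. For each entry in the data file, find the key and the value. Check whether the key is already in the map. If it isn't, add a new list containing the new value to the map under that key. If the key is already in the map, just append the new value to its list.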

def hash = [:]
new File("test.data").eachLine { String line ->
    def (key,value)  = line.split(/\|/)
    hash.get(key, []) << value
}

println hash

This prints the following map:

[key1:[value1, value2, value3], key2:[value4, value5]]

There is no need to keep track of currentKey.

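Edit: This is written in Groovy, but it should look very similar in other languages. hash.get() returns the value for the key, or the supplied default (an empty list in the snippet above) if the key is missing; in Groovy it also stores that default in the map, which is why the appended value is kept. The left-shift (<<) operator appends to the list.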
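
As a rough illustration of how the map-of-lists approach carries over to another language, here is a minimal Python sketch. The function name `group_by_key` and the `sep` parameter are hypothetical. `dict.setdefault` plays the same role as Groovy's `hash.get(key, [])`: it returns the existing list for the key, or inserts and returns a new empty one.

```python
def group_by_key(lines, sep="|"):
    """Build a dict mapping each key to the list of its values."""
    data = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        key, value = line.split(sep, 1)
        # Fetch the list for this key, creating it on first sight, then append.
        data.setdefault(key, []).append(value)
    return data


if __name__ == "__main__":
    sample = ["key1|value1", "key1|value2", "key1|value3",
              "key2|value4", "key2|value5"]
    print(group_by_key(sample))
```

Because the lookup is by key rather than by "did the key just change", this version does not care whether the input is sorted or grouped.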

Algorithm for reorganizing data from a flat file into a keyed hash
