Question

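I am trying to code the following problem in Python: I have a multi-column file that I store in a dictionary/hash so that I can compare its elements in a later step. The file is structured like this: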

ID  ELEMENT_1   ELEMENT_2   ELEMENT_3   ELEMENT_4   ELEMENT_5

where the elements are separated by tabs.

Of these 6 columns, I need to store the IDs and elements 1, 2 and 3 as keys with their corresponding values, that is: key = id, value = element_1, element_2, element_3. Here is my attempt so far:

sources = open(sys.argv[1], "r").readlines()[32:]
for line in sources:
    tokens = line.split("\t")

    id = tokens[0].strip()
    element_1 = tokens[1].strip()
    element_2 = tokens[2].strip()
    element_3 = tokens[3].strip()

    hash = {}

    hash.setdefault(id, []).append(id)
    hash.setdefault(element_1, []).append(element_1)
    hash.setdefault(element_2, []).append(element_2)
    hash.setdefault(element_3, []).append(element_3)

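Well, this seems to work, but I don't think I am really following the key/value idea, since keys and values are identical here.
The main idea is: if two IDs on different lines are the same but element_3 differs between those two lines, they should be printed.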

for line in hash:
        #if duplicates in hash[id] and not the same in element_3:
            print (hash[id], hash[element_1], hash[element_2], hash[element_3])

Is this actually possible? I am quite confused at this point and hope someone can offer some advice.

Answer 1

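In this solution I compare entries as they are read. This means the first entry seen with a given id is the one that gets added to the stored_values dictionary.

stored_values[id] contains the list [ELEMENT_1, ELEMENT_2, ELEMENT_3, ELEMENT_4, ELEMENT_5]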

stored_values = {}
for line in sources:
    # find all the tokens and strip them at the same time
    tokens = [t.strip() for t in line.split("\t")]

    new_id, new_values = tokens[0], tokens[1:]

    # check if the id is already stored
    if new_id in stored_values:
        # Evaluate the items in the two lists to see if they are equal
        values_are_equal = True
        for stored_val, new_val in zip(stored_values[new_id], new_values):
            if stored_val != new_val:
                values_are_equal = False

        if values_are_equal:
            # Do whatever needs to be done if both duplicate id and values
            pass
        if not values_are_equal:
            print("ID '{}', has a duplicate entry with different values".format(new_id))
            print("Stored Entry:  " + str(stored_values[new_id]))
            print("Current Entry: " + str(new_values))
            # Do whatever else needs to be done if duplicate id but different values

    # Finally, if this is a new id, just add it
    else:
        stored_values[new_id] = new_values
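
The loop above flags a duplicate id when any stored value differs, whereas the question only asks about ELEMENT_3. A minimal sketch of that narrower check follows, assuming the rows are tab-separated strings already held in a list. The function name find_conflicts and the sample rows are hypothetical and used only for illustration.

```python
def find_conflicts(lines):
    """Return (id, first_row, later_row) tuples where ELEMENT_3 differs."""
    stored = {}       # id -> [element_1, element_2, element_3] of the first occurrence
    conflicts = []
    for line in lines:
        tokens = [t.strip() for t in line.split("\t")]
        row_id, values = tokens[0], tokens[1:4]
        if row_id not in stored:
            # first time this id is seen: remember its values
            stored[row_id] = values
        elif stored[row_id][2] != values[2]:
            # same id, different ELEMENT_3: report both rows
            conflicts.append((row_id, stored[row_id], values))
    return conflicts


# Hypothetical sample rows, for illustration only
sample = [
    "a1\tx\ty\tz\tp\tq\n",
    "a2\tx\ty\tz\tp\tq\n",
    "a1\tx\ty\tw\tp\tq\n",
]
for row_id, first, later in find_conflicts(sample):
    print(row_id, first, later)
```

Keeping the id as the key and the element list as the value is what lets a later row with the same id be looked up and compared in one step.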

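Edit: before adding to the store, you may also want some logic to check that new_id and new_values have the correct format/size/type/etc., or to compare the value list against the existing ones.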
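
As one possible form of that check, here is a small sketch that rejects rows with the wrong number of columns or an empty id. The helper name parse_row and the constant EXPECTED_COLUMNS are assumptions made for this example; the column count of 6 is taken from the file layout shown in the question.

```python
EXPECTED_COLUMNS = 6  # ID plus ELEMENT_1..ELEMENT_5, per the layout in the question


def parse_row(line):
    """Return (id, values) for a well-formed row, or None otherwise."""
    tokens = [t.strip() for t in line.split("\t")]
    # reject rows with a wrong column count or a blank id
    if len(tokens) != EXPECTED_COLUMNS or not tokens[0]:
        return None
    return tokens[0], tokens[1:]


print(parse_row("a1\tx\ty\tz\tp\tq\n"))
print(parse_row("broken line\n"))
```

A caller could skip or log any row for which parse_row returns None before touching stored_values.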

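Edit 2: the zip function lets you iterate over several lists at the same time.

I usually use zip only when comparing two lists of the same size.

If I have two lists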

lst1 = range(0,10)
lst2 = range(10,20)

the loop

min_length = min(len(lst1), len(lst2))
for ii in range(min_length):  # range works in Python 3; xrange exists only in Python 2
    l1 = lst1[ii]
    l2 = lst2[ii]
    # go on to do stuff with l1, l2

is effectively the same as

for l1, l2 in zip(lst1, lst2):
    # go on to do stuff with l1, l2
    pass
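
To make that equivalence concrete, the sketch below collects the pairs produced by both loops and compares them. The second list is deliberately shorter here (an assumption for illustration) to show that zip stops at the shorter input, just like the min_length loop.

```python
lst1 = list(range(0, 10))
lst2 = list(range(10, 15))   # deliberately shorter than lst1

# index-based loop, bounded by the shorter list
by_index = []
for ii in range(min(len(lst1), len(lst2))):
    by_index.append((lst1[ii], lst2[ii]))

# zip-based loop, which stops when the shorter list is exhausted
by_zip = list(zip(lst1, lst2))

print(by_index == by_zip)
```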

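In the solution, I start by assuming the lists are identical, i.e. values_are_equal is True.

As I loop through the lists, if I find that stored_val and new_val (the values at matching indices in the stored list and in new_values) are not the same, I change the local variable values_are_equal to False.

If I never find a point where stored_val and new_val differ, values_are_equal is still True after the loop.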
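
The same flag logic can be written more compactly with the built-in all(). This is an alternative sketch, not part of the original solution, and the function name values_match is hypothetical. Like the zip loop, it only compares positions present in both lists.

```python
def values_match(stored_values, new_values):
    """True if every pair of values at matching indices is equal."""
    return all(s == n for s, n in zip(stored_values, new_values))


print(values_match(["x", "y", "z"], ["x", "y", "z"]))
print(values_match(["x", "y", "z"], ["x", "y", "w"]))
```

If the two lists should also have the same length, comparing them directly with == covers both the length and the element checks.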

Best way to create a dictionary for comparing elements later
