Summing vectors in Python

Date: 2013-11-25 11:17:18

Tags: python vector sum

Hello, I am trying to sum the third column across the following sample inputs:

Input 1:

act hi  1
act bye 2
act ciao    5

Input 2:

art hi  1
art bye 2
art kiss    5

with the following desired output:

act-art hi  2
act-art bye 4
act-art kiss    5
act-art ciao    5

Here is the code I have been using.

def sumVectors(classB_infile, classA_infile, outfile):

    class_dictA = {}

    with open(classA_infile, "rb") as opened_infile_A:
        for line in opened_infile_A:
            items = line.split()
            classA, feat, valuesA = items[:3]
            class_dictA[feat] = float(valuesA)


    class_dictB = {}

    with open(classB_infile, "rb") as opened_infile_B:
        for line in opened_infile_B:
            items = line.split()
            classB, feat, valuesB = items[:3]
            class_dictB[feat] = float(valuesB)


#print classA, classB, feat, sumVectors

####outfile 
    with open(outfile, "wb") as output_file:
        for key in class_dictA:
            if key in class_dictB:
                weight = class_dictA[key] + class_dictB[key]
                #outstring = "\t".join([classA + "-" +  classB, key, str(weight)])
            else:
                weight = class_dictA[key]
                outstring = "\t".join([classA + "-" +  classB, key, str(weight)])
                output_file.write(outstring + "\n")

        for key in class_dictB:
            if key in class_dictA:
                weight = class_dictB[key]
            outstring = "\t".join([classA + "-" + classB, key, str(weight)])
            output_file.write(outstring + "\n")

However, it gives me the following output:

act-art stress  5.0
act-art bye 2.0
act-art hi  1.0
act-art kiss    1.0

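Any insight into why it is not summing the values for the features (second column) that the two files have in common? Thanks.

2 Answers:

Answer 0 (score: 4)

This contains the simplest fix that achieves the desired result: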

def sumVectors(classB_infile, classA_infile, outfile):
    class_dictA = {}

    # Text mode ("r"/"w" instead of "rb"/"wb") so the code also runs on Python 3
    with open(classA_infile, "r") as opened_infile_A:
        for line in opened_infile_A:
            items = line.split()
            classA, feat, valuesA = items[:3]
            class_dictA[feat.strip()] = float(valuesA)


    class_dictB = {}

    with open(classB_infile, "r") as opened_infile_B:
        for line in opened_infile_B:
            items = line.split()
            classB, feat, valuesB = items[:3]
            class_dictB[feat.strip()] = float(valuesB)

    ####outfile 
    with open(outfile, "w") as output_file:
        for key in class_dictA:
            if key in class_dictB:
                # key is in both files: sum the two weights
                weight = class_dictA[key] + class_dictB[key]
                outstring = "\t".join([classA + "-" +  classB, key, str(weight)])
            else:
                # key is only in A: keep A's weight
                weight = class_dictA[key]
                outstring = "\t".join([classA + "-" +  classB, key, str(weight)])
            # write once per key, whichever branch was taken
            output_file.write(outstring + "\n")

        for key in class_dictB:
            if key not in class_dictA: # if key was in A it was processed already
                weight = class_dictB[key]
                outstring = "\t".join([classA + "-" + classB, key, str(weight)])
                output_file.write(outstring + "\n")

However, this can really be simplified:

def readFile(fileName, keys):
    # Returns (class label, {feature: weight}) and adds every feature to keys
    result = {}
    class_ = ''
    with open(fileName, "r") as opened_infile:  # text mode so this also runs on Python 3
        for line in opened_infile:
            items = line.split()
            class_, feat, value = items[:3]
            keys.add(feat)
            result[feat] = float(value)
    return (class_, result)


def sumVectors(classB_infile, classA_infile, outfile):
    keys = set()  # union of the features seen in both files

    classA, class_dictA = readFile(classA_infile, keys)
    classB, class_dictB = readFile(classB_infile, keys)

    with open(outfile, "w") as output_file:
        for key in keys:
            # A feature missing from one file contributes 0 to the sum
            weight = class_dictA.get(key, 0) + class_dictB.get(key, 0)
            outstring = "\t".join([classA + "-" + classB, key, str(weight)])
            output_file.write(outstring + "\n")
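
To see the key-union idea in isolation, here is a minimal sketch that works on in-memory dictionaries filled with the sample data from the question instead of reading files. The function name merge_weights and the variable names are hypothetical and not part of the answer above.

```python
def merge_weights(weights_a, weights_b):
    """Sum two feature -> weight dicts over the union of their keys."""
    keys = set(weights_a) | set(weights_b)
    # A feature missing from one dict contributes 0.0 to the sum.
    return {key: weights_a.get(key, 0.0) + weights_b.get(key, 0.0) for key in keys}


# Sample data from the question (third column parsed as floats).
class_dict_a = {"hi": 1.0, "bye": 2.0, "ciao": 5.0}
class_dict_b = {"hi": 1.0, "bye": 2.0, "kiss": 5.0}

merged = merge_weights(class_dict_a, class_dict_b)

# Print in the tab-separated layout used by the question, sorted by feature.
for feat in sorted(merged):
    print("\t".join(["act-art", feat, str(merged[feat])]))
```

Iterating over the union of keys visits each feature exactly once, so no feature is written twice and none is skipped.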

Answer 1 (score: 3)

Rather than writing two loops to "merge" the two dictionaries, I suggest using a defaultdict:

import collections

result = collections.defaultdict(float, class_dictA)
for k, v in class_dictB.items():
    result[k] += v

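This creates a new dictionary, result, that starts as a copy of class_dictA. Every value in class_dictB is then added to result. If a key is not present yet, it is treated as if it had the value 0.0, which is what calling float() with no arguments returns.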
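
As a self-contained illustration of the behaviour described above, the following sketch runs the same merge on the sample data from the question. The literal dictionaries are assumptions standing in for the contents that would otherwise be read from the two input files.

```python
import collections

# Stand-ins for the dictionaries built from the two input files.
class_dictA = {"hi": 1.0, "bye": 2.0, "ciao": 5.0}
class_dictB = {"hi": 1.0, "bye": 2.0, "kiss": 5.0}

# Copy A into a defaultdict; a missing key starts at float(), i.e. 0.0.
result = collections.defaultdict(float, class_dictA)
for k, v in class_dictB.items():
    result[k] += v

print(dict(result))
```

Because result is a copy, class_dictA itself is left unchanged by the loop.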