Rewrite a loop to assign a number to each row in a column

Question

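I have a column titled CompanyNam from which I need to extract ownership information. The column lives in a QGIS attribute table, but that is not important here. The column looks like the attached image.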

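For example, if CEZ is the first company, I assign it the number 1, then Sokolovska gets the next number, and so on. If CEZ appears again in another row, it gets the number 1 again. It is important to note that if the column contains NULLs, I assign a different number to each NULL row. I need the numbers that correspond to the CompanyNam values. I have the following code: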

EmptyArray = []
d = {}
newlist = []
for gFeat in GeneratorLayer.getFeatures():
    Owner = gFeat.attributes()[gProvider.fieldNameIndex('CompanyNam')].toString()
    A = ([str(i) for i in Owner]) #convert from PyQt4.QtCore.QString to normal string
    B = ''.join(A)
    EmptyArray.append(B)
    for m, n in enumerate(EmptyArray):
        if n not in d:
            d[n] = [m+1]
        newlist.append({n: d[n]})
        if n == '':
            d[n] = [m+1]
        newlist.append({n: d[n]}) #Every NULL gets a new number
    for names in newlist:
        for o, p in names.iteritems():
            if o == '':
                a2 = str('{},NULL'.format(p))
            elif o != '':
                a2 = str('{},{}'.format(p,o))

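I then use a2 in subsequent steps. For columns with 60-100 rows the code works well, but for larger columns the computation time is very high. Could you suggest how I can rewrite this code while keeping the logic? The output looks like this: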

[1],CEZ
[1],CEZ
[1],CEZ
[1],CEZ
[1],CEZ
[1],CEZ
[1],CEZ
[1],CEZ
[9],Sokolovska
[10],International
[11],ENERGOTRANS,
[12],Alpiq
[13],Mittal Steel
[14],United
[1],CEZ
[1],CEZ
[17],Dalkia.....

Numbering [1], [2], [3] instead of [1], [9], [10] would be better, but I haven't figured out how to do that.

Answer 1

I would use a list, since you are appending items, together with a separate dictionary to look up the numbers:

EmptyArray = []
d = {}
newlist = []

for gFeat in GeneratorLayer.getFeatures():

    Owner = gFeat.attributes()[gProvider.fieldNameIndex('CompanyNam')].toString()

    A = ([str(i) for i in Owner]) #convert from PyQt4.QtCore.QString to normal string
    B = ''.join(A)

    EmptyArray.append(B)

# The numbering runs once, after every name has been collected, so these
# loops sit outside the feature loop (nesting them would reset m on every
# feature while d keeps its old entries).
m = 1  # next number to hand out

for n in EmptyArray:  # no need to enumerate as numbers should increment only on uniques

    if n not in d:  # this is a unique
        d[n] = [m]  # put number in dictionary, with key as val.
        m += 1  # so increment

    elif n == '':  # if it's blank (and if it is already in the dict)
        d[n].append(m)  # append new number, as blanks always increment
        m += 1  # increment

for name in EmptyArray:  # looping less as only want to get names

    if name == '':  # if it's a blank, we want to pop out the first item in the list.
        a2 = str('{},NULL'.format(d[name].pop(0)))

    else:  # otherwise we just index the first item.
        a2 = str('{},{}'.format(d[name][0], name))

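The above should work.

This way you don't have to loop as much, and the logic is clearer. It should help with the computation time, and it also gives you the correct numbers.

When looping over EmptyArray we always store the number in a list, which may not be the best approach: you only need a list for the blanks, and accessing a value directly without a list would be faster. With lists that are only one item long, though, I doubt it makes much difference.

For the blanks we do need a list, so that it can hold multiple numbers (one per blank). To get the right number for each blank, we just pop the number off the front of the list each time we encounter a blank; this should correspond to the number originally assigned to that blank.

We also don't need to build a new list; we just repeat our loop over EmptyArray.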
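The two-pass idea described above can be sketched so that only the blanks use a queue and every other name maps straight to its number. This is a minimal illustration rather than the code from the question: the function name `number_rows_two_pass` and the sample names are assumptions, blanks are assumed to arrive as empty strings, and it is written for Python 3. It uses `collections.deque` because popping from the left of a deque is constant time, whereas `list.pop(0)` has to shift the remaining items.

```python
from collections import deque


def number_rows_two_pass(names):
    """Return 'number,name' strings; every blank gets its own number."""
    numbers = {}          # name -> number (blank -> deque of numbers)
    next_number = 1

    # First pass: hand out numbers in order of first appearance.
    for name in names:
        if name == '':
            # Blanks always receive a fresh number, queued in arrival order.
            numbers.setdefault('', deque()).append(next_number)
            next_number += 1
        elif name not in numbers:
            numbers[name] = next_number
            next_number += 1

    # Second pass: look the numbers up again, row by row.
    rows = []
    for name in names:
        if name == '':
            # The oldest queued number belongs to this blank.
            rows.append('{},NULL'.format(numbers[''].popleft()))
        else:
            rows.append('{},{}'.format(numbers[name], name))
    return rows


if __name__ == '__main__':
    for row in number_rows_two_pass(['CEZ', '', 'CEZ', 'Sokolovska', '']):
        print(row)
```

Returning the rows as a list, instead of overwriting a single a2 variable, keeps every result available for later steps.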

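Make it more efficient

We can get rid of the second loop over EmptyArray entirely (I don't see any extra logic that requires it) and just do the following in the first loop: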

    for n in EmptyArray:  # no need to enumerate as numbers should increment only on uniques

        if n == '' :  # blanks always increment - no need to store as we treat them as new
            a2 = str('{},NULL'.format(m))
            m += 1  # so increment

        elif n not in d:  # this is unique, so add to dict and increment
            d[n] = m  # add to dict for future reference
            a2 = str('{},{}'.format(m, n))
            m += 1  # increment

        else:  # it's in the dict, and not a blank, so grab from dict.
            a2 = str('{},{}'.format(d[n], n))

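This way we get rid of a lot of looping, which should greatly improve the program's efficiency. It also removes the need to store the numbers assigned to blanks, which saves extra work; every other number can be looked up directly in the dictionary.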
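Because the single-pass version only ever looks at the current value, it may not need EmptyArray at all. The sketch below wraps the same logic in a generator that accepts any iterable of names, so values could be numbered as they are read. The name `number_rows` and the sample data are assumptions made for illustration, blanks are assumed to be empty strings, and the code targets Python 3.

```python
def number_rows(names):
    """Yield (number, label) pairs in input order.

    Repeated names reuse their first number; each blank gets a new one
    and is labelled 'NULL'.
    """
    seen = {}            # name -> number assigned at first appearance
    next_number = 1
    for name in names:
        if name == '':
            # Blanks are never stored: each one simply takes the next number.
            yield next_number, 'NULL'
            next_number += 1
        elif name not in seen:
            seen[name] = next_number
            yield next_number, name
            next_number += 1
        else:
            yield seen[name], name


if __name__ == '__main__':
    sample = ['CEZ', 'CEZ', 'Sokolovska', '', 'International', '', 'CEZ']
    for number, label in number_rows(sample):
        print('{},{}'.format(number, label))
```

Each dictionary lookup takes roughly constant time on average, so the total work grows linearly with the number of rows instead of quadratically.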

So I made two functions to try to replicate what you want:

EmptyArray = ['',1,2,3,'',1,'',3,2,'',1]
m = 1
a2 = ''

def f1 (EmptyArray, d, m, a2):
    for n in EmptyArray:  # no need to enumerate as numbers should increment only on uniques

        if n not in d:  # this is a unique
            d[n] = [m]  # put number in dictionary, with key as val.
            m += 1  # so increment

        elif n == '':  # if it's blank (and if it is already in the dict)
            d[n].append(m)  # append new number, as blanks always increment
            m += 1  # increment

    for name in EmptyArray:  # looping less as only want to get names

        if name == '':  # if it's a blank, we want to pop out the first item in the list.
            a2 += str('{},NULL\n'.format(d[name].pop(0)))

        else:  # otherwise we just index the first item.
            a2 += str('{},{}\n'.format(d[name][0], name))

    print(a2)

def f2 (EmptyArray, d, m, a2):
    for n in EmptyArray:  # no need to enumerate as numbers should increment only on uniques

        if n == '' :  # blanks always increment - no need to store as we treat them as new
            a2 += str('{},NULL\n'.format(m))
            m += 1  # so increment

        elif n not in d:  # this is unique, so add to dict and increment
            d[n] = m  # add to dict for future reference
            a2 += str('{},{}\n'.format(m, n))
            m += 1  # increment

        else:  # it's in the dict, and not a blank, so grab from dict.
            a2 += str('{},{}\n'.format(d[n], n))

    print(a2)

f1(EmptyArray, {}, m, a2)
f2(EmptyArray, {}, m, a2)

Here is the output of the two calls:

1,NULL
2,1
3,2
4,3
5,NULL
2,1
6,NULL
4,3
3,2
7,NULL
2,1

1,NULL
2,1
3,2
4,3
5,NULL
2,1
6,NULL
4,3
3,2
7,NULL
2,1

And the times for f1 and f2, respectively, were:

3.4341892885662415e-05
1.5789376039385022e-05

So f2 takes less than half the time of f1.
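The timing code itself is not shown above, so the following is only one way such a comparison might be reproduced, using the standard `timeit` module. The function names `two_pass`, `one_pass` and `average_seconds` are hypothetical; the two numbering functions mirror f1 and f2 but return their rows instead of printing, so that printing does not dominate the measurement. Absolute times will differ from machine to machine.

```python
import timeit

DATA = ['', 1, 2, 3, '', 1, '', 3, 2, '', 1]


def two_pass(values):
    """Same logic as f1: number everything first, then look it up again."""
    d, m, out = {}, 1, []
    for n in values:
        if n not in d:
            d[n] = [m]
            m += 1
        elif n == '':
            d[n].append(m)
            m += 1
    for name in values:
        if name == '':
            out.append('{},NULL'.format(d[name].pop(0)))
        else:
            out.append('{},{}'.format(d[name][0], name))
    return out


def one_pass(values):
    """Same logic as f2: number and format each value in a single loop."""
    d, m, out = {}, 1, []
    for n in values:
        if n == '':
            out.append('{},NULL'.format(m))
            m += 1
        elif n not in d:
            d[n] = m
            out.append('{},{}'.format(m, n))
            m += 1
        else:
            out.append('{},{}'.format(d[n], n))
    return out


def average_seconds(func, values, repeats=10000):
    """Average wall-clock time of one call, measured over many repeats."""
    return timeit.timeit(lambda: func(values), number=repeats) / repeats


if __name__ == '__main__':
    print('two_pass:', average_seconds(two_pass, DATA))
    print('one_pass:', average_seconds(one_pass, DATA))
```

Averaging over many repeats smooths out noise that a single run of such a short function would show.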
