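I have a column titled CompanyNam from which I need to extract ownership information. The column is in a QGIS attribute table, but that is not important here. The column looks like the attached image.
For example, if CEZ is the first company, I assign it the number 1, then Sokolovska gets the next number, and so on. If CEZ appears again in another row, it gets the number 1 again. Note that if the column contains NULL, I assign a different number to every NULL row. I need the numbers that correspond to the CompanyNam values as output. I have the following code: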
EmptyArray = []
d = {}
newlist = []
for gFeat in GeneratorLayer.getFeatures():
    Owner = gFeat.attributes()[gProvider.fieldNameIndex('CompanyNam')].toString()
    A = ([str(i) for i in Owner])  # convert from PyQt4.QtCore.QString to normal string
    B = ''.join(A)
    EmptyArray.append(B)
for m, n in enumerate(EmptyArray):
    if n not in d:
        d[n] = [m+1]
    newlist.append({n: d[n]})
    if n == '':
        d[n] = [m+1]
        newlist.append({n: d[n]})  # Every NULL gets a new number
for names in newlist:
    for o, p in names.iteritems():
        if o == '':
            a2 = str('{},NULL'.format(p))
        elif o != '':
            a2 = str('{},{}'.format(p,o))
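I then use a2 in a later step. For columns with 60-100 rows the code runs fine, but for larger columns the computation time becomes very high. Could you suggest how I might rewrite this code while keeping the logic?
The output looks like this: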
[1],CEZ
[1],CEZ
[1],CEZ
[1],CEZ
[1],CEZ
[1],CEZ
[1],CEZ
[1],CEZ
[9],Sokolovska
[10],International
[11],ENERGOTRANS,
[12],Alpiq
[13],Mittal Steel
[14],United
[1],CEZ
[1],CEZ
[17],Dalkia.....
Numbering them [1], [2], [3] instead of [1], [9], [10] would be better, but I have not figured out how to do that.
Answer 0 (score: 0)
I would keep using a list, since you are appending items, together with a separate dictionary to look up the numbers:
EmptyArray = []
d = {}
newlist = []
for gFeat in GeneratorLayer.getFeatures():
    Owner = gFeat.attributes()[gProvider.fieldNameIndex('CompanyNam')].toString()
    A = ([str(i) for i in Owner])  # convert from PyQt4.QtCore.QString to normal string
    B = ''.join(A)
    EmptyArray.append(B)
m = 1
for n in EmptyArray:  # no need to enumerate as numbers should increment only on uniques
    if n not in d:  # this is a unique
        d[n] = [m]  # put number in dictionary, with key as val.
        m += 1  # so increment
    elif n == '':  # if it's blank (and if it is already in the dict)
        d[n].append(m)  # append new number, as blanks always increment
        m += 1  # increment
for name in EmptyArray:  # looping less as only want to get names
    if name == '':  # if it's a blank, we want to pop out the first item in the list.
        a2 = str('{},NULL'.format(d[name].pop(0)))
    else:  # otherwise we just index the first item.
        a2 = str('{},{}'.format(d[name][0], name))
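The above should work.
This way you do not have to loop as much, and the logic is clearer. That should help with the computation time, and it also gives you the correct numbers.
When looping over EmptyArray, we always store a list in the dictionary. That may not be the best approach, since only the blanks actually need a list and it would be faster to access the other items directly without one, but when the list is only one item long I doubt it makes much of a difference.
For blanks we do need a list, so that it can hold several numbers (one per blank). To get the right number for each blank, we simply pop the number off the front of the list every time we hit a blank; this corresponds to the number originally assigned to that blank.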
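One caveat on popping from the front: `list.pop(0)` shifts every remaining element, so a column with many blanks pays a linear cost per blank. A minimal sketch of the same two-pass idea, assuming the values are already plain Python strings and swapping in `collections.deque` (my substitution, not part of the code above); the helper name `number_rows_two_pass` is hypothetical:

```python
from collections import deque

def number_rows_two_pass(names):
    """Two-pass numbering: repeated names share a number, each blank gets its own."""
    numbers = {}      # name -> deque of the numbers assigned to that name
    next_number = 1
    for name in names:
        if name not in numbers:        # first time we see this value
            numbers[name] = deque([next_number])
            next_number += 1
        elif name == '':               # every further blank gets a fresh number
            numbers[name].append(next_number)
            next_number += 1
    rows = []
    for name in names:
        if name == '':
            # popleft() is O(1), unlike list.pop(0)
            rows.append('{},NULL'.format(numbers[name].popleft()))
        else:
            rows.append('{},{}'.format(numbers[name][0], name))
    return rows

rows = number_rows_two_pass(['CEZ', '', 'CEZ', 'Sokolovska', ''])
print('\n'.join(rows))
```

For a handful of blanks the difference is unlikely to be measurable; it only starts to matter when a large share of the rows are NULL.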
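We also do not need to build a new list; we just loop over EmptyArray a second time.
In fact, we can get rid of the second loop over EmptyArray entirely (I do not see any extra logic that requires it) and do the following in the first loop instead: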
for n in EmptyArray:  # no need to enumerate as numbers should increment only on uniques
    if n == '':  # blanks always increment - no need to store as we treat them as new
        a2 = str('{},NULL'.format(m))
        m += 1  # so increment
    elif n not in d:  # this is unique, so add to dict and increment
        d[n] = m  # add to dict for future reference
        a2 = str('{},{}'.format(m, n))
        m += 1  # increment
    else:  # it's in the dict, and not a blank, so grab from dict.
        a2 = str('{},{}'.format(d[n], n))
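This removes a lot of looping and should make the program considerably more efficient. It also means we no longer have to store the numbers assigned to blanks, which saves extra work, while every other number can be looked up directly in the dictionary.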
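The single-pass idea can be wrapped in a small function so that the results are collected rather than overwritten in `a2` on every iteration. This is only a sketch under assumptions: the function name `number_rows` and the sample column are hypothetical, and treating `None` as a blank alongside `''` is my addition in case NULL values arrive as `None` instead of empty strings:

```python
def number_rows(names):
    """Return (number, label) pairs in a single pass over the column values."""
    seen = {}          # company name -> number assigned at first occurrence
    next_number = 1
    rows = []
    for name in names:
        if name is None or name == '':   # assumption: None also counts as NULL
            rows.append((next_number, 'NULL'))
            next_number += 1
        elif name in seen:               # repeated company reuses its number
            rows.append((seen[name], name))
        else:                            # new company gets the next number
            seen[name] = next_number
            rows.append((next_number, name))
            next_number += 1
    return rows

column = ['CEZ', 'CEZ', 'Sokolovska', '', 'International', None, 'CEZ']
result = number_rows(column)
for number, label in result:
    print('{},{}'.format(number, label))
```

Because the counter only advances on new names and blanks, the numbers come out consecutive (1, 2, 3, ...) rather than following the row index.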
EmptyArray = ['',1,2,3,'',1,'',3,2,'',1]
m = 1
a2 = ''
def f1(EmptyArray, d, m, a2):
    for n in EmptyArray:  # no need to enumerate as numbers should increment only on uniques
        if n not in d:  # this is a unique
            d[n] = [m]  # put number in dictionary, with key as val.
            m += 1  # so increment
        elif n == '':  # if it's blank (and if it is already in the dict)
            d[n].append(m)  # append new number, as blanks always increment
            m += 1  # increment
    for name in EmptyArray:  # looping less as only want to get names
        if name == '':  # if it's a blank, we want to pop out the first item in the list.
            a2 += str('{},NULL\n'.format(d[name].pop(0)))
        else:  # otherwise we just index the first item.
            a2 += str('{},{}\n'.format(d[name][0], name))
    print(a2)
def f2(EmptyArray, d, m, a2):
    for n in EmptyArray:  # no need to enumerate as numbers should increment only on uniques
        if n == '':  # blanks always increment - no need to store as we treat them as new
            a2 += str('{},NULL\n'.format(m))
            m += 1  # so increment
        elif n not in d:  # this is unique, so add to dict and increment
            d[n] = m  # add to dict for future reference
            a2 += str('{},{}\n'.format(m, n))
            m += 1  # increment
        else:  # it's in the dict, and not a blank, so grab from dict.
            a2 += str('{},{}\n'.format(d[n], n))
    print(a2)
f1(EmptyArray, {}, m, a2)
f2(EmptyArray, {}, m, a2)
This is the output of the two calls:
1,NULL
2,1
3,2
4,3
5,NULL
2,1
6,NULL
4,3
3,2
7,NULL
2,1
1,NULL
2,1
3,2
4,3
5,NULL
2,1
6,NULL
4,3
3,2
7,NULL
2,1
The times for f1 and f2, respectively, were:
3.4341892885662415e-05
1.5789376039385022e-05
So f2 takes less than half the time of f1.
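The timing code itself is not shown above, so the following is just one way such a comparison could be set up, not necessarily how the figures were obtained. It assumes the standard-library `timeit` module, and the names `two_pass`, `one_pass` and `SAMPLE` are hypothetical. The functions return lists instead of printing, so that console output does not dominate the measurement:

```python
import timeit

SAMPLE = ['', 1, 2, 3, '', 1, '', 3, 2, '', 1]

def two_pass(values):
    """Same logic as f1: assign numbers first, then format in a second loop."""
    d, m, out = {}, 1, []
    for n in values:
        if n not in d:
            d[n] = [m]
            m += 1
        elif n == '':
            d[n].append(m)
            m += 1
    for name in values:
        if name == '':
            out.append('{},NULL'.format(d[name].pop(0)))
        else:
            out.append('{},{}'.format(d[name][0], name))
    return out

def one_pass(values):
    """Same logic as f2: assign and format in a single loop."""
    d, m, out = {}, 1, []
    for n in values:
        if n == '':
            out.append('{},NULL'.format(m))
            m += 1
        elif n not in d:
            d[n] = m
            out.append('{},{}'.format(m, n))
            m += 1
        else:
            out.append('{},{}'.format(d[n], n))
    return out

runs = 10000  # average over many runs to smooth out noise
t_two = timeit.timeit(lambda: two_pass(SAMPLE), number=runs) / runs
t_one = timeit.timeit(lambda: one_pass(SAMPLE), number=runs) / runs
print('two_pass: {:.3e} s per call'.format(t_two))
print('one_pass: {:.3e} s per call'.format(t_one))
```

Absolute numbers will vary by machine and Python version, and an 11-element list is too small to say much about large columns, so it is worth repeating the measurement on data of realistic size.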