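I am trying to parse a huge Excel file that contains properties and their values. The problem is that some properties can have multiple values.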
Example:
```python
list = ['a=1', 'b=2', 'c=3', 'd=4', 'd=5', 'd=6', 'e=7']
```
should become:
```python
list2 = ['a=1', 'b=2', 'c=3', 'd=4,5,6', 'e=7']
```
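Each element is a variable-length string of the form property=value, with the property and the value separated by '='.
This is how I generate the list from the Excel file: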
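Because property and value share one string, each element has to be split on the first '=' only, in case a value itself contains '='. A minimal sketch, assuming property names never contain '='; `split_prop` is a hypothetical helper name, not something from the question:

```python
def split_prop(item):
    """Split 'key=value' into (key, value) at the first '=' only."""
    key, sep, value = item.partition('=')
    if not sep:
        # partition returns an empty separator when '=' is missing
        raise ValueError("no '=' in %r" % item)
    return key, value


print(split_prop('d=4'))      # ('d', '4')
print(split_prop('url=a=b'))  # ('url', 'a=b')
```

The answer below uses item.split('=', 1) for the same reason; partition just makes the "no '='" case explicit.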
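```python
# assumes sheet, DATA_ROW and PROPERTIE_ROW are defined earlier
proplist = []
# for each row in the Excel file
for rows in range(DATA_ROW, sheet.nrows):
    # generate a list with all properties
    for cols in range(sheet.ncols):
        # if the property name is not empty (!= compares values; "is not" tests identity)
        if str(sheet.cell(PROPERTIE_ROW, cols).value) != '':
            # no trailing '\n', so the values can be split and joined cleanly later
            proplist.append(str(sheet.cell(PROPERTIE_ROW, cols).value) + '=' + str(sheet.cell(rows, cols).value))
```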
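Instead of building 'name=value' strings and merging them afterwards, the values could be grouped by property name while the sheet is being read. The sketch below is hedged: `collect_properties` is a hypothetical function, and a plain list of rows stands in for the xlrd sheet so that it runs without an Excel file; `header_row` and `data_row` play the roles of PROPERTIE_ROW and DATA_ROW.

```python
from collections import OrderedDict


def collect_properties(table, header_row, data_row):
    """Group cell values by property name and return 'name=v1,v2' strings.

    `table` is a list of rows (each a list of cell values), a hypothetical
    stand-in for the sheet in the question.
    """
    header = table[header_row]
    grouped = OrderedDict()  # keeps properties in first-seen order
    for row in table[data_row:]:
        for col, name in enumerate(header):
            name = str(name)
            if name == '':
                continue  # skip columns without a property name
            grouped.setdefault(name, []).append(str(row[col]))
    return ['{0}={1}'.format(k, ','.join(v)) for k, v in grouped.items()]


# Hypothetical sheet: property names in row 0, one data row below.
table = [
    ['a', 'b', 'c', 'd', 'd', 'd', 'e'],
    ['1', '2', '3', '4', '5', '6', '7'],
]
print(collect_properties(table, header_row=0, data_row=1))
# ['a=1', 'b=2', 'c=3', 'd=4,5,6', 'e=7']
```

Because it loops over every data row, repeated values are merged whether they come from repeated columns or from several rows.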
I gave it a try, but it doesn't work well:
```python
last_item = ''
new_list = []
# find and collect multiple values
for i, item in enumerate(proplist):
    # if the property is already in the list
    if str(item).find(last_item) is not -1:
        # just copy the value and append it to the property
        new_list.insert(i, propertie);
    else:
        # slice the string into property and value
        pos = item.find('=')
        propertie = item[0:pos+1]
        value = item[pos+1:len(item)]
        # save the property
        last_item = propertie
        # append item
        new_list.append(item)
```
Any help would be greatly appreciated!
Answer (score: 1)
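The loop above fails on its first iteration: last_item starts as '', and str.find('') returns 0, so the first branch runs and tries to insert propertie before it has been assigned (a NameError, unless propertie happens to be defined earlier). If repeated properties are always adjacent, as in the example, the standard-library itertools.groupby does what the loop was aiming for. This swaps in groupby rather than patching the loop, assumes every element contains '=', and `merge_adjacent` is a hypothetical name:

```python
from itertools import groupby


def merge_adjacent(items):
    """Merge consecutive 'key=value' strings that share the same key.

    Only adjacent duplicates are merged, mirroring the "last property
    seen" idea in the original loop.
    """
    pairs = (item.split('=', 1) for item in items)  # ['key', 'value'] lists
    merged = []
    for key, group in groupby(pairs, key=lambda kv: kv[0]):
        values = [v for _, v in group]
        merged.append('{0}={1}'.format(key, ','.join(values)))
    return merged


print(merge_adjacent(['a=1', 'b=2', 'c=3', 'd=4', 'd=5', 'd=6', 'e=7']))
# ['a=1', 'b=2', 'c=3', 'd=4,5,6', 'e=7']
```

Non-adjacent duplicates such as ['a=1', 'b=2', 'a=3'] stay separate; the dictionary-based approaches in the answer merge them regardless of position.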
If the order doesn't matter, you can use a defaultdict for this:
```python
from collections import defaultdict

orig = ['a=1', 'b=2', 'c=3', 'd=4', 'd=5', 'd=6', 'e=7']
d = defaultdict(list)
for item in orig:
    k, v = item.split('=', 1)  # split on the first '=' only
    d[k].append(v)
new = ['{0}={1}'.format(k, ','.join(v)) for k, v in d.items()]
print(new)  # e.g. ['a=1', 'c=3', 'b=2', 'e=7', 'd=4,5,6'] (key order is arbitrary before Python 3.7)
```
If the order does matter, I think you can use an OrderedDict plus setdefault, though it's admittedly not very pretty:
```python
from collections import OrderedDict

orig = ['a=1', 'b=2', 'c=3', 'd=4', 'd=5', 'd=6', 'e=7']
d = OrderedDict()
for item in orig:
    k, v = item.split('=', 1)
    d.setdefault(k, []).append(v)
new = ['{0}={1}'.format(k, ','.join(v)) for k, v in d.items()]
print(new)  # ['a=1', 'b=2', 'c=3', 'd=4,5,6', 'e=7']
```
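On Python 3.7 and later, ordinary dicts (and therefore defaultdict) preserve insertion order as a language guarantee, so OrderedDict is no longer needed just to keep the properties in order. A sketch of the same setdefault approach with a plain dict, assuming Python 3.7+:

```python
orig = ['a=1', 'b=2', 'c=3', 'd=4', 'd=5', 'd=6', 'e=7']

# A plain dict keeps keys in first-insertion order on Python 3.7+.
d = {}
for item in orig:
    k, v = item.split('=', 1)
    d.setdefault(k, []).append(v)  # create the list on first sight of k

new = ['{0}={1}'.format(k, ','.join(v)) for k, v in d.items()]
print(new)  # ['a=1', 'b=2', 'c=3', 'd=4,5,6', 'e=7']
```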