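I have a list of strings that I scraped, and I want to split the strings into groups and then reshape them into columnar data. However, not every variable title is present in every group.
My list is named complist and looks like this: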
[u'Intake Received Date:',
u'9/11/2012',
u'Intake ID:',
u'CA00325127',
u'Allegation Category:',
u'Infection Control',
u'Investigation Finding:',
u'Substantiated',
u'Intake Received Date:',
u'5/14/2012',
u'Intake ID:',
u'CA00310421',
u'Allegation Category:',
u'Quality of Care/Treatment',
u'Investigation Finding:',
u'Substantiated',
u'Intake Received Date:',
u'8/15/2011',
u'Intake ID:',
u'CA00279396',
u'Allegation Category:',
u'Quality of Care/Treatment',
u'Sub Categories:',
u'Screening',
u'Investigation Finding:',
u'Unsubstantiated',]
My goal is to make it look like this:
'Intake Received Date', 'Intake ID', 'Allegation Category', 'Sub Categories', 'Investigation Finding'
'9/11/2012', 'CA00325127', 'Infection Control', '', 'Substantiated'
'5/14/2012', 'CA00310421', 'Quality of Care/Treatment', '', 'Substantiated'
'8/15/2011', 'CA00279396', 'Quality of Care/Treatment', 'Screening', 'Unsubstantiated'
The first thing I did was split the list into groups based on the starting element, Intake Received Date:
import re
from itertools import groupby

compgroup = []
for k, g in groupby(complist, key=lambda x: re.search(r'Intake Received Date', x)):
    if not k:
        compgroup.append(list(g))
# Intake Received Date was removed, so insert it back at the beginning of each list:
for c in compgroup:
    c.insert(0, u'Intake Received Date')
# Create a list of dicts mapping each title to the data element that follows it:
dic = []
for c in compgroup:
    dic.append(dict(zip(*[iter(c)]*2)))
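The next step is to convert the list of dicts into columnar data, but at this point I feel my approach is overly complicated and I must be missing something more elegant. I would appreciate any guidance.
Answer 0 (score: 1)
Given: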
data=[u'Intake Received Date:',
u'9/11/2012',
u'Intake ID:',
u'CA00325127',
u'Allegation Category:',
u'Infection Control',
u'Investigation Finding:',
u'Substantiated',
u'Intake Received Date:',
u'5/14/2012',
u'Intake ID:',
u'CA00310421',
u'Allegation Category:',
u'Quality of Care/Treatment',
u'Investigation Finding:',
u'Substantiated',
u'Intake Received Date:',
u'8/15/2011',
u'Intake ID:',
u'CA00279396',
u'Allegation Category:',
u'Quality of Care/Treatment',
u'Sub Categories:',
u'Screening',
u'Investigation Finding:',
u'Unsubstantiated',]
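Your approach is actually quite good. I edited it a little. You don't need a regular expression, and you don't need to re-insert Intake Received Date in a separate loop.
Try: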
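from itertools import groupby

headers = ['Intake Received Date:', 'Intake ID:', 'Allegation Category:', 'Sub Categories:', 'Investigation Finding:']
sep = 'Intake Received Date:'

# Split on the separator title, then put it back at the front of each group.
compgroup = []
for k, g in groupby(data, key=lambda x: x == sep):
    if not k:
        compgroup.append([sep] + list(g))

# Header row: drop the trailing colon from each title.
print(', '.join(e[0:-1] for e in headers))
# zip(*[iter(c)]*2) pairs each title with the value that follows it.
for di in [dict(zip(*[iter(c)]*2)) for c in compgroup]:
    line = []
    for h in headers:
        try:
            line.append(di[h])
        except KeyError:
            # This group has no entry for h, so print a placeholder.
            line.append('*')
    print(', '.join(line))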
Prints:
Intake Received Date, Intake ID, Allegation Category, Sub Categories, Investigation Finding
9/11/2012, CA00325127, Infection Control, *, Substantiated
5/14/2012, CA00310421, Quality of Care/Treatment, *, Substantiated
8/15/2011, CA00279396, Quality of Care/Treatment, Screening, Unsubstantiated
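If the goal is an actual CSV file rather than printed lines, the same idea can be sketched in a single pass with the standard csv module. This is only a sketch assuming Python 3; the helper names records_from_pairs and to_csv are hypothetical, the sample list is a shortened copy of the data above, and missing columns are left empty (as in the question's target output) instead of being filled with '*'.

```python
import csv
import io


def records_from_pairs(items, sep):
    """Turn a flat [title, value, title, value, ...] list into a list of dicts.

    A new record starts each time the title equals `sep`.
    """
    records = []
    # zip(it, it) over one shared iterator yields consecutive (title, value) pairs.
    it = iter(items)
    for title, value in zip(it, it):
        if title == sep or not records:
            records.append({})
        # Store the title without its trailing colon.
        records[-1][title.rstrip(':')] = value
    return records


def to_csv(records, headers, missing=''):
    """Render records as CSV text, filling absent columns with `missing`."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=headers, restval=missing,
                            lineterminator='\n')
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()


# Shortened sample: the first record has no 'Sub Categories:' entry.
sample = [
    'Intake Received Date:', '9/11/2012',
    'Intake ID:', 'CA00325127',
    'Allegation Category:', 'Infection Control',
    'Investigation Finding:', 'Substantiated',
    'Intake Received Date:', '8/15/2011',
    'Intake ID:', 'CA00279396',
    'Allegation Category:', 'Quality of Care/Treatment',
    'Sub Categories:', 'Screening',
    'Investigation Finding:', 'Unsubstantiated',
]
headers = ['Intake Received Date', 'Intake ID', 'Allegation Category',
           'Sub Categories', 'Investigation Finding']

rows = records_from_pairs(sample, 'Intake Received Date:')
print(to_csv(rows, headers))
```

The restval argument of csv.DictWriter supplies the value for any header a record lacks, so no try/except is needed per column. To write to disk, the StringIO buffer could be replaced by a file opened with newline=''.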