Question

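I have a list of strings that I scraped, and I want to split the strings into groups and then reshape them into columnar data. However, not every group contains every variable title.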

My list is named complist and looks like this:

[u'Intake Received Date:',
 u'9/11/2012',
 u'Intake ID:',
 u'CA00325127',
 u'Allegation Category:',
 u'Infection Control',
 u'Investigation Finding:',
 u'Substantiated',
 u'Intake Received Date:',
 u'5/14/2012',
 u'Intake ID:',
 u'CA00310421',
 u'Allegation Category:',
 u'Quality of Care/Treatment',
 u'Investigation Finding:',
 u'Substantiated',
 u'Intake Received Date:',
 u'8/15/2011',
 u'Intake ID:',
 u'CA00279396',
 u'Allegation Category:',
 u'Quality of Care/Treatment',
 u'Sub Categories:',
 u'Screening',
 u'Investigation Finding:',
 u'Unsubstantiated',]

My goal is to make it look like this:

'Intake Received Date', 'Intake ID', 'Allegation Category', 'Sub Categories', 'Investigation Finding'
'9/11/2012', 'CA00325127', 'Infection Control', '', 'Substantiated'
'5/14/2012', 'CA00310421', 'Quality of Care/Treatment', '', 'Substantiated'
'8/15/2011', 'CA00279396', 'Quality of Care/Treatment', 'Screening', 'Unsubstantiated'

The first thing I did was split the list into chunks based on the starting element, Intake Received Date:

import re
from itertools import groupby

compgroup = []
for k, g in groupby(complist, key=lambda x:re.search(r'Intake Received Date', x)):
    if not k:
        compgroup.append(list(g))


#Intake Received Date was removed, so insert it back to beginning of each list:
for c in compgroup:
    c.insert(0, u'Intake Received Date')


#Create list of dicts to map the preceding titles to their respective data element:
dic = []
for c in compgroup:
    dic.append(dict(zip(*[iter(c)]*2)))

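The next step would be to convert the list of dicts into columnar data, but at this point I feel my approach is overly complicated and I must be missing something more elegant. I would appreciate any guidance.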

Answer 1

Given:

data=[u'Intake Received Date:',
 u'9/11/2012',
 u'Intake ID:',
 u'CA00325127',
 u'Allegation Category:',
 u'Infection Control',
 u'Investigation Finding:',
 u'Substantiated',
 u'Intake Received Date:',
 u'5/14/2012',
 u'Intake ID:',
 u'CA00310421',
 u'Allegation Category:',
 u'Quality of Care/Treatment',
 u'Investigation Finding:',
 u'Substantiated',
 u'Intake Received Date:',
 u'8/15/2011',
 u'Intake ID:',
 u'CA00279396',
 u'Allegation Category:',
 u'Quality of Care/Treatment',
 u'Sub Categories:',
 u'Screening',
 u'Investigation Finding:',
 u'Unsubstantiated',]

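Your approach is actually pretty good; I have just edited it a bit. You don't need a regular expression, and you don't need a separate loop to reinsert Intake Received Date: a plain equality test works as the groupby key, and the title can be prepended as each group is collected.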
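As a small illustration of that point, here is a minimal sketch using a made-up, shortened list (items and the 'A:' / 'B:' titles are placeholders, not the real data). groupby yields alternating runs of separator and non-separator items, so keeping only the runs whose key is false gives one chunk per record. This relies on the separator title appearing exactly once, at the start of each record.

```python
from itertools import groupby

# Hypothetical, shortened input: 'A:' plays the role of the separator title.
items = ['A:', '1', 'B:', '2', 'A:', '3', 'B:', '4']
sep = 'A:'

chunks = []
for is_sep, run in groupby(items, key=lambda x: x == sep):
    if not is_sep:
        # Put the separator back so each chunk is a full title/value sequence.
        chunks.append([sep] + list(run))

print(chunks)
```

The full version for the real data follows.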

Try:

from itertools import groupby

headers=['Intake Received Date:', 'Intake ID:', 'Allegation Category:', 'Sub Categories:', 'Investigation Finding:']
sep='Intake Received Date:'
compgroup = []
for k, g in groupby(data, key=lambda x: x==sep):    
    if not k:
        compgroup.append([sep]+list(g))

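# Header row: strip the trailing colon from each header.
print(', '.join(h[:-1] for h in headers))

# Pair up each group's [title, value, title, value, ...] items into a dict,
# then emit the values in header order, using '*' for any missing field.
for di in [dict(zip(*[iter(c)]*2)) for c in compgroup]:
    print(', '.join(di.get(h, '*') for h in headers))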

Prints:

Intake Received Date, Intake ID, Allegation Category, Sub Categories, Investigation Finding
9/11/2012, CA00325127, Infection Control, *, Substantiated
5/14/2012, CA00310421, Quality of Care/Treatment, *, Substantiated
8/15/2011, CA00279396, Quality of Care/Treatment, Screening, Unsubstantiated
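If the goal is the columnar layout from the question, with an empty string rather than * for a missing field, one possible variation is to hand the dicts to csv.DictWriter from the standard library. This is only a sketch and not part of the code above: the to_rows helper and the shortened data list are illustrative assumptions, and it is written for Python 3.

```python
import csv
import io
from itertools import groupby

headers = ['Intake Received Date', 'Intake ID', 'Allegation Category',
           'Sub Categories', 'Investigation Finding']
sep = 'Intake Received Date:'

def to_rows(data):
    """Hypothetical helper: turn the flat title/value list into one dict per record."""
    rows = []
    for is_sep, run in groupby(data, key=lambda x: x == sep):
        if not is_sep:
            chunk = [sep] + list(run)
            # zip(*[iter(chunk)] * 2) pairs consecutive items: (title, value), ...
            pairs = zip(*[iter(chunk)] * 2)
            # Drop the trailing colon so the keys match the column headers.
            rows.append({title.rstrip(':'): value for title, value in pairs})
    return rows

# Shortened sample in the same shape as the real list (an assumption for the demo).
data = ['Intake Received Date:', '9/11/2012', 'Intake ID:', 'CA00325127',
        'Intake Received Date:', '8/15/2011', 'Intake ID:', 'CA00279396',
        'Sub Categories:', 'Screening']

out = io.StringIO()
# restval='' fills in any column a record does not have.
writer = csv.DictWriter(out, fieldnames=headers, restval='')
writer.writeheader()
writer.writerows(to_rows(data))
print(out.getvalue())
```

Writing to an open file instead of io.StringIO would produce a CSV file directly; the in-memory buffer is used here only to keep the sketch self-contained.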

Converting a Python list to columnar data
