Extracting elements from a list

Time: 2015-03-10 07:04:00

Tags: python list

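I have a list, shown below. I extracted it from a text file by searching for these specific fields. Now I want to remove the unwanted words (tyrone-cluster, resources_used, etc.) and write the result to a new file. The size of the list changes each time, but the fields stay the same.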

job_list:

['Job Id: 49361.tyrone-cluster', 'resources_used.cput = 14:32:14', 'resources_used.mem = 13955852kb', 'resources_used.vmem = 14199016kb', 'resources_used.walltime = 05:23:02', 'job_state = R', 'queue = qp32', 'Job Id: 49362.tyrone-cluster', 'job_state = Q', 'queue = batch', 'comment = Not Running: Queue not an execution queue.', 'Job Id: 49395.tyrone-cluster', 'resources_used.cput = 31:20:32', 'resources_used.mem = 19179712kb', 'resources_used.vmem = 158305072kb', 'resources_used.walltime = 01:57:34', 'job_state = R', 'queue = idqueue', 'Job Id: 49396.tyrone-cluster', 'resources_used.cput = 46:26:45', 'resources_used.mem = 5347092kb', 'resources_used.vmem = 7588024kb', 'resources_used.walltime = 01:44:50', 'job_state = R', 'queue = qp32', 'Job Id: 49408.tyrone-cluster', 'job_state = Q', 'queue = qp32']

The output in the new file should be:

job.txt

49361 14:32:14 13955852kb 14199016kb 05:23:02 R qp32
49362 Q batch
49395 31:20:32 19179712kb 158305072kb 01:57:34 R idqueue
49396 46:26:45 5347092kb 7588024kb 01:44:50 R qp32
49408 Q qp32

2 answers:

Answer 0 (score: 1)

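This approach is slightly different: it first finds the list elements that start a job, then processes each "block" once.

The information for each job block is collected in the row list, which is joined and appended to the rows list at the end of the block.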

import re

l = ['Job Id: 49361.tyrone-cluster', 'resources_used.cput = 14:32:14', 'resources_used.mem = 13955852kb', 'resources_used.vmem = 14199016kb', 'resources_used.walltime = 05:23:02', 'job_state = R', 'queue = qp32', 'Job Id: 49362.tyrone-cluster', 'job_state = Q', 'queue = batch', 'comment = Not Running: Queue not an execution queue.', 'Job Id: 49395.tyrone-cluster', 'resources_used.cput = 31:20:32', 'resources_used.mem = 19179712kb', 'resources_used.vmem = 158305072kb', 'resources_used.walltime = 01:57:34', 'job_state = R', 'queue = idqueue', 'Job Id: 49396.tyrone-cluster', 'resources_used.cput = 46:26:45', 'resources_used.mem = 5347092kb', 'resources_used.vmem = 7588024kb', 'resources_used.walltime = 01:44:50', 'job_state = R', 'queue = qp32', 'Job Id: 49408.tyrone-cluster', 'job_state = Q', 'queue = qp32']

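# Indices of the 'Job Id' lines, plus a sentinel end index for the last block.
# len(l) + 1 is one past the usual exclusive end, len(l); slicing tolerates this.
job_elements = [i for (i,e) in enumerate(l) if re.match(r'Job Id: (\d+)', e)] + [len(l) + 1]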

rows = []
for (s,e) in zip(job_elements[:-1], job_elements[1:]):
    row = []
    for line in l[s:e]:
        mat = re.match(r'Job Id: (\d+)', line)
        if mat:
            row.append(mat.group(1).strip())
            continue
        mat = re.match(r'.* = (.*)', line)
        if mat:
            row.append(mat.group(1).strip())
            continue

    rows.append(' '.join(row))

# Print output :
for r in rows:
    print(r)

# Or write to file:
with open('output.txt', 'w') as f:
    for r in rows:
        f.write(r)        # You could write these two lines as f.write(r + '\n')
        f.write('\n')     #   if you didn't care about creating a string unnecessarily

Output:

49361 14:32:14 13955852kb 14199016kb 05:23:02 R qp32
49362 Q batch Not Running: Queue not an execution queue.
49395 31:20:32 19179712kb 158305072kb 01:57:34 R idqueue
49396 46:26:45 5347092kb 7588024kb 01:44:50 R qp32
49408 Q qp32
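
Note that the line for job 49362 above includes the comment text, which the desired output in the question leaves out. One way to drop it is to keep only a fixed set of field names. The following is a minimal sketch, not part of the original answer: the WANTED tuple and the format_jobs helper are hypothetical names, and the whitelist assumes that only the fields shown in the question's desired output should be kept.

```python
import re

# Hypothetical whitelist: the field names that appear in the question's desired output.
WANTED = ('resources_used.cput', 'resources_used.mem', 'resources_used.vmem',
          'resources_used.walltime', 'job_state', 'queue')

def format_jobs(lines):
    """Return one space-separated string per job, keeping only WANTED fields."""
    rows = []
    for line in lines:
        mat = re.match(r'Job Id: (\d+)', line)
        if mat:
            rows.append([mat.group(1)])      # a 'Job Id' line starts a new row
            continue
        key, sep, value = line.partition(' = ')
        if sep and key in WANTED and rows:   # skip fields such as 'comment'
            rows[-1].append(value.strip())
    return [' '.join(row) for row in rows]

sample = ['Job Id: 49362.tyrone-cluster', 'job_state = Q', 'queue = batch',
          'comment = Not Running: Queue not an execution queue.']
print(format_jobs(sample))   # ['49362 Q batch']
```

Because the rows are built in a single pass, this variant does not need the index pairs at all; whether a whitelist or a blacklist of field names is the better fit depends on which fields the input file can contain.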

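For reference, (s,e) in zip(job_elements[:-1], job_elements[1:]) yields the following tuples, which are the start (inclusive) and end (exclusive) indices of each "Job Id" block in the original list. The last end index, 29, comes from len(l) + 1; the list has only 28 entries, and slicing simply stops at the end of the list: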

( 0,  7)
( 7, 11)
(11, 18)
(18, 25)
(25, 29)
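
The pairing itself can be tried in isolation. This is a small sketch under stated assumptions: the boundaries list below is typed in by hand to mirror the tuples above, and items is a stand-in list of 28 integers rather than the real job list.

```python
# Hand-written boundaries: the start index of each job block plus the final sentinel.
boundaries = [0, 7, 11, 18, 25, 29]

# Pair each start with the next start, which acts as the exclusive end of the block.
pairs = list(zip(boundaries[:-1], boundaries[1:]))
print(pairs)        # [(0, 7), (7, 11), (11, 18), (18, 25), (25, 29)]

# A slice whose end index runs past the end of the list is simply truncated.
items = list(range(28))      # stand-in for the 28-entry job list
last_block = items[25:29]
print(last_block)   # [25, 26, 27]
```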

Answer 1 (score: 0)

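import re
import sys

# ss is the job list from the question
for s in ss:
    m = re.match(r'Job Id: (\d+)', s)
    if m:
        # each 'Job Id' line starts a new output line
        sys.stdout.write('\n' + m.group(1) + ' ')
        continue
    m = re.match(r'^.+ = (.+)$', s)
    if m:
        sys.stdout.write(m.group(1) + ' ')
        continue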

49361 14:32:14 13955852kb 14199016kb 05:23:02 R qp32 
49362 Q batch Not Running: Queue not an execution queue. 
49395 31:20:32 19179712kb 158305072kb 01:57:34 R idqueue 
49396 46:26:45 5347092kb 7588024kb 01:44:50 R qp32 
49408 Q qp32
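
The loop above prints to the terminal, starts with an empty line, and leaves a trailing space on each line. Since the question asks for a file, here is a hedged sketch of the same single-pass idea writing to a file instead. The write_jobs function name and its path parameter are made up for illustration; the regular expressions are the ones used in this answer.

```python
import re

def write_jobs(lines, path):
    """Write one space-separated line per job to path."""
    current = []                       # fields collected for the job in progress
    with open(path, 'w') as f:
        for s in lines:
            m = re.match(r'Job Id: (\d+)', s)
            if m:
                if current:            # flush the previous job before starting a new one
                    f.write(' '.join(current) + '\n')
                current = [m.group(1)]
                continue
            m = re.match(r'^.+ = (.+)$', s)
            if m and current:
                current.append(m.group(1))
        if current:                    # flush the last job
            f.write(' '.join(current) + '\n')

# Example call, assuming ss holds the job list from the question:
# write_jobs(ss, 'job.txt')
```

Like the loop it is based on, this keeps every "name = value" field, so the comment text for job 49362 would still appear in the file.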