I have a text file that looks like this:
node13
state = free
np = 8
properties = beta,eightcores
ntype = cluster
status = opsys=linux,uname=Linux node13 2.6.27.19-5-default #1 SMP 2009-02-28 04:40:21 +0100 x86_64,sessions=? 15201,nsessions=? 01,nusers=0,idletime=6837317,totmem=20506268kb,availmem=20259728kb,physmem=20506268kb,ncpus=8,loadave=0.00,gres=,netload=17130666575,se=free,jobs=,varattr=,rectime=1333639375
node14
state = job-exclusive
np = 8
properties = beta,eightcores
ntype = cluster
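I want to grab a node only when it is free. To do that, I need a regular expression that matches node(..) only when the following line contains state = free. Can you help me with this?
Edit:
Nothing has worked so far. Perhaps that is because I am not reading from a file but using this instead: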
proc = subprocess.Popen("pbsnodes", stdout=subprocess.PIPE)
listOfFreeNodes = proc.stdout.read()
Could that have any effect on the solutions? Here is the complete pbsnodes output:
node01
state = free
np = 8
properties = alpha,eightcores
ntype = cluster
status = opsys=linux,uname=Linux node01 2.6.27.19-5-01,nusers=0,idletime=861913,totmem=16432576kb,availmem=16=free,jobs=,varattr=,rectime=1333641123
node02
state = free
np = 8
properties = alpha,eightcores
ntype = cluster
status = opsys=linux,uname=Linux node02 2.6.27.19-5-nusers=2,idletime=5357510,totmem=16432576kb,availmem=1617ree,jobs=,varattr=,rectime=1333641107
node03
state = free
np = 8
properties = alpha,eightcores
ntype = cluster
status = opsys=linux,uname=Linux node03 2.6.27.19-5-s=1,idletime=8564681,totmem=16432576kb,availmem=16029924kobs=60966.hpchead.linux,varattr=,rectime=1333641119
node04
state = free
np = 8
properties = alpha,eightcores
ntype = cluster
status = opsys=linux,uname=Linux node04 2.6.27.19-5-01,nusers=0,idletime=8564678,totmem=16432576kb,availmem=1e=free,jobs=,varattr=,rectime=1333641124
node05
state = free
np = 8
properties = alpha,eightcores
ntype = cluster
status = opsys=linux,uname=Linux node05 2.6.27.19-5-01,nusers=0,idletime=2072593,totmem=16432652kb,availmem=1=free,jobs=,varattr=,rectime=1333641091
node06
state = free
np = 8
properties = alpha,eightcores
ntype = cluster
status = opsys=linux,uname=Linux node06 2.6.27.19-5-s=1,idletime=9038,totmem=16432576kb,availmem=16200960kb,p,varattr=,rectime=1333641096
node07
state = free
np = 8
properties = alpha,eightcores
ntype = cluster
status = opsys=linux,uname=Linux node07 2.6.27.19-5-s=1,idletime=8564671,totmem=16432576kb,availmem=16173848kobs=,varattr=,rectime=1333641134
node08
state = free
np = 8
properties = alpha,eightcores
ntype = cluster
status = opsys=linux,uname=Linux node08 2.6.27.19-5- 21356,nsessions=5,nusers=1,idletime=8564604,totmem=1643219260329746,state=free,jobs=,varattr=,rectime=1333641095
node09
state = free
np = 8
properties = alpha,eightcores
ntype = cluster
status = opsys=linux,uname=Linux node09 2.6.27.19-5-01,nusers=0,idletime=8564648,totmem=16432552kb,availmem=1e=free,jobs=,varattr=,rectime=1333641126
node10
state = free
np = 8
properties = alpha,eightcores
ntype = cluster
status = opsys=linux,uname=Linux node10 2.6.27.19-5-2,nsessions=5,nusers=1,idletime=6821493,totmem=16432552kb036941,state=free,jobs=,varattr=,rectime=1333641133
node11
state = free
np = 8
properties = alpha,eightcores
ntype = cluster
status = opsys=linux,uname=Linux node11 2.6.27.19-5-01,nusers=0,idletime=8564599,totmem=16432556kb,availmem=1e=free,jobs=,varattr=,rectime=1333641120
node12
state = free
np = 8
properties = alpha,eightcores
ntype = cluster
status = opsys=linux,uname=Linux node12 2.6.27.19-5-01,nusers=0,idletime=8564627,totmem=16432556kb,availmem=1e=free,jobs=,varattr=,rectime=1333641121
node13
state = free
np = 8
properties = beta,eightcores
ntype = cluster
status = opsys=linux,uname=Linux node13 2.6.27.19-5-01,nusers=0,idletime=6839072,totmem=20506268kb,availmem=2e=free,jobs=,varattr=,rectime=1333641130
node14
state = job-exclusive
np = 8
properties = beta,eightcores
ntype = cluster
jobs = 0/66481.hpchead.linux, 1/66481.hpchead.linux,chead.linux, 6/66481.hpchead.linux, 7/66481.hpchead.linux
status = opsys=linux,uname=Linux node14 2.6.27.19-5-,nusers=1,idletime=8568052,totmem=24635060kb,availmem=206free,jobs=66481.hpchead.linux,varattr=,rectime=1333641132
node15
state = job-exclusive
np = 8
properties = beta,eightcores
ntype = cluster
jobs = 0/66482.hpchead.linux, 1/66482.hpchead.linux,chead.linux, 6/66482.hpchead.linux, 7/66482.hpchead.linux
status = opsys=linux,uname=Linux node15 2.6.27.19-5-,nusers=1,idletime=8567636,totmem=24635012kb,availmem=212free,jobs=66482.hpchead.linux,varattr=,rectime=1333641092
node16
state = job-exclusive
np = 8
properties = beta,eightcores
ntype = cluster
jobs = 0/66481.hpchead.linux, 1/66481.hpchead.linux,chead.linux, 6/66481.hpchead.linux, 7/66481.hpchead.linux
status = opsys=linux,uname=Linux node16 2.6.27.19-5-=1,idletime=8564418,totmem=24634928kb,availmem=20700104kbbs=66481.hpchead.linux,varattr=,rectime=1333641117
node17
state = job-exclusive
np = 8
properties = beta,eightcores
ntype = cluster
jobs = 0/66482.hpchead.linux, 1/66482.hpchead.linux,chead.linux, 6/66482.hpchead.linux, 7/66482.hpchead.linux
status = opsys=linux,uname=Linux node17 2.6.27.19-5-s=1,idletime=6824915,totmem=24634928kb,availmem=20598068kbs=66482.hpchead.linux,varattr=,rectime=1333641113
node21
state = job-exclusive
np = 12
properties = blade
ntype = cluster
jobs = 0/66483.hpchead.linux, 1/66483.hpchead.linux,chead.linux, 6/66483.hpchead.linux, 7/66483.hpchead.linux.hpchead.linux
status = opsys=linux,uname=Linux node21 2.6.27.19-5-,nusers=1,idletime=8569176,totmem=26790348kb,availmem=203e=free,jobs=66483.hpchead.linux,varattr=,rectime=13336411
node22
state = job-exclusive
np = 12
properties = blade
ntype = cluster
jobs = 0/66475.hpchead.linux, 1/66475.hpchead.linux,chead.linux, 6/66475.hpchead.linux, 7/66475.hpchead.linux.hpchead.linux
status = opsys=linux,uname=Linux node22 2.6.27.19-5-users=1,idletime=8569178,totmem=26790348kb,availmem=21384free,jobs=66475.hpchead.linux,varattr=,rectime=1333641118
node23
state = job-exclusive
np = 12
properties = blade
ntype = cluster
jobs = 0/66484.hpchead.linux, 1/66484.hpchead.linux, 2/66484.hpchead.linux, 3/66484.hpchead.linux, 4/66484.hpchead.linux, 5/66484.hpchead.linux, 6/66484.hpchead.linux, 7/66484.hpchead.linux, 8/66484.hpchead.linux, 9/66484.hpchead.linux, 10/66484.hpchead.linux, 11/66484.hpchead.linux
status = opsys=linux,uname=Linux node23 2.6.27.19-5-default #1 SMP 2009-02-28 04:40:21 +0100 x86_64,sessions=10309 10370,nsessions=2,nusers=1,idletime=8569255,totmem=26790348kb,availmem=20165484kb,physmem=24685876kb,ncpus=12,loadave=12.01,gres=,netload=21742922098,state=free,jobs=66484.hpchead.linux,varattr=,rectime=1333641120
node24
state = job-exclusive
np = 12
properties = blade
ntype = cluster
jobs = 0/66485.hpchead.linux, 1/66485.hpchead.linux, 2/66485.hpchead.linux, 3/66485.hpchead.linux, 4/66485.hpchead.linux, 5/66485.hpchead.linux, 6/66485.hpchead.linux, 7/66485.hpchead.linux, 8/66485.hpchead.linux, 9/66485.hpchead.linux, 10/66485.hpchead.linux, 11/66485.hpchead.linux
status = opsys=linux,uname=Linux node24 2.6.27.19-5-default #1 SMP 2009-02-28 04:40:21 +0100 x86_64,sessions=11157 11218,nsessions=2,nusers=1,idletime=8569254,totmem=26790348kb,availmem=21489804kb,physmem=24685876kb,ncpus=12,loadave=12.05,gres=,netload=18486923435,state=free,jobs=66485.hpchead.linux,varattr=,rectime=1333641114
node25
state = job-exclusive
np = 12
properties = blade
ntype = cluster
jobs = 0/66469.hpchead.linux, 1/66469.hpchead.linux, 2/66469.hpchead.linux, 3/66469.hpchead.linux, 4/66469.hpchead.linux, 5/66469.hpchead.linux, 6/66469.hpchead.linux, 7/66469.hpchead.linux, 8/66469.hpchead.linux, 9/66469.hpchead.linux, 10/66469.hpchead.linux, 11/66469.hpchead.linux
status = opsys=linux,uname=Linux node25 2.6.27.19-5-default #1 SMP 2009-02-28 04:40:21 +0100 x86_64,sessions=6711 6772,nsessions=2,nusers=1,idletime=8569282,totmem=26790348kb,availmem=21082316kb,physmem=24685876kb,ncpus=12,loadave=12.00,gres=,netload=15199518313,state=free,jobs=66469.hpchead.linux,varattr=,rectime=1333641095
Edit:
Thanks to everyone who answered.
Answer 0 (score: 4)
This should return the correct node values:
r'node\d+(?=[^\n]*\n\s*state\s*=\s*free)'
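This uses a positive lookahead to peek past the end of the line without capturing anything it finds there. Only the node value itself is matched.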
import re
# s holds the pbsnodes text shown in the question
l = re.findall(r'node\d+(?=[^\n]*\n\s*state\s*=\s*free)', s)
print(l)
# prints ['node13'] for the two-node sample
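Edit: Inspired by @hexparrot's comment, I realized there is a simpler way. The regex r'node\d+(?=\s*state\s*=\s*free)' is simpler and also works, even though it does not explicitly search for the newline (because \s includes the EOL characters). But it does not guarantee that state = free is found on the subsequent line, as the OP's requirements state: it would also match node99 state=free on a single line. So explicitly searching for \n is more in keeping with the OP's requirements.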
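To make the difference between the two patterns concrete, here is a minimal sketch. The sample string, including the same-line `node99 state = free` entry, is made up for illustration and is not real pbsnodes output.

```python
import re

# Hypothetical sample: node99 carries its state on the same line,
# node13 and node14 carry it on the following line.
sample = (
    "node99 state = free\n"
    "node13\n"
    "     state = free\n"
    "node14\n"
    "     state = job-exclusive\n"
)

# Pattern that insists on a newline between the node name and the state line.
strict = re.findall(r'node\d+(?=[^\n]*\n\s*state\s*=\s*free)', sample)

# Simpler pattern: \s* may or may not cross a newline.
loose = re.findall(r'node\d+(?=\s*state\s*=\s*free)', sample)

print(strict)  # ['node13']
print(loose)   # ['node99', 'node13']
```

Only the strict pattern rejects the same-line entry, which is the behaviour the question asks for.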
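Answer 1 (score: 3)
If you can rely on the generated file being consistently structured (that is, following the same format you have shown), a regular expression is sometimes more than you need.
So here is an approach that uses simple iteration: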
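with open('yourfile.txt', 'r') as fp:
    node_dict = {}
    node = None
    for line in fp:
        if line.startswith('node'):
            # A new node block starts; its state is not known yet
            node = line.strip()
            node_dict[node] = 0
        elif node and line.strip().startswith('state'):
            # Only the "state = ..." line, not a status line that merely contains "state="
            node_dict[node] = line.split('=')[1].strip()
print(node_dict)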
which returns
{'node13': 'free', 'node14': 'job-exclusive'}
It is then easy to get the "free" nodes:
>>> print([k for k, v in node_dict.items() if v == 'free'])
['node13']
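Since the question's edit reads from `subprocess.Popen` rather than a file, the same loop can be pointed at that output instead. The following is only a sketch: `parse_states` and `free_nodes_from_pbsnodes` are names invented here, and it assumes a `pbsnodes` executable is on the PATH. On Python 3 the pipe yields bytes, so the output is decoded before parsing.

```python
import subprocess


def parse_states(text):
    """Map each node name to its state, given pbsnodes-style text."""
    states = {}
    node = None
    for line in text.splitlines():
        if line.startswith('node'):
            node = line.strip()
        elif node and line.strip().startswith('state'):
            # Take everything after the first '=' as the state value.
            states[node] = line.partition('=')[2].strip()
    return states


def free_nodes_from_pbsnodes():
    # Assumption: `pbsnodes` is installed and prints the format shown in the question.
    raw = subprocess.Popen(["pbsnodes"], stdout=subprocess.PIPE).communicate()[0]
    states = parse_states(raw.decode())
    return [name for name, state in states.items() if state == 'free']


# Shortened, hypothetical sample so the parser can be tried without a cluster.
sample = "node13\n     state = free\n     np = 8\nnode14\n     state = job-exclusive\n"
print(parse_states(sample))
```

Keeping the parsing in a function that takes plain text means it can be checked against a pasted sample before it is wired to the real command.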
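Answer 2 (score: 2)
I suggest parsing the text into a Python structure first and then working with that structure. Regular expressions are too complicated and too brittle for this job. Consider: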
doc = """
node13
state = free
np = 8
properties = beta,eightcores
ntype = cluster
status = opsys=linux,uname=Linux node13 2.6.27.19-5-default etc
node14
state = job-exclusive
np = 8
properties = beta,eightcores
ntype = cluster
"""
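data = {}
lastkey = None
for line in map(str.strip, doc.splitlines()):
    if ' = ' in line and lastkey:
        # "key = value" line belonging to the current node
        k, v = line.split(' = ', 1)
        data[lastkey][k] = v
    elif len(line):
        # Any other non-empty line starts a new node block
        lastkey = line
        data[lastkey] = {}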
This creates a dictionary like this:
{'node13': {'np': '8',
'ntype': 'cluster',
'properties': 'beta,eightcores',
'state': 'free',
'status': 'opsys=linux,uname=Linux node13 2.6.27.19-5-default etc'},
'node14': {'np': '8',
'ntype': 'cluster',
'properties': 'beta,eightcores',
'state': 'job-exclusive'}}
You can then work with it in the usual Python way:
free_nodes = [k for k, v in data.items() if v.get('state') == 'free']
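One benefit of a nested dictionary is that other fields can take part in the filter too. The sketch below assumes a dictionary of that shape; the `select` helper, its parameters, and the `node99` entry are hypothetical and only serve the illustration.

```python
# Hypothetical data in the shape of a parsed pbsnodes listing.
data = {
    'node13': {'state': 'free', 'np': '8', 'properties': 'beta,eightcores'},
    'node14': {'state': 'job-exclusive', 'np': '8', 'properties': 'beta,eightcores'},
    'node99': {'state': 'free', 'np': '12', 'properties': 'blade'},
}


def select(data, state='free', prop=None, min_np=0):
    """Return sorted node names matching a state, an optional property and a core count."""
    names = []
    for name, fields in data.items():
        if fields.get('state') != state:
            continue
        if prop and prop not in fields.get('properties', '').split(','):
            continue
        if int(fields.get('np', 0)) < min_np:
            continue
        names.append(name)
    return sorted(names)


print(select(data))                # ['node13', 'node99']
print(select(data, prop='blade'))  # ['node99']
print(select(data, min_np=12))     # ['node99']
```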
Answer 3 (score: 1)
You can use the re.DOTALL flag so that . matches everything, including newlines. Here is a sample:
>>> st="""
node13
state = free
np = 8
properties = beta,eightcores
ntype = cluster
status = opsys=linux,uname=Linux node13 2.6.27.19-5-default #1 SMP 2009-02-28 04:40:21 +0100 x86_64,sessions=? 15201,nsessions=? 01,nusers=0,idletime=6837317,totmem=20506268kb,availmem=20259728kb,physmem=20506268kb,ncpus=8,loadave=0.00,gres=,netload=17130666575,se=free,jobs=,varattr=,rectime=1333639375
node14
state = job-exclusive
np = 8
properties = beta,eightcores
ntype = cluster
"""
>>> import re
>>> re.findall(r"(node\d+).*?state.*?free", st, re.DOTALL)
['node13']
Note that this can also be done without a regular expression:
>>> stlines = st.splitlines()
>>> [stlines[i] for i in range(len(stlines) - 1) if stlines[i+1].partition("=")[-1].strip() == 'free']
['node13']
>>>
Note: if you need a more robust regular expression, as Francis shows in his example, you can use the following:
>>> re.findall(r"(node\d+).*?state[ ]*=[ ]*free", st, re.DOTALL)
['node13']
>>>
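One caveat with a DOTALL pattern: in the full pbsnodes output, the status line of a busy node can itself contain `state=free`, and `.*?` will happily reach it. A possible workaround, sketched here under the assumption that the node name sits alone on its line and the state line follows immediately, is to anchor the pattern with re.MULTILINE. The excerpt below is shortened and hypothetical.

```python
import re

# Shortened, hypothetical excerpt: node14 is busy, yet its status line says state=free.
text = (
    "node13\n"
    "     state = free\n"
    "     status = opsys=linux,uname=Linux node13,state=free,jobs=\n"
    "node14\n"
    "     state = job-exclusive\n"
    "     status = opsys=linux,uname=Linux node14,state=free,jobs=66481\n"
)

# DOTALL version: .*? can run past the real state line into the status line.
unanchored = re.findall(r"(node\d+).*?state[ ]*=[ ]*free", text, re.DOTALL)

# Anchored version: the node name must fill its own line and the very next
# line must be exactly "state = free" (surrounding blanks allowed).
anchored = re.findall(
    r"^(node\d+)[ \t]*\n[ \t]*state[ \t]*=[ \t]*free[ \t]*$",
    text,
    re.MULTILINE,
)

print('node14' in unanchored)  # True: a false positive
print(anchored)                # ['node13']
```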
Answer 4 (score: 1)
I agree with @thg435 that a regex is overkill for this job. I prefer a very simple solution:
lines = data.split('\n')
num_lines = len(lines)
[lines[i] for i in range(num_lines - 1) if 'state = free' in lines[i+1]]
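This captures the essence of what you are trying to do: if the next line (lines[i+1]) contains the desired text, the current line (presumably the node's name) goes into the list.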
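The same "current line plus next line" idea can be written without index arithmetic by pairing the list with a shifted copy of itself. This is a sketch only; `free_nodes` is a name chosen here, and it compares the stripped line for equality rather than testing a substring.

```python
def free_nodes(text):
    """Return each line whose following line is exactly 'state = free'."""
    lines = text.splitlines()
    # zip(lines, lines[1:]) yields (current, next) pairs and stops one line early.
    return [cur.strip() for cur, nxt in zip(lines, lines[1:])
            if nxt.strip() == 'state = free']


# Shortened, hypothetical sample.
sample = "node13\n     state = free\n     np = 8\nnode14\n     state = job-exclusive\n"
print(free_nodes(sample))  # ['node13']
```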
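Answer 5 (score: 1)
Looking backward tends to be easier than looking forward. So instead of thinking about grabbing the current line when the next line contains something, think of grabbing the previous line when the current line contains something. Expressed in those terms, it is easy to conceive and implement: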
def find_free_node(doc):
    prevline = ""
    for line in doc.splitlines():
        if line.strip() == "state = free" and prevline.startswith("node"):
            return prevline.strip()
        prevline = line
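Another approach is to keep track of the node you are in rather than the previous line. This works even if the state = free line does not immediately follow the node name line.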
def find_free_node(doc):
    node = ""
    for line in doc.splitlines():
        if line.startswith("node"):
            node = line.strip()
        elif line.strip() == "state = free" and node:
            return node
To me, these are much clearer than solutions based on multiline regular expressions.
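Both functions return as soon as they see the first free node. If every free node is wanted, one option is to turn the node-tracking version into a generator. This is a sketch, and `iter_free_nodes` is a name invented for it.

```python
def iter_free_nodes(doc):
    """Yield the name of every node whose block contains a 'state = free' line."""
    node = ""
    for line in doc.splitlines():
        if line.startswith("node"):
            node = line.strip()
        elif line.strip() == "state = free" and node:
            yield node


# Shortened, hypothetical sample with two free nodes and one busy node.
sample = (
    "node01\n     state = free\n"
    "node02\n     state = free\n"
    "node14\n     state = job-exclusive\n"
)
print(list(iter_free_nodes(sample)))  # ['node01', 'node02']
```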