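I have a string; here is a sample of it:
There is a pattern: usually after "Sequence:" the next 22 elements repeat, and those are the data I want to isolate.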
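So my thinking was: if I split the string at Sequence: into a list of elements, and then split each element of that list at \n to get a list of lists, every list that is 22 elements long would be the data I want. So I tried it with this code: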
import subprocess
import time
proc = subprocess.Popen(cmd_rancli, stdin=subprocess.PIPE, stdout=subprocess.PIPE)
# proc_stdout is the string
proc_stdout = proc.communicate(ran_opt_get_access_data)[0]
parse = proc_stdout.split('Sequence:')
print(parse)
time.sleep(5)
parse2 = [i.split('\n')[0] for i in parse]
print(parse2)
time.sleep(5)
However, the second step doesn't give me what I expect. What am I doing wrong?
Actual output:
parse2 = ['RAN> get ap 108352 attr=4192', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', '
', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ']
Why does the split return a space?
Here are some results from the first split (print(parse)): http://i.imgur.com/zhN3i3j.png
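Below is a minimal reproduction of the behavior. It uses a small hypothetical sample in place of the real proc_stdout. It also assumes, as the output above suggests, that a space follows each Sequence: marker before the newline:

```python
# Hypothetical sample: a prompt line, then two "Sequence:" blocks.
# Assumption: each "Sequence:" marker is followed by a trailing space
# before the newline, as the real RAN output appears to be.
sample = (
    "RAN> get ap 108352 attr=4192\n"
    "Sequence: \n"
    "Value(int): 1\n"
    "Value(string): 05\n"
    "Sequence: \n"
    "Value(int): 2\n"
    "Value(string): 03\n"
)

parse = sample.split('Sequence:')
# Every chunk after the first starts with ' \n', so element [0] of the
# newline split is the lone space, not the first value.
parse2 = [i.split('\n')[0] for i in parse]
print(parse2)  # ['RAN> get ap 108352 attr=4192', ' ', ' ']
```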
Answer 0 (score: 2)
Using the string you provided on pastebin (as the contents of variable a):
>>> result = [i.strip().split('\n') for i in a.split('Sequence')]
>>> [len(i) for i in result]
[10, 1, 3, 1, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 24, 3, 1, 23, 23, 23]
For example, this part of the input produces one of the 3-element chunks:
Sequence:
Value(int): 1
Value(string): 2013-02-26T15:01:11Z
Sequence:
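So let's keep only the chunks that have 23 elements and drop the first element of each, which is the ':' left over from splitting on 'Sequence':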
>>> result = [i[1:] for i in result if len(i) == 23]
>>> [len(i) for i in result]
[22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22]
Now you have arrays like this:
>>> print('\n'.join(result[0]))
Value(int): 10564
Value(int): 13
Value(int): 388
Value(int): 0
Value(int): -321
Value(int): 83
Value(string): 272
Value(string): 05
Value(int): 67
Value(int): 67
Value(int): 708
Value(int): 896
Value(int): 31
Value(int): 128
Value(int): -12
Value(int): -109
Value(int): 0
Value(int): -20
Value(int): -111
Value(int): -1
Value(int): -1
Value(int): 0
So the complete code you need for the data you provided is:
proc_stdout = proc.communicate(ran_opt_get_access_data)[0].decode('utf-8')
result = [i.strip().split('\n') for i in proc_stdout.split('Sequence')]
result = [i[1:] for i in result if len(i) == 23]
# Or at least [i[1:] for i in result if len(i) > 1]
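The same steps can be wrapped in a small reusable function. This is a sketch and not part of the original answer. The names extract_sequences and record_len are hypothetical. It also swaps split('\n') for str.splitlines() so that '\r\n' line endings are handled too:

```python
def extract_sequences(raw, record_len=22):
    """Return the 'Sequence' blocks that contain exactly record_len values.

    Hypothetical helper: raw may be the bytes returned by
    proc.communicate() or an already-decoded str.
    """
    text = raw.decode('utf-8') if isinstance(raw, bytes) else raw
    # splitlines() also copes with '\r\n' line endings.
    chunks = [c.strip().splitlines() for c in text.split('Sequence')]
    # A full chunk is the leftover ':' line plus record_len value lines;
    # drop the ':' line from each one that matches.
    return [c[1:] for c in chunks if len(c) == record_len + 1]


sample = (b"RAN> get ap 108352 attr=4192\n"
          b"Sequence: \nValue(int): 1\n"
          b"Sequence: \nValue(int): 10564\nValue(int): 13\n")
print(extract_sequences(sample, record_len=2))
# [['Value(int): 10564', 'Value(int): 13']]
```

With the real output you would call extract_sequences(proc.communicate(ran_opt_get_access_data)[0]) and keep the default length of 22.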
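To get just the values, we'll use a simple hack. On every line the value follows the first ':', so str.find() can locate it and str.strip() can remove the surrounding whitespace: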
def filter_value(text):
    index = text.find(':')
    # No ':' found: return the whole line, stripped
    if index < 0:
        return text.strip()
    return text[index+1:].strip()
Use it by replacing this line:
result = [i[1:] for i in result if len(i) == 23]
with this one-liner:
result = [[filter_value(j) for j in i[1:]] for i in result if len(i) == 23]
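str.find() returns the position of the first ':', so values that contain colons themselves, such as the timestamps, come through intact. The sketch below runs a quick check of filter_value on a few lines from the output. It also includes a hypothetical str.partition() variant, filter_value_partition, which behaves the same way:

```python
def filter_value(text):
    # Same logic as above: keep what follows the first ':'.
    index = text.find(':')
    if index < 0:
        return text.strip()
    return text[index + 1:].strip()


def filter_value_partition(text):
    # Hypothetical alternative: partition() returns (head, ':', tail),
    # or (text, '', '') when the line contains no ':'.
    head, sep, tail = text.partition(':')
    return tail.strip() if sep else head.strip()


lines = ['Value(int): 10564', 'Value(string): 2013-02-26T15:01:11Z', ':']
print([filter_value(line) for line in lines])
# ['10564', '2013-02-26T15:01:11Z', '']
```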
Answer 1 (score: 1)
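You don't get what you expect because there is whitespace between each Sequence: and the newline that follows it. [i.split('\n')[0] for i in parse] takes the first item after splitting on newlines, and that item is the whitespace.
Rather than fixing this approach, I'd suggest doing something slightly more involved and building a dictionary that models the output:
def add_data(key, value, data):
    if key.startswith('Value('):
        # Value(int) lines become ints; Value(string) lines stay strings
        if key.endswith('(int)'):
            value = int(value)
        # Append to the most recently started sequence
        data['Sequences'][-1].append(value)
    elif key == 'Sequence':
        # Each "Sequence:" line starts a new list
        data['Sequences'].append([])
    else:
        # Header fields such as MessageType or TransactionId
        data[key] = value
def parse_lines(lineseq):
    data = {'Sequences':[]}
    for line in lineseq:
        try:
            # Split only at the first ':' so timestamps keep their colons
            key, value = [part.strip() for part in line.split(':', 1)]
        except ValueError:
            # Lines without ':' (e.g. the RAN> prompt, blank lines) are skipped
            continue
        add_data(key, value, data)
    return data
lines = proc_stdout.split('\n')
data = parse_lines(lines)
If you only want the sequences of length 22, a list comprehension over data['Sequences'] picks them out easily. The code above produces a data structure like this: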
{'AttributeId': '4192',
'AttributeList': '',
'ClassId': '1014 (AP)',
'InstanceId': '0',
'MessageType': '81 (GetAttributesResponse)',
'ObjectInstance': '',
'Protocol': 'BSMIS Rx',
'RDN': '',
'TransactionId': '66',
'Sequences': [[],
[1,'2013-02-26T15:01:11Z'],
[],
[10564,13,388,0,-321,83,'272','05',67,67,708,896,31,128,-12,-109,0,-20,-111,-1,-1,0],
[10564,13,108,0,-11,83,'272','05',67,67,708,1796,31,128,-12,-109,0,-20,-111,-1,-1,0],
[10589,16,388,0,-15,79,'272','05',67,67,708,8680,31,125,-16,-110,0,-20,-111,-1,-1,0],
[10589,15,108,0,-16,81,'272','05',67,67,708,8105,31,126,-14,-109,0,-20,-111,-1,-1,0],
[10637,40,233,0,-11,89,'272','03',30052,1,5,54013,33,103,-6,-76,1,-20,-111,-1,-1,0],
[10662,46,234,0,-15,85,'272','03',30052,1,5,54016,33,97,-10,-74,1,-20,-111,-1,-1,0],
[10712,51,12,0,-24,91,'272','01',4013,254,200,2973,3,62,-4,-63,0,-20,-111,-1,-1,0],
[10737,15,224,0,-16,82,'272','01',3020,21,21,40770,33,128,-13,-108,0,-20,-111,-1,-1,0],
[10762,14,450,0,-7,78,'272','01',3020,21,21,53215,29,125,-17,-113,0,-20,-111,-1,-1,0],
[10762,15,224,0,-7,85,'272','01',3020,21,21,50770,33,128,-10,-105,0,-20,-111,-1,-1,0],
[10762,14,124,0,-7,78,'272','01',3020,10,10,56880,32,128,-17,-113,0,-20,-111,-1,-1,0],
[10812,11,135,0,-14,81,'272','02',36002,1,11,43159,31,130,-14,-113,1,-20,-111,-1,-1,0],
[10837,42,23,0,-9,89,'272','02',36002,1,11,53529,31,99,-6,-74,1,-20,-111,-1,-1,0,54],
[13,'2013-02-26T15:02:09Z'],
[],
[2,12,7,0,9,70,'272','02',20003,0,0,15535,0,0,0,0,1,100,100,-1,-1,0],
[5,15,44,0,-205,77,'272','02',20003,0,0,15632,0,0,0,0,1,100,100,-1,-1,0],
[7,25,9,0,0,84,'272','02',20002,0,0,50883,0,0,0,0,1,100,100,-1,-1,0]]
}
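The length-22 comprehension might look like the sketch below. It runs on a hand-built excerpt of the dictionary above, using four of its 'Sequences' entries, rather than on live output:

```python
# Hand-built excerpt of the 'data' dict shown above.
data = {
    'Sequences': [
        [],
        [1, '2013-02-26T15:01:11Z'],
        [10564, 13, 388, 0, -321, 83, '272', '05', 67, 67, 708, 896,
         31, 128, -12, -109, 0, -20, -111, -1, -1, 0],
        [10837, 42, 23, 0, -9, 89, '272', '02', 36002, 1, 11, 53529,
         31, 99, -6, -74, 1, -20, -111, -1, -1, 0, 54],
    ]
}

# Keep only the sequences with exactly 22 values.
full = [seq for seq in data['Sequences'] if len(seq) == 22]
print(len(full))  # 1
```

The exact-length test excludes the 23-value entry (the one ending in 54).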