Split a line on spaces, but keep strings inside backticks intact

Posted: 2019-07-13 21:56:53

Tags: python regex python-3.x lexical-analysis

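I have data in which some fields are wrapped in backticks, and I want to split each line on spaces without breaking up the backtick-quoted strings.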

I'm currently using a combination of regex and shlex, but I'm running into problems with lines like these:

testing 25 `this is a test`
hello `world hello world`
log "log1" "log2" `third log`

Here is my code:

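import re, shlex

def tokenize(line):
    # Group 1: everything before the backtick-quoted part.
    # Group 2: the backtick-quoted part (greedy, so it runs to the last backtick).
    graveKeyPattern = re.compile(r'^ *(.*) (`.*`) *')
    if '`' in line:
        # When the pattern matches, re.split() with capturing groups returns
        # ['', group1, group2, trailing_text]; keep only the two groups.
        tokens = re.split(graveKeyPattern, line)
        tokens = tokens[1:3]
    else:
        # No backticks: fall back to shell-style splitting.
        tokens = shlex.split(line)
    #end if/else
    print(tokens)
    return tokens
#end tokenize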

lines = []
lines.append('testing 25 `this is a test`')
lines.append('hello `world hello world`')
lines.append('log "log1" "log2" `third log`')
lines.append('testing2 "testing2 in quotes" 5')

for line in lines:
    tokenize(line)

This is the output I need:

['testing 25', '`this is a test`']
['hello', '`world hello world`']
['log "log1" "log2"', '`third log`']
['testing2', 'testing2', 'in', 'quotes', '5']
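A quick way to see how the regex branch of tokenize() behaves: because both groups in graveKeyPattern are greedy, only the last backtick-quoted segment on a line is split off, and any earlier one stays inside the first group. The input line below is a made-up example, not one from the question.

```python
import re

# Same pattern as in tokenize() from the question.
graveKeyPattern = re.compile(r'^ *(.*) (`.*`) *')

# Hypothetical input with two backtick-quoted segments.
line = 'run `first arg` then `second arg`'

parts = re.split(graveKeyPattern, line)
print(parts)       # ['', 'run `first arg` then', '`second arg`', '']
print(parts[1:3])  # ['run `first arg` then', '`second arg`']
```

The leading and trailing empty strings are why the question's code slices with [1:3].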

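1 answer

Answer 0 (score: 1)

Sometimes it's easier to match what you want than to split on what you don't want.

For your tests, this works by matching either a backtick-delimited string or a run of characters that are neither whitespace nor double quotes: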

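import re
from pprint import pprint

lines = []
lines.append('testing 25 `this is a test`')
lines.append('`world hello world` hello ')
lines.append('log "log1" "log2" `third log` log3')

# Match either a whole backtick-quoted string (non-greedy, so it stops at the
# first closing backtick) or a run of non-whitespace, non-double-quote characters.
pprint([re.findall(r'((?:`.*?`)|[^\"\s]+)', s) for s in lines])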

Result:

[['testing', '25', '`this is a test`'],
 ['`world hello world`', 'hello'],
 ['log', 'log1', 'log2', '`third log`', 'log3']]
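If you would rather stay with shlex, as in the question, one option (a sketch, not part of the original answer) is to register the backtick as an extra quote character through the quotes attribute of shlex.shlex. In non-POSIX mode shlex keeps the quote characters in the returned tokens, so the backticks survive. Unlike the regex above, double-quoted fields keep their quotes too. split_keep_backticks is a hypothetical helper name.

```python
import shlex

def split_keep_backticks(line):
    """Hypothetical helper: split on whitespace, keeping quoted strings whole."""
    # posix=False keeps the quote characters inside the returned tokens.
    lexer = shlex.shlex(line, posix=False)
    # Break tokens only on whitespace, not on punctuation such as '-' or '.'.
    lexer.whitespace_split = True
    # Default quotes are ' and "; add the backtick as a third quote character.
    lexer.quotes += '`'
    # Don't treat '#' as the start of a comment.
    lexer.commenters = ''
    return list(lexer)

for line in ['testing 25 `this is a test`',
             'log "log1" "log2" `third log` log3']:
    print(split_keep_backticks(line))
# ['testing', '25', '`this is a test`']
# ['log', '"log1"', '"log2"', '`third log`', 'log3']
```

An unterminated backtick makes shlex raise ValueError("No closing quotation") instead of silently producing a partial token.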