I need help with a regular expression to split a string in my log lines. A log message looks like this:
Token1|Token2|Token3|Token4|Token5|Token6|Token7|key1=abc key2=89042683 keytransport=tcp keyUrl=POST /b/opt/ HTTP/1.1::~~Accept: */*::~~Content-Type: application/octet-stream::~~Connection: Close::~~User-Agent: Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.1; Win64; x64; Trident/6.0)::~~Host: something.su::~~Content-Length: 185::~~Cache-Control: no-cache::~~::~~|\330\302\037\262\220\333J;\242.\031z0x\334\177L keyType=web
Given the following:
message = "Token1|Token2|Token3|Token4|Token5|Token6|Token7|key1=abc key2=89042683 keytransport=tcp keyUrl=POST /b/opt/ HTTP/1.1::~~Accept: */*::~~Content-Type: application/octet-stream::~~Connection: Close::~~User-Agent: Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.1; Win64; x64; Trident/6.0)::~~Host: something.su::~~Content-Length: 185::~~Cache-Control: no-cache::~~::~~|\\330\\302\\037\\262\\220\\333J;\\242.\\031z0x\\334\\177L keyType=web"
hash = {}
hash['T1'], hash['T2'], hash['T3'], hash['T4'], hash['T5'], hash['T6'], hash['T7'], message = message.split /(?<!\\)[\|]/
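it splits inside the keyUrl string, so the payload and the key-value pairs that follow it are cut off from message, which ends up containing only the following: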
key1=abc key2=89042683 keytransport=tcp keyUrl=POST /b/opt/ HTTP/1.1::~~Accept: */*::~~Content-Type: application/octet-stream::~~Connection: Close::~~User-Agent: Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.1; Win64; x64; Trident/6.0)::~~Host: something.su::~~Content-Length: 185::~~Cache-Control: no-cache::~~::~~
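The truncation can be reproduced on a shortened line. This is a minimal sketch, assuming a made-up three-token sample (`line`, `fields`, `t1` to `t3` and `rest` are illustrative names, not taken from the real log). The pattern also matches the `|` in front of the payload, because that `|` is followed by a backslash rather than preceded by one, so `split` returns one more field than there are assignment targets and Ruby discards the extra one.

```ruby
# Hypothetical shortened log line: three tokens, then the message, whose
# payload is introduced by a literal "|" followed by a backslash.
line = "Token1|Token2|Token3|key1=abc keyUrl=POST /b/::~~|\\330\\302 keyType=web"

# Same pattern as in the question: "|" not preceded by a backslash.
fields = line.split(/(?<!\\)[\|]/)
puts fields.length   # one field more than the four assignment targets below

# Parallel assignment with four targets silently drops the extra field.
t1, t2, t3, rest = fields
puts rest            # the payload and keyType=web are missing here
```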
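I have been trying various permutations of the regular expression without success, and was wondering whether anyone could suggest a better pattern than message.split /(?<!\\)[\|]/
Many thanks.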
Edit: the result I am aiming for is:
puts hash
{"T1"=>"Token1", "T2"=>"Token2", "T3"=>"Token3", "T4"=>"Token4", "T5"=>"Token5", "T6"=>"Token6", "T7"=>"Token7"}
puts message
key1=abc key2=89042683 keytransport=tcp keyUrl=POST /b/opt/ HTTP/1.1::~~Accept: */*::~~Content-Type: application/octet-stream::~~Connection: Close::~~User-Agent: Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.1; Win64; x64; Trident/6.0)::~~Host: something.su::~~Content-Length: 185::~~Cache-Control: no-cache::~~::~~|\\330\\302\\037\\262\\220\\333J;\\242.\\031z0x\\334\\177L keyType=web
I hope this helps clarify things, and thanks again to everyone who has tried to help.
Answer 0: (score: 1)
It looks like you are simply splitting on |. You can do that with ...split("|"). You can collect the remaining pieces like this:
...,hash['T7'], *messages = message.split("|")
messages
=> ["key1=abc key2=89042683 keytransport=tcp keyUrl=POST /b/opt/ HTTP/1.1::~~Accept: */*::~~Content-Type: application/octet-stream::~~Connection: Close::~~User-Agent: Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.1; Win64; x64; Trident/6.0)::~~Host: something.su::~~Content-Length: 185::~~Cache-Control: no-cache::~~::~~", "\\330\\302\\037\\262\\220\\333J;\\242.\\031z0x\\334\\177L keyType=web"]
If you want the whole string after Token7| you can join the pieces back together like this:
message = messages.join("|")
=> "key1=abc key2=89042683 keytransport=tcp keyUrl=POST /b/opt/ HTTP/1.1::~~Accept: */*::~~Content-Type: application/octet-stream::~~Connection: Close::~~User-Agent: Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.1; Win64; x64; Trident/6.0)::~~Host: something.su::~~Content-Length: 185::~~Cache-Control: no-cache::~~::~~|\\330\\302\\037\\262\\220\\333J;\\242.\\031z0x\\334\\177L keyType=web"
Edit: now if you print it out
puts message
you get:
key1=abc key2=89042683 keytransport=tcp keyUrl=POST /b/opt/ HTTP/1.1::~~Accept: */*::~~Content-Type: application/octet-stream::~~Connection: Close::~~User-Agent: Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.1; Win64; x64; Trident/6.0)::~~Host: something.su::~~Content-Length: 185::~~Cache-Control: no-cache::~~::~~|\330\302\037\262\220\333J;\242.\031z0x\334\177L keyType=web
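An alternative to splatting and re-joining is the optional limit argument of `String#split`: with a positive limit, at most that many fields are returned and the last one holds the unsplit remainder. The following is a minimal sketch, assuming the line always starts with exactly seven tokens; the shortened `message` is an illustrative stand-in for the real log line.

```ruby
# Illustrative stand-in for the full log line from the question.
message = "Token1|Token2|Token3|Token4|Token5|Token6|Token7|key1=abc keyUrl=POST /b/::~~|\\330\\302 keyType=web"

hash = {}
# Limit 8 = seven tokens plus one remainder; any later "|" is left untouched.
hash['T1'], hash['T2'], hash['T3'], hash['T4'], hash['T5'], hash['T6'], hash['T7'], message = message.split('|', 8)

puts hash
puts message
```

With this approach no regular expression is needed, and the payload stays attached to message even though it contains a `|`.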
Answer 1: (score: 0)
Here is your sample text, shortened:
text = "Token1|Token2|Token3|key1=abc key2=89042683 no-cache::~~::~~|\\330"
As I understand it, you want the following:
hash = {}
hash['T1'], hash['T2'], hash['T3'], message = text.split('|')
#=> ["Token1", "Token2", "Token3",
#    "key1=abc key2=89042683 no-cache::~~::~~", "\\330"]
hash
#=> {"T1"=>"Token1", "T2"=>"Token2", "T3"=>"Token3"}
message
#=> "key1=abc key2=89042683 no-cache::~~::~~"
If my assumption is incorrect, please let me know.
Edit: given your comment, is it not simply:
hash['T1'], hash['T2'], hash['T3'], message, keyurl = text.split('|')
#=> ["Token1", "Token2", "Token3",
# "key1=abc key2=89042683 no-cache::~~::~~", "\\330"]
or
hash['T1'], hash['T2'], hash['T3'], *messages = text.split('|')
#=> ["Token1", "Token2", "Token3",
# "key1=abc key2=89042683 no-cache::~~::~~", "\\330"]
messages
#=> ["key1=abc key2=89042683 no-cache::~~::~~", "\\330"]
Is that what you want?
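If the number of leading tokens can vary, the same split-and-collect idea can be written without naming each key by hand. This is a minimal sketch, assuming a hypothetical `token_count` setting and the shortened `text` used above; `fields` is likewise an illustrative name.

```ruby
text = "Token1|Token2|Token3|key1=abc key2=89042683 no-cache::~~::~~|\\330"
token_count = 3  # hypothetical: how many leading fields are tokens

fields = text.split('|')

# Build {"T1"=>..., "T2"=>..., ...} from the leading tokens.
hash = fields.first(token_count).each_with_index.map { |tok, i| ["T#{i + 1}", tok] }.to_h

# Everything after the tokens is re-joined, restoring the "|" in the payload.
message = fields.drop(token_count).join('|')

puts hash
puts message
```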