String splitting with a regular expression

Time: 2015-11-16 22:10:21

Tags: ruby regex split

I need help with a regular expression to split a string in my log lines. The log message looks like this:

  

Token1|Token2|Token3|Token4|Token5|Token6|Token7|key1=abc key2=89042683 keytransport=tcp keyUrl=POST /b/opt/ HTTP/1.1::~~Accept: */*::~~Content-Type: application/octet-stream::~~Connection: Close::~~User-Agent: Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.1; Win64; x64; Trident/6.0)::~~Host: something.su::~~Content-Length: 185::~~Cache-Control: no-cache::~~::~~|\330\302\037\262\220\333J;\242.\031z0x\334\177L keyType=web

Given the following:

message = "Token1|Token2|Token3|Token4|Token5|Token6|Token7|key1=abc key2=89042683 keytransport=tcp keyUrl=POST /b/opt/ HTTP/1.1::~~Accept: */*::~~Content-Type: application/octet-stream::~~Connection: Close::~~User-Agent: Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.1; Win64; x64; Trident/6.0)::~~Host: something.su::~~Content-Length: 185::~~Cache-Control: no-cache::~~::~~|\\330\\302\\037\\262\\220\\333J;\\242.\\031z0x\\334\\177L keyType=web"
hash = {}
hash['T1'], hash['T2'], hash['T3'], hash['T4'], hash['T5'], hash['T6'], hash['T7'], message = message.split /(?<!\\)[\|]/

This also splits inside the keyUrl string, so the payload and the key/value pairs that follow it are cut off from message, which ends up as:

  

key1=abc key2=89042683 keytransport=tcp keyUrl=POST /b/opt/ HTTP/1.1::~~Accept: */*::~~Content-Type: application/octet-stream::~~Connection: Close::~~User-Agent: Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.1; Win64; x64; Trident/6.0)::~~Host: something.su::~~Content-Length: 185::~~Cache-Control: no-cache::~~::~~
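
The truncation can be reproduced on a much shorter line. The sketch below uses a hypothetical, abbreviated log line (three tokens instead of seven) and the same pattern as above. It suggests the cause is not the regex itself: the pipe in front of the payload is not preceded by a backslash, so it is a valid split point, and parallel assignment then discards the extra piece.

```ruby
# Shortened, hypothetical log line; single quotes keep the backslashes literal.
line = 'Token1|Token2|Token3|key1=abc keyUrl=POST /b/opt/ HTTP/1.1::~~|\330\302 keyType=web'

# The pattern from the question: split on every pipe not preceded by a backslash.
parts = line.split(/(?<!\\)[\|]/)
puts parts.length   # one more piece than the four variables below

# Parallel assignment silently drops values that have no variable to land in.
t1, t2, t3, rest = parts
puts rest           # ends at "::~~"; the payload and keyType=web are gone
```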

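I have been trying various permutations of the regular expression but I am stuck, and I wonder whether anyone can suggest a better pattern than message.split /(?<!\\)[\|]/. Many thanks.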

Edited: the result I am aiming for is:

puts hash
{"T1"=>"Token1","T2"=>"Token2","T3"=>"Token3","T4"=>"Token4","T5"=>"Token5","T6"=>"Token6","T7"=>"Token7"}

puts message
key1=abc key2=89042683 keytransport=tcp keyUrl=POST /b/opt/ HTTP/1.1::~~Accept: */*::~~Content-Type: application/octet-stream::~~Connection: Close::~~User-Agent: Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.1; Win64; x64; Trident/6.0)::~~Host: something.su::~~Content-Length: 185::~~Cache-Control: no-cache::~~::~~|\\330\\302\\037\\262\\220\\333J;\\242.\\031z0x\\334\\177L keyType=web

I hope this helps to clarify, and thanks again for the attempts to help.

2 answers:

Answer 0 (score: 1)

It looks like you are just splitting on |. You can do that with ...split("|"). You can collect the remaining pieces like this:

...,hash['T7'], *messages = message.split("|")
messages
=> ["key1=abc key2=89042683 keytransport=tcp keyUrl=POST /b/opt/ HTTP/1.1::~~Accept: */*::~~Content-Type: application/octet-stream::~~Connection: Close::~~User-Agent: Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.1; Win64; x64; Trident/6.0)::~~Host: something.su::~~Content-Length: 185::~~Cache-Control: no-cache::~~::~~", "\\330\\302\\037\\262\\220\\333J;\\242.\\031z0x\\334\\177L keyType=web"]

If you want the whole string after Token7| you can join the pieces back together like this:

message = messages.join("|")
=> "key1=abc key2=89042683 keytransport=tcp keyUrl=POST /b/opt/ HTTP/1.1::~~Accept: */*::~~Content-Type: application/octet-stream::~~Connection: Close::~~User-Agent: Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.1; Win64; x64; Trident/6.0)::~~Host: something.su::~~Content-Length: 185::~~Cache-Control: no-cache::~~::~~|\\330\\302\\037\\262\\220\\333J;\\242.\\031z0x\\334\\177L keyType=web"

Edit: now if you print it out

puts message

you get:

key1=abc key2=89042683 keytransport=tcp keyUrl=POST /b/opt/ HTTP/1.1::~~Accept: */*::~~Content-Type: application/octet-stream::~~Connection: Close::~~User-Agent: Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.1; Win64; x64; Trident/6.0)::~~Host: something.su::~~Content-Length: 185::~~Cache-Control: no-cache::~~::~~|\330\302\037\262\220\333J;\242.\031z0x\334\177L keyType=web
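
The doubled backslashes in the earlier `=>` results and the single ones in the `puts` output are the same string shown two ways. A minimal sketch of the difference, using a hypothetical two-escape fragment of the payload rather than the full log line:

```ruby
# In a double-quoted literal, "\\" is a single backslash character.
payload = "\\330\\302"

puts payload.length    # 8: two backslashes plus six digits
puts payload           # prints the raw characters: \330\302
puts payload.inspect   # prints the literal form:   "\\330\\302"
p payload              # p also prints the inspect form
```

So nothing needs to be unescaped before the string is used; irb simply displays results through inspect.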

Answer 1 (score: 0)

Here is your example text, shortened:

text = "Token1|Token2|Token3|key1=abc key2=89042683 no-cache::~~::~~|\\330"

As I understand it, you want the following:

hash = {}
hash['T1'], hash['T2'], hash['T3'], message = text.split('|')
  #=> ["Token1", "Token2", "Token3", "key1=abc key2=89042683 no-cache::~~::~~"] 
hash
  #=> {"T1"=>"Token1", "T2"=>"Token2", "T3"=>"Token3"} 
message
  #=> "key1=abc key2=89042683 no-cache::~~::~~" 

If my assumption is incorrect, please let me know.

Edit: given your comment, isn't it just one of the following:

hash['T1'], hash['T2'], hash['T3'], message, keyurl = text.split('|')
  #=> ["Token1", "Token2", "Token3",
  #    "key1=abc key2=89042683 no-cache::~~::~~", "\\330"] 

hash['T1'], hash['T2'], hash['T3'], *messages = text.split('|')
  #=> ["Token1", "Token2", "Token3",
  #    "key1=abc key2=89042683 no-cache::~~::~~", "\\330"]
messages
  #=> ["key1=abc key2=89042683 no-cache::~~::~~", "\\330"]
Is that what you want?
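
Both answers split on every pipe and regroup the pieces afterwards. As an alternative that is not taken from either answer, the optional limit argument of String#split caps the number of fields and leaves the remainder unsplit in the last one. The sketch below assumes a hypothetical shortened line with three tokens, so the limit is 4; for the seven-token line in the question it would be 8.

```ruby
# Shortened, hypothetical log line; single quotes keep the backslashes literal.
text = 'Token1|Token2|Token3|key1=abc key2=89042683 no-cache::~~::~~|\330 keyType=web'

hash = {}
# A limit of 4 yields at most four fields; the last one holds the unsplit remainder.
hash['T1'], hash['T2'], hash['T3'], message = text.split('|', 4)

p hash
puts message   # still contains the pipe, the payload and keyType=web
```

This relies on the number of leading tokens being fixed, which is an assumption about the log format rather than something stated in the question.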