How to isolate a specific string pattern from this body of text

Asked: 2013-07-17 20:54:01

Tags: regex bash grep

I have the following file: 2013-07-17_19-12-42.dcrec

How can I search the file and isolate the following string pattern:

New name for client 0, keyID = 000000, IP = 000.000.000.000 : somename

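The client # can be any number, the keyID is any numeric value (neither the client # nor the keyID has a fixed length), the IP is any ordinary IPv4 address, and somename is any username (usernames can include special characters such as #, ^, @, and spaces). It looks like each string is 'enclosed' by '^Bvs'. Below are two examples of the string (see screenshots):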

example 1

example 2

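Each file can contain any number of these strings. It would be great if I could search for and list every instance of these strings in the file. I'm not very good with grep and the like right now, otherwise I would do this myself. Any help would be appreciated, thanks!

2 Answers:

Answer 0 (score: 2)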

$> strings 2013-07-17_19-12-42.dcrec | grep -o -P "New name for client [0-9]+, keyID = [0-9]+, IP = [0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3} : [^\ ]+"
New name for client 7, keyID = 562830, IP = 91.193.208.105 : Sobieski
New name for client 8, keyID = 255344, IP = 63.153.210.124 : Cultist
New name for client 11, keyID = 5061431, IP = 116.240.255.94 : Sammy
New name for client 12, keyID = 5061453, IP = 196.20.195.114 : Dirk
New name for client 13, keyID = 4278381, IP = 188.110.185.183 : CSTO
New name for client 14, keyID = 369397, IP = 81.110.45.165 : General
New name for client 16, keyID = 5061651, IP = 85.4.29.162 : Thatsuseless
New name for client 17, keyID = 5061688, IP = 90.213.51.77 : NewPlayer
New name for client 18, keyID = 4905930, IP = 174.109.181.108 : Solo
New name for client 19, keyID = 5061695, IP = 85.4.236.70 : Quizzman
New name for client 21, keyID = 2745089, IP = 95.128.68.231 : NewPlayer
New name for client 22, keyID = 5061536, IP = 195.91.236.65 : POWERFUCKER
New name for client 24, keyID = 5061698, IP = 86.121.66.142 : TheDoctor
New name for client 26, keyID = 5061585, IP = 5.69.250.33 : Hydrogen

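Explanation:

  • If the input file is in a binary format, strings can process it to retrieve all of the text;
  • the client # can be any number: [0-9]+
  • the keyID is any numeric value: [0-9]+
  • the IP is any ordinary IPv4 address: [0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}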
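  • somename is any username (usernames can include special characters such as #, ^, @, and spaces); [^\ ]+ means "one or more non-space characters", so a name that contains a space is cut off at its first space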

That seems sufficient for a first pass. Yes, all of these regexes can be improved.

UPD: actually, for the last field (somename) the regex .* looks better, since it keeps usernames that contain spaces.
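As a rough sketch of that update, the last field can be matched with .* so that usernames containing spaces are kept whole. The function name extract_names is hypothetical, and -E (POSIX extended regex) is swapped in for -P because nothing in the pattern needs PCRE. It assumes the input has already been turned into text lines, for example by strings, and that each record ends where the line ends.

```shell
# Hypothetical helper (not from the original answer): reads text lines on
# stdin and prints every "New name for client ..." record it finds.
extract_names() {
    # ([0-9]{1,3}\.){3}[0-9]{1,3} is a compact form of the four-octet IP pattern.
    # The trailing .* takes the rest of the line as the username, spaces included.
    grep -oE 'New name for client [0-9]+, keyID = [0-9]+, IP = ([0-9]{1,3}\.){3}[0-9]{1,3} : .*'
}
```

It would be used as: strings 2013-07-17_19-12-42.dcrec | extract_names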

Answer 1 (score: 1)

grep --binary-files=text -o 'New name for client[^^B]*' 2013-07-17_19-12-42.dcrec

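The [^^B]* part is a regular [^, followed by ctrl+v and then ctrl+b (which inserts a literal ^B control character), and a regular ]. It matches any run of characters that are not the ^B control character.

Output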
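If typing the control character by hand is awkward (for instance inside a script), one possible alternative is to generate the ^B byte (octal 002) with printf and splice it into the bracket expression. This is only a sketch: the function name list_new_names and the variable stx are made up for illustration, and LC_ALL=C is an added assumption so that grep treats the binary data as plain bytes. In bash, the ANSI-C quoted form $'[^\x02]*' should give the same pattern.

```shell
# Hypothetical helper: prints every record starting at "New name for client"
# and running up to (not including) the next ^B byte in the given file.
list_new_names() {
    stx=$(printf '\002')    # the literal ^B (STX) control character
    LC_ALL=C grep --binary-files=text -o "New name for client[^${stx}]*" "$1"
}
```

It would be used as: list_new_names 2013-07-17_19-12-42.dcrec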

New name for client 7, keyID = 562830, IP = 91.193.208.105 : Sobieski
New name for client 8, keyID = 255344, IP = 63.153.210.124 : Cultist O Khorne
New name for client 11, keyID = 5061431, IP = 116.240.255.94 : Sammy
New name for client 12, keyID = 5061453, IP = 196.20.195.114 : Dirk Diggler
New name for client 13, keyID = 4278381, IP = 188.110.185.183 : CSTO
New name for client 14, keyID = 369397, IP = 81.110.45.165 : General Ivan
New name for client 16, keyID = 5061651, IP = 85.4.29.162 : Thatsuseless
New name for client 17, keyID = 5061688, IP = 90.213.51.77 : NewPlayer
New name for client 17 (NewPlayer), keyID = 5061688, IP = 90.213.51.77 : MHT
New name for client 18, keyID = 4905930, IP = 174.109.181.108 : Solo Wing Pixy
New name for client 19, keyID = 5061695, IP = 85.4.236.70 : Quizzman
New name for client 21, keyID = 2745089, IP = 95.128.68.231 : NewPlayer
New name for client 18 (Solo Wing Pixy), keyID = 4905930, IP = 174.109.181.108 : Jane The Killer
New name for client 22, keyID = 5061536, IP = 195.91.236.65 : POWERFUCKER
New name for client 24, keyID = 5061698, IP = 86.121.66.142 : TheDoctor
New name for client 26, keyID = 5061585, IP = 5.69.250.33 : Hydrogen

If you want to filter out lines such as the following:

New name for client 17 (NewPlayer), keyID = 5061688, IP = 90.213.51.77 : MHT
New name for client 18 (Solo Wing Pixy), keyID = 4905930, IP = 174.109.181.108 : Jane The Killer

use this variant of the command above, which requires a comma directly after the client number:

grep --binary-files=text -o 'New name for client [0-9]\+,[^^B]*' \
   2013-07-17_19-12-42.dcrec
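
The same filter can be sketched without a hand-typed control character. As before, this is an illustration under assumptions: list_initial_names and stx are hypothetical names, -E is used so that + needs no backslash, and LC_ALL=C is added so the binary data is treated as plain bytes.

```shell
# Hypothetical helper: like the variant above, but the ^B byte is produced by
# printf. Records such as "client 17 (NewPlayer), ..." are skipped because the
# pattern requires a comma immediately after the client number.
list_initial_names() {
    stx=$(printf '\002')    # the literal ^B (STX) control character
    LC_ALL=C grep --binary-files=text -oE "New name for client [0-9]+,[^${stx}]*" "$1"
}
```

It would be used as: list_initial_names 2013-07-17_19-12-42.dcrec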