Script to extract search queries or the term after a specific string

Time: 2014-08-15 23:31:25

Tags: ruby regex linux

I'm trying to extract the search terms that I record in a log file.

I log every search term in the log file, for example:

The search request for 'John' identified as ...

I want to extract 'John' from this log file and put it into another text file.

For example, the file search_log.txt contains the following lines:

The search request for 'John' identified as ...
The search request for 'Peter Parker' identified as ...
The search request for 'Iron man' identified as ...
The search request for 'Naruto Uzumaki' identified as ...
The search request for 'Chuck Norris' identified as ...

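The script should extract John, Peter Parker, Iron man, Naruto Uzumaki and Chuck Norris and write them to output.txt, one term per line.

Alternatively, a Ruby function that extracts these terms and stores them in an array would also work.

Thanks a lot!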

3 answers:

Answer 0 (score: 2)

$ grep -o "search request for '[^']*'" input.txt | awk -F\' '{print $2}' > output.txt
$ cat output.txt
John
Peter Parker
Iron man
Naruto Uzumaki
Chuck Norris

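First, grep finds every match of the string "search request for" followed by a person's name in single quotes; then awk cleans up each match so that only the person's name is kept, one per line.

The grep -o solution works whether the input is all on one line or spread across several lines. If the input is guaranteed to be as simple as the OP's example, a simpler one-step solution is possible, for example with awk alone: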
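For comparison, here is a minimal Ruby sketch of the same two-step pipeline: the regex passed to `scan` plays the role of `grep -o`, and keeping only the capture group replaces the `awk` step. The method name `extract_terms` is hypothetical, and like `[^']*` in the grep pattern it assumes names contain no single quotes.

```ruby
# Hypothetical helper mirroring:
#   grep -o "search request for '[^']*'" | awk -F\' '{print $2}'
def extract_terms(text)
  # scan finds every match anywhere in the text (like grep -o), even several per line.
  # With one capture group it returns [["John"], ["Peter Parker"], ...], so flatten it.
  text.scan(/search request for '([^']*)'/).flatten
end

log = <<~LOG
  The search request for 'John' identified as ...
  The search request for 'Peter Parker' identified as ...
LOG

puts extract_terms(log)
# John
# Peter Parker
```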

$ awk -F\' '{print $2}' input.txt 

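However, the above works only if each line contains exactly one instance, and it breaks if a name contains a single quote. To accept several "search requests" per line, there is also: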

$ awk -F\' '{for (i=2;i<=NF;i+=2) print $i}' input.txt
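The awk loop works because, once each line is split on single quotes, every even-numbered field (2, 4, 6, ...) lies between a pair of quotes. A rough Ruby sketch of the same idea (`quoted_fields` is a hypothetical name); like the awk version, it returns every quoted string on the line, not only search terms, and assumes quotes are balanced:

```ruby
# Rough Ruby equivalent of: awk -F\' '{for (i=2;i<=NF;i+=2) print $i}'
# awk fields 2, 4, 6, ... are indices 1, 3, 5, ... after splitting on "'".
def quoted_fields(line)
  # -1 keeps trailing empty fields, matching awk's field count (NF)
  fields = line.chomp.split("'", -1)
  (1...fields.length).step(2).map { |i| fields[i] }
end

p quoted_fields("The search request for 'Iron man' identified as ...")  # => ["Iron man"]
p quoted_fields("Error: 'DivisionByZeroError'")                         # => ["DivisionByZeroError"]
```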

Finally, if the input really is that simple, with only one instance per line, we can use cut:

$ cut -d\' -f2 input.txt
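The one-step awk and cut commands differ on lines that contain no quote at all: `awk -F\' '{print $2}'` prints an empty line, while POSIX `cut` passes such a line through unchanged unless `-s` is given. A small Ruby model of the `cut` behavior, to make that visible (`cut_field2` and its `suppress:` keyword are hypothetical names, the keyword standing in for `-s`):

```ruby
# Rough Ruby model of: cut -d\' -f2   (suppress: true models cut -s -d\' -f2)
def cut_field2(line, suppress: false)
  line = line.chomp
  # cut passes lines without the delimiter through unmodified, unless -s is given
  return (suppress ? nil : line) unless line.include?("'")  # nil means "no output"

  # With the delimiter present, field 2 always exists (possibly empty)
  line.split("'", -1)[1]
end

p cut_field2("The search request for 'John' identified as ...")  # => "John"
p cut_field2("Error: server crashed")                             # => "Error: server crashed"
p cut_field2("Error: server crashed", suppress: true)             # => nil
```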

All of these answers are easy to find just by reading the man pages or looking at other similar questions... :(

Answer 1 (score: 0)

If you are looking for a Ruby solution, this prints every name in search_log.txt to the command line:

File.open("search_log.txt", "r") do |f|
  # Lazy .*? stops at the next quote rather than the last quote on the line
  puts f.read.scan(/'(.*?)'/)
end

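It opens and reads the file, scans it for strings enclosed in single quotes, and prints those strings to the console.

This method returns an array containing the extracted strings: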
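A note on the quantifier: a greedy `'(.*)'` captures everything from the first quote to the last one on a line, which matters as soon as a line holds more than one quoted string, whereas the lazy `'(.*?)'` stops at the next quote. A quick comparison on a made-up line (not taken from the log above):

```ruby
# Hypothetical log line with two quoted strings
line = "The search request for 'John' identified as 'user 42'"

greedy = line.scan(/'(.*)'/).flatten   # .*  runs to the LAST quote on the line
lazy   = line.scan(/'(.*?)'/).flatten  # .*? stops at the NEXT quote

p greedy  # => ["John' identified as 'user 42"]
p lazy    # => ["John", "user 42"]
```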

def get_names(file)
  # flatten (not flatten!) so that input with no matches yields [] rather than nil
  file.read.scan(/'(.*?)'/).flatten
end

names = File.open("search_log.txt", "r") do |f|
  get_names(f)
end

puts names.class
#=> Array

puts names
#=> John
#=> Peter Parker
#=> Iron man
#=> Naruto Uzumaki
#=> Chuck Norris
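One caveat when calling `flatten!` on the result of `scan`: like several bang methods in Ruby, `Array#flatten!` returns `nil` when it makes no change, so a file with no matches would yield `nil` instead of an empty array. The non-bang `flatten` always returns an array:

```ruby
matches = "no quotes here".scan(/'(.*?)'/)  # scan finds nothing => []

p matches.flatten                    # => []
p matches.flatten!                   # => nil (nothing to flatten, so no change)
p [["John"], ["Iron man"]].flatten!  # => ["John", "Iron man"] (changed in place, returns self)
```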

You can then generate the output.txt file from the returned array:

File.open("output.txt", "w") do |f|
  names.each { |name| f.write "#{name}\n" }
end
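Because `IO#puts` writes each element of an array argument on its own line (adding the newline), the loop can also be collapsed into a single call. A sketch with a stand-in array in place of the one returned by `get_names`:

```ruby
names = ["John", "Peter Parker", "Iron man"]  # stand-in for the array from get_names

# IO#puts given an array writes each element followed by a newline
File.open("output.txt", "w") { |f| f.puts(names) }
```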

Answer 2 (score: 0)

logfile.log:

The search request for 'John' identified as ... 
Error: server crashed
The search request for 'Peter Parker' identified as ... 
The search request for 'Iron man' identified as ... 
The search request for 'Naruto Uzumaki' identified as ...
Error: 'DivisionByZeroError'
The search request for 'Chuck Norris' identified as ...
The search request for '' identified as ... will not be recorded
And if the search request for 'Abbey' is here, do not record name...

prog.rb:

infile = 'logfile.log'
outfile = 'logged_names.txt'

File.open(outfile, 'w') do |f|  #Open outfile for writing
  IO.foreach(infile) do |line|  #Open infile for reading and step through each line 
    md = line.match(/\AThe search request for '(.+?)'/)  #md => MatchData object or nil
    f.puts md[1] if md    #match() returns nil if there is no match; if there is a match, md[0] is whole match, md[1] is what matched the first parenthesized group in regex
  end   #infile automatically closed here
end   #outfile automatically closed here

...

~/ruby_programs$ ruby prog.rb 
~/ruby_programs$ cat logged_names.txt 
John
Peter Parker
Iron man
Naruto Uzumaki
Chuck Norris
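To check the two filtering rules in prog.rb in isolation, here is the same regex tried against a few lines of the sample log: `\A` rejects lines that do not start with the phrase, and `.+?` requires at least one character, so `''` is skipped. One caveat: on a hypothetical line such as `The search request for '' identified as 'x'`, the lazy `.+?` would run past the empty quotes and capture `' identified as `; a `[^']+` group would be stricter.

```ruby
# The regex from prog.rb, tried against a few lines of the sample logfile.log
PATTERN = /\AThe search request for '(.+?)'/

samples = [
  "The search request for 'John' identified as ... ",
  "Error: 'DivisionByZeroError'",
  "The search request for '' identified as ... will not be recorded",
  "And if the search request for 'Abbey' is here, do not record name...",
]

# String#[](regexp, 1) returns capture group 1, or nil when the line doesn't match
results = samples.map { |line| line[PATTERN, 1] }
results.each { |r| puts(r || "(skipped)") }
# John
# (skipped)
# (skipped)
# (skipped)
```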