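I want to search a txt file for a specific word. If I find that word, I want to retrieve the word that immediately follows it in the file. If my text file contains:
"My name is Jay and I want to go to the store"
and I am searching for the word "want", I would like to add the word "to" to my array. I will be going through a very large text file, so any notes on performance would be great.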
Answer 0 (score: 1)
The most readable approach is probably the following:
a = []
str = "My name is Jack and I want to go to the store"
str.scan(/\w+/).each_cons(2) {|x, y| a << y if x == 'to'}
a
#=> ["go", "the"]
To read a file into a string, use File.read.
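Putting the two pieces together, a minimal sketch might look like the following. The helper name words_after and the file name sample.txt are made up for illustration, and File.read loads the whole file into memory, which may matter for a very large file.

```ruby
# Hypothetical helper: returns every word that directly follows `target`.
def words_after(text, target)
  text.scan(/\w+/)                       # split the text into words
      .each_cons(2)                      # walk over adjacent word pairs
      .select { |x, _| x == target }     # keep pairs that start with target
      .map(&:last)                       # take the word that follows
end

# sample.txt is an assumed file name used only for this demonstration.
File.write("sample.txt", "My name is Jack and I want to go to the store")
p words_after(File.read("sample.txt"), "to")
```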
Answer 1 (score: 1)
Here is one way:
Code
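def find_next(fname, word)
  enum = IO.foreach(fname)
  loop do
    words = enum.next.scan(/\w+/)
    ndx = words.index(word)
    next unless ndx
    # The following word is on the same line.
    return words[ndx + 1] if ndx < words.size - 1
    # Otherwise take the first word of the next line that contains one.
    loop do
      line = enum.next
      return line[/\w+/] if line =~ /\w+/
    end
    # Reached the end of the file without finding another word.
    return nil
  end
  nil
end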
Example
text =<<_
It was the best of times, it was the worst of times,
it was the age of wisdom, it was the age of foolishness,
. . . . .
it was the epoch of belief, it was the epoch of incredulity,
it was the season of light, it was the season of darkness,
it was the spring of hope, it was the winter of despair…
_
FName = "two_cities"
File.write(FName, text)
find_next(FName, "worst")
#=> "of"
find_next(FName, "wisdom")
#=> "it"
find_next(FName, "foolishness")
#=> "it"
find_next(FName, "dispair")
#=> nil
find_next(FName, "magpie")
#=> nil
Shorter, but less efficient and problematic for large files:
File.read(FName)[/(?<=\b#{word}\b)\W+(\w+)/,1]
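To see what that expression does, here is a small sketch applied to an in-memory string instead of a file. The sample text and variable names are only illustrative, and the call to Regexp.escape is an addition that the one-liner above does not have.

```ruby
text = "it was the age of wisdom,\nit was the age of foolishness"
word = "wisdom"

# The lookbehind anchors the match right after `word`, \W+ skips punctuation
# and line breaks, and capture group 1 is the word that follows.
next_word = text[/(?<=\b#{Regexp.escape(word)}\b)\W+(\w+)/, 1]
p next_word
```

Because the whole text is held in one string, the match can cross a line break, which is also why this form needs the entire file in memory.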
Answer 2 (score: 0)
This may not be the fastest way to do it, but something along these lines should work:
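filename = "/path/to/filename"
target_word = "weasel"
next_word = ""
# File.foreach reads line by line and closes the file when it is done.
File.foreach(filename) do |line|
  words = line.split
  words.each_with_index do |word, index|
    # Remember the word after each occurrence; the last occurrence wins.
    next_word = words[index + 1] if word == target_word
  end
end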
Answer 3 (score: 0)
Given a File, String, or StringIO stored in the variable file:
pattern, match = 'want', nil
catch :found do
  file.each_line do |line|
    line.split.each_cons(2) do |words|
      if words[0] == pattern
        match = words.pop
        throw :found
      end
    end
  end
end
match
#=> "to"
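Note that this answer finds at most one match per file, for speed, and that working line by line saves memory. If you want to find multiple matches per file, or matches that span a line break, then this other answer may be the better choice. YMMV.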
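For the multiple-match case, one possible line-by-line variant is sketched below. It is not the code from any of the answers here: the helper name all_words_after is hypothetical, and it keeps the previous word in a variable so that a pair split across a line break is still detected.

```ruby
require 'stringio'

# Hypothetical helper: streams `io` line by line and collects every word that
# follows `target`, including pairs that are split across a line break.
def all_words_after(io, target)
  matches = []
  previous = nil
  io.each_line do |line|
    line.split.each do |word|
      matches << word if previous == target
      previous = word
    end
  end
  matches
end

io = StringIO.new("I want\nto go and I want a candy")
p all_words_after(io, "want")
```

Like the code above, this splits on whitespace only, so punctuation stays attached to the words it touches.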
Answer 4 (score: 0)
This is the fastest I could come up with, assuming your file has been loaded into a string:
word = 'want'
array = []
string.scan(/\b#{word}\b\s(\w+)/) do
  array << $1
end
This finds every word that follows your specific word. For example:
word = 'want'
string = 'My name is Jay and I want to go and I want a candy'
array = []
string.scan(/\b#{word}\b\s(\w+)/) do
  array << $1
end
p array #=> ["to", "a"]
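Testing this on my machine with the string repeated 500,000 times, I got an execution time of 0.6 seconds. I also tried other approaches, such as splitting the string, but this was the fastest solution: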
require 'benchmark'
Benchmark.bm do |bm|
  bm.report do
    word = 'want'
    string = 'My name is Jay and I want to go and I want a candy' * 500_000
    array = []
    string.scan(/\b#{word}\b\s(\w+)/) do
      array << $1
    end
  end
end
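The scan-based approach can also be wrapped in a small method, sketched here under a couple of assumptions. The name following_words is hypothetical, and Regexp.escape is an addition so that a search term is not interpreted as regex syntax when it is interpolated into the pattern.

```ruby
# Hypothetical helper wrapping the scan shown above.
def following_words(string, word)
  pattern = /\b#{Regexp.escape(word)}\b\s(\w+)/
  # With one capture group, scan returns an array of one-element arrays.
  string.scan(pattern).flatten
end

p following_words('My name is Jay and I want to go and I want a candy', 'want')
```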