Question

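I have a very large .txt file and I want to write a Ruby script to filter some data. Basically, I want to iterate over each line, store the individual words of that line in an array, and then operate on the words. But I can't get each word separately into the array.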

tracker_file.each_line do |line|
  arr = "#{line}"
end

I can get the whole line like this, but what about the individual words?

Thanks

Answer 1

Use the split method on the string.

irb(main):001:0> line = "one two three"
=> "one two three"
irb(main):002:0> line.split
=> ["one", "two", "three"]

So your example would be:

tracker_file.each_line do |line|
  arr = line.split
  # ... do stuff with arr
end
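
Since the file is described as very large, it may help to see the loop in a form that only ever holds one line in memory. This is a minimal sketch rather than the asker's actual script: each_word_array is a hypothetical helper name, and a StringIO stands in for tracker_file so the example runs without a file on disk.

```ruby
require "stringio"

# Hypothetical helper: yields the array of words for each line
# of any IO-like object (File, StringIO, ...).
def each_word_array(io)
  io.each_line do |line|
    yield line.split   # no-argument split breaks on runs of whitespace
  end
end

# StringIO stands in for tracker_file (an assumption for this demo).
sample = StringIO.new("one two three\nfour  five\n")
each_word_array(sample) { |arr| p arr }
# prints ["one", "two", "three"] and then ["four", "five"]
```

With a real file, the same helper should work on the object returned by File.open, since each_line reads line by line instead of loading everything at once.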

Answer 2

tracker_file.each_line do |line|
  line.scan(/[\w']+/) do |word|
    ...
  end
end

If you don't need to iterate over the lines, you can iterate over the words directly:

tracker_file.read.scan(/[\w']+/) do |word|
  ...
end
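
To see how scan with /[\w']+/ differs from a plain split, here is a small comparison. It is only a sketch; words_by_split and words_by_scan are hypothetical helper names used for illustration. Note also that tracker_file.read loads the whole file into memory, which may matter for a very large file.

```ruby
# Hypothetical helpers comparing the two approaches on one line.
def words_by_split(line)
  line.split              # splits on whitespace, punctuation stays attached
end

def words_by_scan(line)
  line.scan(/[\w']+/)     # keeps only runs of word characters and apostrophes
end

line = "Don't stop, it's fine."
p words_by_split(line)  # => ["Don't", "stop,", "it's", "fine."]
p words_by_scan(line)   # => ["Don't", "stop", "it's", "fine"]
```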

Answer 3

You can do it like this:

tracker_file.each_line do |line|
  arr = line.split
  # Then perform operations on the array
end

The split method splits a string into an array based on a delimiter; called with no argument, as here, it splits on whitespace.
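
For illustration, here is how the delimiter argument changes the result. This is a sketch with made-up sample strings; split_on is a hypothetical wrapper, not something from the answers.

```ruby
# Hypothetical wrapper so the calls below are easy to compare.
def split_on(text, delimiter = nil)
  delimiter.nil? ? text.split : text.split(delimiter)
end

p split_on("  one   two ")   # => ["one", "two"]        (whitespace runs collapsed)
p split_on("a,b,,c", ",")    # => ["a", "b", "", "c"]   (empty field preserved)
p split_on("a b  c", / /)    # => ["a", "b", "", "c"]   (regex splits on every single space)
```

The no-argument form is usually the one you want for words, because it ignores leading and trailing whitespace and treats consecutive spaces or tabs as one separator.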

Answer 4

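If you are reading something written in English, and the text may contain hyphens, semicolons, spaces, periods, and so on, you might consider using a regular expression such as: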

/[a-zA-Z]+(\-[a-zA-Z]+)*/

to extract the words instead.
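
One caveat if you feed that pattern to String#scan: when a regex contains a capturing group, scan returns the captured groups instead of the full matches. A possible workaround, shown here as a sketch, is to make the group non-capturing with (?:...). HYPHEN_WORD and english_words are hypothetical names chosen for this example.

```ruby
# Same pattern as above, but with a non-capturing group so that
# String#scan returns whole matches rather than the group contents.
HYPHEN_WORD = /[a-zA-Z]+(?:-[a-zA-Z]+)*/

# Hypothetical helper: extracts plain and hyphenated English words.
def english_words(text)
  text.scan(HYPHEN_WORD)
end

p english_words("well-known state-of-the-art; fine.")
# => ["well-known", "state-of-the-art", "fine"]
```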

Answer 5

You don't have to use IO#each_line; you can also use IO#each(separator_string)

Another option is to use IO#gets:

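# IO#gets takes a String separator (not a Regexp) and keeps it on the result
while chunk = tracker_file.gets(separator_string)
  word = chunk.chomp(separator_string)
  # use the word
end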
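
As a rough illustration of reading by separator instead of by line, here is a sketch using IO#each with a separator string. The ";" separator and the fields helper are assumptions made for the demo, and a StringIO stands in for tracker_file.

```ruby
require "stringio"

# Hypothetical helper: reads an IO-like object chunk by chunk,
# using sep as the record separator instead of a newline.
def fields(io, sep)
  result = []
  io.each(sep) do |chunk|
    result << chunk.chomp(sep)   # each chunk still ends with the separator
  end
  result
end

p fields(StringIO.new("alpha;beta;gamma"), ";")
# => ["alpha", "beta", "gamma"]
```

Keep in mind that with a single fixed separator such as " ", a newline in the input is not treated as a word boundary, so this approach fits best when the data really is delimited by one known string.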