Question

I'm trying to get the next line of data from a text file. Here is a sample of the data from the file I'm working with.

0519 ABF   244  AN        A1  ADV STUFF    1.0  2.0 Somestuff 018 0155  MTWTh      10:30A 11:30A    20     20     0  6.7                                                           
Somestuff 011 0145  MTWTh      12:30P  1:30P

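I've tried several ways to move on to the next line, for example matching a newline (\n), or using \s+ to replace the large run of spaces after 6.7. I've also tried the m// form, with no results so far.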

Here is some sample code:

while !regex_file.eof?
  line = regex_file.gets.chomp
  if line =~ /^.*?\d{4}\s+[A-Z]+\s+\d{3}.+$/
    puts line
  end
end

On https://rubular.com/, this pattern matches the first line, as I expect:

0519 ABF   244  AN        A1  ADV STUFF    1.0  2.0 Somestuff 018 0155  MTWTh      10:30A 11:30A    20     20     0  6.7

But it does not match the next line, and I haven't figured out how to make it match:

Somestuff 011 0145  MTWTh      12:30P  1:30P

Answer 1

Your current regex:

^.*?\d{4}\s+[A-Z]+\s+\d{3}.+$

matches the following, in order:

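the start of a line (^)
zero or more characters, non-greedy (.*?)
four digits (\d{4})
one or more whitespace characters (\s+)
one or more uppercase letters ([A-Z]+)
one or more whitespace characters (\s+)
three digits (\d{3})
one or more characters (.+)
the end of a line ($)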
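To see this breakdown in action, the sketch below wraps each element of your regex in a capture group and matches it against the first sample line (copied from the question, with the trailing spaces dropped; the grouping is added only for inspection and does not change what the pattern matches):

```ruby
# First sample line from the question (trailing spaces omitted).
line1 = "0519 ABF   244  AN        A1  ADV STUFF    1.0  2.0 Somestuff 018 0155  MTWTh      10:30A 11:30A    20     20     0  6.7"

# Same pattern as above, with each element in its own capture group.
pieces = /^(.*?)(\d{4})(\s+)([A-Z]+)(\s+)(\d{3})(.+)$/

m = line1.match(pieces)
p m[1]  # .*?    -> "" (the lazy prefix consumes nothing here)
p m[2]  # \d{4}  -> "0519"
p m[4]  # [A-Z]+ -> "ABF"
p m[6]  # \d{3}  -> "244"
```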

The second line of the file is:

Somestuff 011 0145  MTWTh      12:30P  1:30P

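Matching starts at 0145 MTWT but then fails: the lowercase h in MTWTh is not whitespace, so the second \s+ cannot match, and even past that, 12:30P does not begin with three digits (\d{3}).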
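A quick way to confirm where the attempt on the second line breaks down is to try progressively longer prefixes of the pattern (sample line copied from the question):

```ruby
line2 = "Somestuff 011 0145  MTWTh      12:30P  1:30P"
full  = /^.*?\d{4}\s+[A-Z]+\s+\d{3}.+$/

p line2.match?(full)                 # false: the whole pattern fails
p line2[/\d{4}\s+[A-Z]+/]            # "0145  MTWT": matching gets this far
p line2.match?(/\d{4}\s+[A-Z]+\s+/)  # false: the "h" after "MTWT" is not whitespace
```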

Answer 2

Try something like the following. \n matches a newline, so you can apply your own rules to capture whatever you need that comes after the \n. See below:

\n
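One caveat, offered as a sketch: \n can only match if the newline is actually in the string being searched. Reading with gets and chomp, as in the question, yields one line at a time with its newline removed, so a pattern containing \n never sees one. Assuming the file is small enough to read into a single string (for example with File.read; the file name in the comment is hypothetical), a pattern can span the newline:

```ruby
# Stand-in for File.read("data.txt"); "data.txt" is a hypothetical name.
text = "0519 ABF   244  AN        A1  ADV STUFF    1.0  2.0 Somestuff 018 0155  MTWTh      10:30A 11:30A    20     20     0  6.7\n" \
       "Somestuff 011 0145  MTWTh      12:30P  1:30P\n"

# Match a first-line record, the rest of that line, the \n, then capture the next line.
pair = /^\d{4}\s+[A-Z]+\s+\d{3}[^\n]*\n([^\n]*)$/

m = text.match(pair)
p m[1]  # "Somestuff 011 0145  MTWTh      12:30P  1:30P"
```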

Answer 3

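I have made an arbitrary assumption about the requirements for matching the second line. They are more demanding than the requirements for the first line as reflected in your regex, but I think the extra complexity will have some educational value for you.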

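Here is a regex that matches both lines (untested). Note that you don't need ^.*? at the beginning of the regex, and the .+$ at the end of the part that matches the first line adds nothing useful (it only requires at least one more character after the three digits), so I removed both. After all, you are matching each line separately, and when there is a match, puts line displays the whole line anyway. Also, the end-of-string anchor \z is generally more appropriate than the end-of-line anchor ($), although either works on your chomped lines.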

r = /
    (?:             # begin non-capture group
      \d{4}         # match 4 digits
      \s+           # match > 0 whitespaces
      [A-Z]+        # match > 0 uppercase letters
      \s+           # match > 0 whitespaces
      \d{3}         # match 3 digits
    |               # or
      \b            # match a (zero-width) word break
      [A-Z]         # match 1 uppercase letter
      [a-z]*        # match >= 0 lowercase letters
      \s+           # match > 0 whitespaces
      \d{3}         # match 3 digits
      \s+           # match > 0 whitespaces
      \d{4}         # match 4 digits
      \s+           # match > 0 whitespaces
      [A-Za-z]+     # match > 0 letters
      (?:           # begin non-capture group
        \s+         # match > 0 whitespaces
        (?:         # begin a non-capture group
          0?\d      # match an optional 0 followed by any digit
        |           # or
          1[012]    # match 1 followed by 0, 1 or 2
        )           # end non-capture group
        :           # match a colon
        [0-5][0-9]  # match 0-5 followed by 0-9
        [AP]        # match A or P (the AM/PM suffix)
      ){2}          # end non-capture group and execute twice
    )               # end non-capture group
    /x              # free-spacing regex definition mode

This regex would conventionally be written as follows:

r = /(?:\d{4}\s+[A-Z]+\s+\d{3}|\b[A-Z][a-z]*\s+\d{3}\s+\d{4}\s+[A-Za-z]+(?:\s+(?:0?\d|1[012]):[0-5][0-9][AP]){2})/
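The remarks about ^.*?, .+$ and the two anchors can be checked quickly. In this sketch (sample lines copied from the question, trailing spaces dropped), the original and trimmed patterns give the same yes/no answers on the sample, and $ and \z differ only when a trailing newline is still present, which is why either works on chomped lines:

```ruby
lines = [
  "0519 ABF   244  AN        A1  ADV STUFF    1.0  2.0 Somestuff 018 0155  MTWTh      10:30A 11:30A    20     20     0  6.7",
  "Somestuff 011 0145  MTWTh      12:30P  1:30P"
]

original = /^.*?\d{4}\s+[A-Z]+\s+\d{3}.+$/
trimmed  = /\d{4}\s+[A-Z]+\s+\d{3}/

# For a yes/no test on this data, dropping ^.*? and .+$ changes nothing.
p lines.map { |l| l.match?(original) }  # [true, false]
p lines.map { |l| l.match?(trimmed) }   # [true, false]

# $ can match just before a trailing newline; \z only at the very end.
p "6.7\n".match?(/6\.7$/)   # true
p "6.7\n".match?(/6\.7\z/)  # false
p "6.7".match?(/6\.7\z/)    # true (the line after chomp)
```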

You could use r to puts the matching lines of the file, as follows:

File.foreach(fname) { |line| puts line if line.match? r }
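A self-contained way to try this, assuming a temporary file in place of your real fname. The sample times carry an A/P suffix and single-digit hours such as 1:30P, so the version of r below allows for both (that reading of the data is an assumption); the third line is a made-up non-matching line:

```ruby
require "tempfile"

# Two-alternative regex; [AP] and 0? assume times like "12:30P" and "1:30P".
r = /(?:\d{4}\s+[A-Z]+\s+\d{3}|\b[A-Z][a-z]*\s+\d{3}\s+\d{4}\s+[A-Za-z]+(?:\s+(?:0?\d|1[012]):[0-5][0-9][AP]){2})/

sample = <<~DATA
  0519 ABF   244  AN        A1  ADV STUFF    1.0  2.0 Somestuff 018 0155  MTWTh      10:30A 11:30A    20     20     0  6.7
  Somestuff 011 0145  MTWTh      12:30P  1:30P
  this line should not match
DATA

matched = []
Tempfile.create("sample") do |f|
  f.write(sample)
  f.flush                 # make the contents visible to File.foreach
  fname = f.path
  File.foreach(fname) { |line| matched << line if line.match? r }
end

matched.each { |line| puts line }  # prints the two sample lines
```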

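See IO::foreach, a very convenient way to read a file line by line. Note that IO class methods such as foreach are commonly invoked with File as the receiver. That works because File.superclass #=> IO, so File inherits these methods from IO.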
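The inheritance point can be checked directly. This sketch uses a throwaway temporary file (an illustration only, not part of the original answer):

```ruby
require "tempfile"

p File.superclass                                    # IO
p File.method(:foreach).owner == IO.singleton_class  # true: foreach is defined on IO

# Reading the same file through either receiver yields the same lines.
same = Tempfile.create("demo") do |f|
  f.write("first\nsecond\n")
  f.flush
  IO.foreach(f.path).to_a == File.foreach(f.path).to_a
end
p same  # true
```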

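Used without a block, foreach returns an enumerator, which is often convenient too. For example, if you want to return an array of the matching lines (rather than puts them), you can write: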

File.foreach(fname).with_object([]) do |line, arr|
  arr << line.chomp if line.match? r
end
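A runnable sketch of the enumerator form, assuming a temporary file and, for brevity, a hypothetical r that keeps only the first-line alternative. It also shows Enumerable#grep as an equivalent alternative (a swapped-in technique, not from the answer):

```ruby
require "tempfile"

# Hypothetical stand-in for r: only the first-line alternative.
r = /\d{4}\s+[A-Z]+\s+\d{3}/

sample = <<~DATA
  0519 ABF   244  AN        A1  ADV STUFF    1.0  2.0 Somestuff 018 0155  MTWTh      10:30A 11:30A    20     20     0  6.7
  Somestuff 011 0145  MTWTh      12:30P  1:30P
DATA

from_with_object, from_grep, enum_class = Tempfile.create("sample") do |f|
  f.write(sample)
  f.flush

  enum = File.foreach(f.path)  # no block given, so this returns an Enumerator

  # with_object returns the memo array once the enumerator is exhausted.
  matches = enum.with_object([]) do |line, arr|
    arr << line.chomp if line.match? r
  end

  # Equivalent alternative: grep keeps elements for which r === element.
  via_grep = File.foreach(f.path).map(&:chomp).grep(r)

  [matches, via_grep, enum.class]
end

p enum_class                      # Enumerator
p from_with_object.size           # 1 (only the first sample line matches this r)
p from_with_object == from_grep   # true
```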

Extracting data from a file with a regex: multiline problem
