Question

I'm scraping data from a website. These are the strings I get when parsing the HTML with Nokogiri:

"0:10\r\n              (+1)\r\n            "
"03:10\r\n              (+1)\r\n            "

How can I get "0:10" and "03:10"?

Update

What is the difference between match and gsub?

Thanks!
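To illustrate the difference asked about in the update, here is a minimal sketch run on the question's own strings. String#match finds the first match and returns a MatchData object (or nil), so it extracts; String#gsub replaces every match and returns a new string, so it removes. The TIME pattern and the helpers time_via_match and time_via_gsub are hypothetical names chosen for this example.

```ruby
# Hypothetical pattern for a time such as "0:10" or "03:10":
# one or two digits, a colon, then exactly two digits.
TIME = /\d{1,2}:\d{2}/

# String#match returns a MatchData object for the first match, or nil.
# The matched text is element 0 of the MatchData.
def time_via_match(s)
  m = s.match(TIME)
  m ? m[0] : nil
end

# String#gsub returns a new string in which every match of its pattern
# is replaced. Here the pattern is the "(+n)" marker plus the whitespace
# around it, so deleting it leaves only the time. If the marker is
# absent, the string comes back unchanged (gsub never returns nil).
def time_via_gsub(s)
  s.gsub(/\s*\(\+\d+\)\s*/, "")
end

samples = [
  "0:10\r\n              (+1)\r\n            ",
  "03:10\r\n              (+1)\r\n            "
]

samples.each do |s|
  p time_via_match(s)
  p time_via_gsub(s)
end
```

Extracting with match is usually the safer choice here: if the page layout changes, it returns nil, whereas gsub silently returns whatever text is left over.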

Answer 1

Your regular expression should match only strings that have the desired pattern.

r = /
    \A                    # match beginning of string
    (                     # begin capture group 1
      \d+                 # match one or more digits
      :                   # match a colon
      \d{2}               # match two digits
    )                     # end capture group 1
    \r\n\s+\(\+1\)\r\n\s+ # match CRLF, spaces, the +1 marker, CRLF, spaces
    \z                    # match end of string
    /x                    # free spacing regex definition mode

"0:10\r\n              (+1)\r\n            "[r,1]
  #=> "0:10" 
"03:10\r\n              (+1)\r\n            "[r,1]
  #=> "03:10" 
"0:101\r\n              (+1)\r\n            "[r,1]
  #=> nil 
":10\r\n              (+1)\r\n            "[r,1]
  #=> nil 
"0:10 \r\n              (+1)\r\n            "[r,1]
  #=> nil 
"0:10\r\n              (+2)\r\n            "[r,1]
  #=> nil 
"0:10\r\n              (+1)\r\n         cat"[r,1]
  #=> nil

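Depending on how the strings can vary, your pattern may need some changes. For example, if the "+1" in parentheses could be "+" followed by any positive integer, you would need to replace \(\+1\) with \(\+\d+\).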
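Applying that substitution to the regex above gives the following sketch. It assumes the marker is always a plus sign followed by one or more digits; it accepts "(+2)" or "(+12)" but still rejects strings that break the overall layout.

```ruby
# The regex above with \(\+1\) generalized to \(\+\d+\).
r = /
    \A                      # match beginning of string
    (                       # begin capture group 1
      \d+                   # match one or more digits
      :                     # match a colon
      \d{2}                 # match two digits
    )                       # end capture group 1
    \r\n\s+\(\+\d+\)\r\n\s+ # match CRLF, spaces, the +n marker, CRLF, spaces
    \z                      # match end of string
    /x                      # free spacing regex definition mode

p "0:10\r\n              (+2)\r\n            "[r, 1]
p "03:10\r\n              (+12)\r\n            "[r, 1]
p "0:10\r\n              (+1)\r\n         cat"[r, 1]
```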

Answer 2

You should use the regex /\d{0,2}:\d{0,2}/ that @engineer14 posted. It works; here is proof (in JavaScript):

console.log("0:10\r\n              (+1)\r\n            ".match(/\d{0,2}:\d{0,2}/)[0])
console.log("03:10\r\n              (+1)\r\n            ".match(/\d{0,2}:\d{0,2}/)[0])

Explanation:

/ <-- open regex
\d <-- look for digit
{0,2} <-- zero to two of them
: <-- look for a colon
\d <-- look for another digit
{0,2} <-- zero to two of them
/ <-- close regex
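Since the question is about Ruby, here is the same check as a Ruby sketch, where String#[] with a regular expression plays the role of JavaScript's match(...)[0]. It also shows a consequence of {0,2} on both sides: a bare colon with no digits matches too. The STRICT pattern is a hypothetical alternative for comparison, not part of this answer.

```ruby
# This answer's pattern: zero to two digits, a colon, zero to two digits.
PATTERN = /\d{0,2}:\d{0,2}/

s1 = "0:10\r\n              (+1)\r\n            "
s2 = "03:10\r\n              (+1)\r\n            "

# String#[] with a Regexp returns the first matching substring, or nil.
p s1[PATTERN]        # prints "0:10"
p s2[PATTERN]        # prints "03:10"

# Both quantifiers allow zero digits, so a bare colon also matches.
p "ratio :"[PATTERN] # prints ":"

# Hypothetical stricter pattern requiring digits on both sides.
STRICT = /\d{1,2}:\d{2}/
p "ratio :"[STRICT]  # prints nil
```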

Answer 3

Which site are you scraping? If the value relates to a time zone, the +1 may be significant.
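If the +1 does matter, one option is to capture it alongside the time instead of discarding it. A minimal sketch, assuming the parenthesized value is always a signed whole number such as "(+1)" or "(-1)"; what it means on the scraped site is not known here. TIME_WITH_OFFSET and time_and_offset are hypothetical names.

```ruby
# Hypothetical pattern: a time, optional whitespace, then a signed
# integer in parentheses, such as "(+1)" or "(-1)".
TIME_WITH_OFFSET = /(\d{1,2}:\d{2})\s*\(([+-]\d+)\)/

# Returns [time_string, offset_integer], or nil when the pattern is absent.
def time_and_offset(s)
  m = s.match(TIME_WITH_OFFSET)
  return nil unless m
  [m[1], Integer(m[2])] # Kernel#Integer accepts a leading sign
end

p time_and_offset("0:10\r\n              (+1)\r\n            ")
p time_and_offset("03:10\r\n              (+1)\r\n            ")
```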

Get only the time from a string in Ruby
