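Ruby regexp - Get the Facebook video ID from different URLs using a single regular expression

Time: 2018-03-05 15:44:14

Tags: ruby-on-rails ruby regex

I want to extract the video ID from URLs that can take several different forms: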

https://www.facebook.com/{page-name}/videos/{video-id}/
https://www.facebook.com/{username}/videos/{video-id}/
https://www.facebook.com/video.php?id={video-id}
https://www.facebook.com/video.php?v={video-id}

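How can I retrieve the video ID using a single Ruby regular expression?

I haven't managed to convert it to a Ruby regex yet, but I have (partially) managed to write it as a standard JS regex: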

^(https?://www\.facebook\.com/(?:video\.php\?v=\d+|.*?/videos/\d+))$

When I run the following code in Ruby, it raises an error:

text = "https://www.facebook.com/pili.morillo.56/videos/352355988613922/"
id = text.gsub( ^(https?://www\.facebook\.com/(?:video\.php\?v=\d+|.*?/videos/\d+))$ )

3 Answers:

Answer 0: (score: 1)

Here is the regex I came up with: /(?<=\/videos\/)\d+?(?=\/|$)|(?<=[?&]id=)\d+?(?=&|$)|(?<=[?&]v=)\d+?(?=&|$)/

Breaking it down, we get this:

(?<=\/videos\/)\d+(?=\/|$)|
(?<=[?&]id=)\d+(?=&|$)|
(?<=[?&]v=)\d+(?=&|$)

Each of the three alternatives follows the same simple structure: (?<=beforeMatch)target(?=afterMatch). Here is the first one as an example:

(?<=\/videos\/) # Positive lookbehind
\d+             # Matching the digits
(?=\/|$)        # Positive lookahead

So this means: match \d+ (one or more digits) as long as it comes after \/videos\/ and is followed by either \/ or the end of the line.

In the same way, we can match the digits that follow 'id=', 'v=' or 'videos/'.

Full explanation:

(?<=\/videos\/) # Match as long as preceded by '\/videos\/'
\d+             # Matching the id digits
(?=\/|$)        # As long as it's followed by '\/' or the EOL
|               # Or
(?<=[?&]id=)    # Match as long as preceded by '?id=' or '&id='
\d+             # Matching the id digits
(?=&|$)         # As long as it's followed by either '&' or the EOL
|               # Or
(?<=[?&]v=)     # Match as long as preceded by '?v=' or '&v='
\d+             # Matching the id digits
(?=&|$)         # As long as it's followed by either '&' or the EOL

'EOL' means end of line.
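
As a rough sketch of how this lookaround pattern could be used from Ruby: String#[] with a regex returns the matched text directly, so no capture group is needed. The constant name VIDEO_ID and the sample URLs (including the extra &type=2 parameter) are assumptions for illustration, not part of the original answer.

```ruby
# The three alternatives from above, written with the /x flag so that
# whitespace and comments inside the pattern are ignored.
VIDEO_ID = %r{
  (?<=/videos/)\d+(?=/|$)    # digits after /videos/
  | (?<=[?&]id=)\d+(?=&|$)   # digits after ?id= or &id=
  | (?<=[?&]v=)\d+(?=&|$)    # digits after ?v= or &v=
}x

# Sample URLs (assumed for illustration).
urls = [
  "https://www.facebook.com/pili.morillo.56/videos/352355988613922/",
  "https://www.facebook.com/video.php?id=352355988613922",
  "https://www.facebook.com/video.php?v=352355988613922&type=2"
]

# String#[] returns the first match as a String, or nil if nothing matches.
ids = urls.map { |url| url[VIDEO_ID] }
p ids
```

Because the lookbehind and lookahead do not consume characters, the match consists of the digits only.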

Answer 1: (score: 0)

RE = %r[https://www.facebook.com/(?:.+?/)?video(?:.*?[/=])(.+?)(?:/?\z)]
%w[
  https://www.facebook.com/{page-name}/videos/{video-id}/
  https://www.facebook.com/{username}/videos/{video-id}/
  https://www.facebook.com/video.php?id={video-id}
  https://www.facebook.com/video.php?v={video-id}
].map { |url| url[RE, 1] }
#⇒ ["{video-id}", "{video-id}", "{video-id}", "{video-id}"]
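
To see how this pattern behaves on concrete URLs, here is a small sketch with a numeric ID filled in. The sample URLs are assumptions, and the delimiter is changed from %r[...] to %r{...} purely for readability; the pattern body is unchanged. One thing worth noting is that the capture group runs to the end of the string, so any trailing query parameters end up in the result.

```ruby
# Same pattern body as above, with %r{...} delimiters.
RE = %r{https://www.facebook.com/(?:.+?/)?video(?:.*?[/=])(.+?)(?:/?\z)}

# Sample URLs (assumed for illustration).
urls = [
  "https://www.facebook.com/pili.morillo.56/videos/352355988613922/",
  "https://www.facebook.com/video.php?v=352355988613922",
  "https://www.facebook.com/video.php?v=352355988613922&type=2"
]

# url[RE, 1] returns the text of capture group 1, or nil if there is no match.
ids = urls.map { |url| url[RE, 1] }
p ids
# The third result is "352355988613922&type=2": the extra parameter is captured too.
```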

Answer 2: (score: 0)

You can use:

^https?:\/\/www\.facebook\.com\/.*?video(?:s|\.php.*?[?&](?:id|v)=)\/?([^\/&\n]+).*$

This matches

the start of the string and the beginning of the URL

^https?:\/\/www\.facebook\.com\/

followed by:

.*?          # Match any character zero or more times
video        # Match video
(?:          # Non capturing group
  s          # Match s
  |          # Or
  \.php      # Match .php
  .*?        # Match any character zero or more times         
  [?&]       # Match ? or &
  (?:id|v)=  # Match id or v in non capturing group followed by =
)            # Close non capturing group
\/?          # Match optional /
(            # Capturing group (group 1)
  [^\/&\n]+  # Match not / or & or newline
)            # Close capturing group
.*           # Match any character zero or more times
$            # End of the line

text = "https://www.facebook.com/pili.morillo.56/videos/352355988613922/"
id = text.gsub(/^https?:\/\/www\.facebook\.com\/.*?video(?:s|\.php.*?[?&](?:id|v)=)\/?([^\/&\n]+).*$/, "\\1")
puts id

This results in: 352355988613922
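
One caveat with the gsub approach is that gsub returns the input string unchanged when the pattern does not match, so a non-video URL would come back as if it were an ID. A possible alternative, sketched below, reads the capture group through Regexp#match and returns nil otherwise. The names FB_VIDEO_URL and facebook_video_id are hypothetical, and the pattern is the one above with \A in place of ^ and the trailing .*$ dropped.

```ruby
# Pattern from this answer, anchored at the start of the string with \A.
FB_VIDEO_URL = %r{\Ahttps?://www\.facebook\.com/.*?video(?:s|\.php.*?[?&](?:id|v)=)/?([^/&\n]+)}

# Hypothetical helper: returns the ID as a String, or nil if the URL does not match.
def facebook_video_id(url)
  match = FB_VIDEO_URL.match(url)
  match && match[1]
end

puts facebook_video_id("https://www.facebook.com/pili.morillo.56/videos/352355988613922/")
# prints 352355988613922
p facebook_video_id("https://www.facebook.com/video.php?id=352355988613922&type=2")
# prints "352355988613922"
p facebook_video_id("https://example.com/not-a-video")
# prints nil
```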
