I have a set of URLs, for example:
https://www.facebook.com/profile.php?id=456789
https://www.facebook.com/messages/78134
https://www.facebook.com/profile.php?id=123
https://www.facebook.com/messages/781234
https://www.facebook.com/45/settings/781234/ab
https://www.facebook.com/48/settings/989213/ef
The dataset has at least 100 URLs, falling into roughly 5 or 6 types. The result I expect is:
[
['https://www.facebook.com/profile.php?id=456789',
'https://www.facebook.com/profile.php?id=123'],
['https://www.facebook.com/messages/781234',
'https://www.facebook.com/messages/78134'],
['https://www.facebook.com/45/settings/781234/ab',
'https://www.facebook.com/48/settings/989213/ef']
]
How can I classify them? There is no training input.
Answer 0 (score: 1)
Your question isn't clearly defined, but this seems to produce the desired output:
require 'uri'
URL_DIVISIONS = %w[profile messages settings]
URL_DIVISION_REGEX = Regexp.union(URL_DIVISIONS)
urls = %w[
https://www.facebook.com/profile.php?id=456789
https://www.facebook.com/messages/78134
https://www.facebook.com/profile.php?id=123
https://www.facebook.com/messages/781234
https://www.facebook.com/45/settings/781234/ab
https://www.facebook.com/48/settings/989213/ef
]
pp urls.group_by{ |url|
URI.parse(url).path[URL_DIVISION_REGEX]
}
Output:
{"profile"=>
["https://www.facebook.com/profile.php?id=456789",
"https://www.facebook.com/profile.php?id=123"],
"messages"=>
["https://www.facebook.com/messages/78134",
"https://www.facebook.com/messages/781234"],
"settings"=>
["https://www.facebook.com/45/settings/781234/ab",
"https://www.facebook.com/48/settings/989213/ef"]}
If you only need the lists, without the division names, use:
pp urls.group_by{ |url|
URI.parse(url).path[URL_DIVISION_REGEX]
}.values
Output:
[["https://www.facebook.com/profile.php?id=456789",
"https://www.facebook.com/profile.php?id=123"],
["https://www.facebook.com/messages/78134",
"https://www.facebook.com/messages/781234"],
["https://www.facebook.com/45/settings/781234/ab",
"https://www.facebook.com/48/settings/989213/ef"]]
I'd keep the result as a hash and loop over the keys using the URL_DIVISIONS array, pulling out the values as needed.
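A minimal sketch of that "keep the hash, loop over URL_DIVISIONS" idea. It reuses the answer's constants; the extra `/groups/555` URL is a hypothetical input added only to show that a URL matching none of the divisions lands under the `nil` key, which is worth handling explicitly on a 100+ URL dataset.

```ruby
require 'uri'

URL_DIVISIONS = %w[profile messages settings]
URL_DIVISION_REGEX = Regexp.union(URL_DIVISIONS)

urls = %w[
  https://www.facebook.com/profile.php?id=456789
  https://www.facebook.com/messages/78134
  https://www.facebook.com/profile.php?id=123
  https://www.facebook.com/messages/781234
  https://www.facebook.com/45/settings/781234/ab
  https://www.facebook.com/48/settings/989213/ef
  https://www.facebook.com/groups/555
]
# (the last URL above is hypothetical, included to exercise the nil group)

# Keep the result as a hash: division name => URLs.
# String#[] with a regex returns nil when nothing matches, so
# unclassified URLs are grouped under the nil key.
groups = urls.group_by { |url| URI.parse(url).path[URL_DIVISION_REGEX] }

# Walk the known divisions in a fixed order and pull out each group.
URL_DIVISIONS.each do |division|
  matched = groups.fetch(division, [])
  puts "#{division}: #{matched.size} URL(s)"
end

# Report anything the division list did not cover.
unmatched = groups.fetch(nil, [])
puts "unmatched: #{unmatched.join(', ')}" unless unmatched.empty?
```

Iterating over URL_DIVISIONS rather than `groups.keys` keeps the output order stable and makes missing divisions show up as zero counts instead of disappearing.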
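Answer 1 (score: 1)
Here is a self-learning version. You didn't specify exactly what the grouping should learn from, so you may need to adjust the regex, but you can use this as a starting point: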
require 'uri'
urls = %w[
https://www.facebook.com/profile.php?id=456789
https://www.facebook.com/messages/78134
https://www.facebook.com/profile.php?id=123
https://www.facebook.com/messages/781234
https://www.facebook.com/45/settings/781234/ab
https://www.facebook.com/48/settings/989213/ef
]
pp urls.group_by { |url|
(URI.parse(url).path.match(/[a-z]+/) || ["unknown"])[0]
}
Output:
{"profile"=>
["https://www.facebook.com/profile.php?id=456789",
"https://www.facebook.com/profile.php?id=123"],
"messages"=>
["https://www.facebook.com/messages/78134",
"https://www.facebook.com/messages/781234"],
"settings"=>
["https://www.facebook.com/45/settings/781234/ab",
"https://www.facebook.com/48/settings/989213/ef"]}
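Taking the "self-learning" idea one step further, the grouping key can be derived from the data itself instead of from the first lowercase word. The sketch below is a hypothetical variant, not part of the answer above: it treats a path segment as structural if it contains a letter and appears in at least `min_count` URLs, and treats everything else (IDs, rare suffixes like `ab`/`ef`) as variable. The helper names and the threshold of 2 are assumptions you would tune on the real 100+ URL set.

```ruby
require 'uri'

# Split a URL path into non-empty segments,
# e.g. "/45/settings/781234/ab" -> ["45", "settings", "781234", "ab"].
def path_segments(url)
  URI.parse(url).path.split('/').reject(&:empty?)
end

# Hypothetical helper: group URLs by the alphabetic path segments that
# recur in at least min_count URLs across the whole dataset.
def group_by_common_segments(urls, min_count: 2)
  segments = urls.map { |url| path_segments(url) }

  # Count each segment at most once per URL.
  counts = Hash.new(0)
  segments.each { |segs| segs.uniq.each { |s| counts[s] += 1 } }

  # Key = the recurring segments that contain at least one letter.
  keys = segments.map do |segs|
    segs.select { |s| s.match?(/[a-z]/i) && counts[s] >= min_count }
  end

  urls.zip(keys)
      .group_by { |_url, key| key }
      .transform_values { |pairs| pairs.map(&:first) }
end

urls = %w[
  https://www.facebook.com/profile.php?id=456789
  https://www.facebook.com/messages/78134
  https://www.facebook.com/profile.php?id=123
  https://www.facebook.com/messages/781234
  https://www.facebook.com/45/settings/781234/ab
  https://www.facebook.com/48/settings/989213/ef
]

groups = group_by_common_segments(urls)
pp groups.values
```

On this sample the keys come out as `["profile.php"]`, `["messages"]` and `["settings"]`, so `groups.values` gives the same three lists as the answers above. Requiring a letter stops a shared numeric ID (here `781234` appears in both a messages and a settings URL) from splitting a group; on larger data a segment like `ab` might recur and would then become part of the key, so the threshold may need raising.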