Basically, for each row I need to get the domain name, including any subdomains, from the URL or full website name, without the www.
My database table looks like this:
+----+-------------------------+
| id | website                 |
+----+-------------------------+
| 1  | https://www.google.com  |
+----+-------------------------+
| 2  | http://www.google.co.in |
+----+-------------------------+
| 3  | www.google.com          |
+----+-------------------------+
| 4  | www.google.co.in        |
+----+-------------------------+
| 5  | google.com              |
+----+-------------------------+
| 6  | google.co.in            |
+----+-------------------------+
| 7  | http://google.co.in     |
+----+-------------------------+
Expected output:
google.com
google.co.in
google.com
google.co.in
google.com
google.co.in
google.co.in
My Postgres query looks like this:
select id, substring(website from '.*://([^/]*)') as website_domain from contacts
But the query above returns blank website_domain values. How can I get the desired output?
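The pattern '.*://([^/]*)' only matches when the value contains "://". When there is no match, substring() returns NULL, and psql prints NULL as an empty cell by default. Even when the pattern does match, the www. prefix is kept.

The sketch below runs the question's pattern on three of the sample values through an inline VALUES list, so no table is needed. The alias t(website) is illustrative and not taken from the question.

```sql
-- Run the question's original pattern on a few sample values (no table required).
-- t(website) is an illustrative alias for the inline rows.
SELECT website,
       substring(website from '.*://([^/]*)') AS website_domain
FROM (VALUES ('https://www.google.com'),
             ('www.google.com'),
             ('google.co.in')) AS t(website);
-- First row: 'www.google.com' (the www. prefix is kept).
-- Other two rows: NULL, because they contain no "://" to match.
```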
Answer 0 (score: 5)
You have to use non-capturing groups, (?:...), to handle websites that don't start with "http://" and to skip an optional "www.", like this:
select
id,
substring(website from '(?:.*://)?(?:www\.)?([^/]*)')
as website_domain
from contacts
http://sqlfiddle.com/#!17/197fb/14
https://www.postgresql.org/docs/9.3/static/functions-matching.html#POSIX-ATOMS-TABLE
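You can try the non-capturing pattern without creating the contacts table. The sketch below feeds the question's seven sample values through an inline VALUES list. The alias t(id, website) is an assumption that stands in for the real table. Each row should come back as the domain listed in the expected output.

```sql
-- Apply the non-capturing pattern to the question's seven sample rows.
-- t(id, website) is an illustrative stand-in for the contacts table.
SELECT id,
       substring(website from '(?:.*://)?(?:www\.)?([^/]*)') AS website_domain
FROM (VALUES (1, 'https://www.google.com'),
             (2, 'http://www.google.co.in'),
             (3, 'www.google.com'),
             (4, 'www.google.co.in'),
             (5, 'google.com'),
             (6, 'google.co.in'),
             (7, 'http://google.co.in')) AS t(id, website)
ORDER BY id;
```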
Answer 1 (score: 3)
You can use
SELECT REGEXP_REPLACE(website, '^(https?://)?(www\.)?', '') from tbl;
See the regex demo.
Details:
^ - start of the string
(https?://)? - one or zero occurrences of http:// or https://
(www\.)? - one or zero occurrences of www.
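The two answers behave differently when a value has a path, which is worth checking against real data. The substring() pattern stops at the first /. REGEXP_REPLACE only strips the prefix and keeps everything after the host.

The sketch below uses a hypothetical URL with a path, which is not one of the question's rows. The alias t(website) is illustrative.

```sql
-- Compare both answers on a hypothetical URL that includes a path.
SELECT substring(website from '(?:.*://)?(?:www\.)?([^/]*)') AS via_substring,
       REGEXP_REPLACE(website, '^(https?://)?(www\.)?', '')  AS via_replace
FROM (VALUES ('https://www.google.com/search')) AS t(website);
-- via_substring: 'google.com'         (stops at the first '/')
-- via_replace:   'google.com/search'  (only the prefix is removed)
```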
See the PostgreSQL demo:
CREATE TABLE tb1
(website character varying)
;
INSERT INTO tb1
(website)
VALUES
('https://www.google.com'),
('http://www.google.co.in'),
('www.google.com'),
('www.google.co.in'),
('google.com'),
('google.co.in'),
('http://google.co.in')
;
SELECT REGEXP_REPLACE(website, '^(https?://)?(www\.)?', '') from tb1;
Result: