Question

How can I extract the domain name from a URL using bash? For example, http://example.com/ should become example.com. It must work for any TLD, not just .com.

Answer 1

You can extract the domain name with a simple AWK one-liner:

echo http://example.com/index.php | awk -F'[/:]' '{print $4}'

Output: example.com

:-)

Answer 2

basename "http://example.com"

Of course, this does not work for a URI like http://www.example.com/index.html, but you can do the following:

basename "$(dirname "http://www.example.com/index.html")"

Or, for more complex URIs:

echo "http://www.example.com/somedir/someotherdir/index.html" | cut -d'/' -f3

-d means "delimiter" and -f means "field"; in the example above, the third field delimited by the forward slash '/' is www.example.com.
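To see why the host ends up in the third field, it can help to print the first few fields that cut sees. This is only a sketch; the helper name nth_field is made up for illustration and is not part of the answer:

```shell
#!/usr/bin/env bash
# Illustrative helper (hypothetical name): print field N of a URL split on '/'.
nth_field() {
  printf '%s\n' "$1" | cut -d'/' -f"$2"
}

url="http://www.example.com/somedir/someotherdir/index.html"

# Field 1 is "http:", field 2 is the empty string between the two
# slashes of "//", field 3 is the host, field 4 is the first path segment.
for n in 1 2 3 4; do
  printf 'field %s: [%s]\n' "$n" "$(nth_field "$url" "$n")"
done
```

The empty second field is the reason the host is field 3 rather than field 2.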

Answer 3

$ URI="http://user:pw@example.com:80/"
$ echo "$URI" | sed -e "s/[^/]*\/\/\([^@]*@\)\?\([^:/]*\).*/\2/"
example.com

See http://en.wikipedia.org/wiki/URI_scheme

Answer 4

echo "$URL" | cut -d'/' -f3 | cut -d':' -f1

Works for URLs such as:

http://host.example.com
http://host.example.com/hi/there
http://host.example.com:2345/hi/there
http://host.example.com:2345
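A quick way to check the pipeline against the URLs listed above is to wrap it in a function and loop over them. The function name extract_host is an assumption made for this sketch:

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the two-stage cut pipeline from this answer.
extract_host() {
  printf '%s\n' "$1" | cut -d'/' -f3 | cut -d':' -f1
}

# The four sample URLs from the answer; each should print host.example.com.
for URL in \
  "http://host.example.com" \
  "http://host.example.com/hi/there" \
  "http://host.example.com:2345/hi/there" \
  "http://host.example.com:2345"
do
  printf '%s -> %s\n' "$URL" "$(extract_host "$URL")"
done
```

Note that this pipeline assumes there is no user:password@ part before the host; with one, the second cut would return the user name instead.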

Answer 5

#!/usr/bin/perl -w
use strict;

my $url = $ARGV[0];

if($url =~ /([^:]*:\/\/)?([^\/]+\.[^\/]+)/g) {
  print $2;
}

Usage:

./test.pl 'https://example.com'
example.com

./test.pl 'https://www.example.com/'
www.example.com

./test.pl 'example.org/'
example.org

 ./test.pl 'example.org'
example.org

./test.pl 'example'  -> no output

If you only want the domain rather than the full host + domain, use this instead:

#!/usr/bin/perl -w
use strict;

my $url = $ARGV[0];
if($url =~ /([^:]*:\/\/)?([^\/]*\.)*([^\/\.]+\.[^\/]+)/g) {
  print $3;
}

Answer 6

sed -E -e 's_.*://([^/@]*@)?([^/:]+).*_\2_'

e.g.

$ sed -E -e 's_.*://([^/@]*@)?([^/:]+).*_\2_' <<< 'http://example.com'
example.com

$ sed -E -e 's_.*://([^/@]*@)?([^/:]+).*_\2_' <<< 'https://example.com'
example.com

$ sed -E -e 's_.*://([^/@]*@)?([^/:]+).*_\2_' <<< 'http://example.com:1234/some/path'
example.com

$ sed -E -e 's_.*://([^/@]*@)?([^/:]+).*_\2_' <<< 'http://user:pass@example.com:1234/some/path'
example.com

$ sed -E -e 's_.*://([^/@]*@)?([^/:]+).*_\2_' <<< 'http://user:pass@example.com:1234/some/path#fragment'
example.com

$ sed -E -e 's_.*://([^/@]*@)?([^/:]+).*_\2_' <<< 'http://user:pass@example.com:1234/some/path#fragment?params=true'
example.com
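One thing to be aware of: when the input has no "://", the substitution does not match and sed echoes the input back unchanged. A possible variation, sketched here with a hypothetical function name host_from_url, adds -n and the p flag so that only successful matches are printed:

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the sed expression from this answer.
# -n plus the trailing "p" flag prints only when the substitution matched,
# so input without "://" yields no output instead of being echoed back.
host_from_url() {
  printf '%s\n' "$1" | sed -E -n -e 's_.*://([^/@]*@)?([^/:]+).*_\2_p'
}

host_from_url 'http://user:pass@example.com:1234/some/path#fragment'
host_from_url 'example.com'   # prints nothing: no scheme separator
```

An empty result then signals "could not parse", which is easier to test for in a script than comparing output against input.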

Answer 7

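Instead of using a regex for this, you can use Python's urlparse:

URL=http://www.example.com

python3 -c "from urllib.parse import urlparse
url = urlparse('$URL')
print(url.netloc)"

You can use it like this, or put it in a small script. However, this still requires a valid scheme identifier, which, judging by your comments, your input does not necessarily provide. You can specify a default scheme, but urlparse expects the netloc to begin with '//':

url = urlparse('//www.example.com/index.html', 'http')

So you have to prepend it manually, i.e.:

python3 -c "from urllib.parse import urlparse
if '$URL'.find('://') == -1:
    url = urlparse('//$URL', 'http')
else:
    url = urlparse('$URL')
print(url.netloc)"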
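The same "add a default scheme when none is given" step can also be sketched in plain bash before handing the URL to any of the other extraction methods. The helper name with_scheme is hypothetical, and http as the default is an assumption carried over from the Python version:

```shell
#!/usr/bin/env bash
# Hypothetical helper: prepend a default scheme when the input has none.
with_scheme() {
  case "$1" in
    *://*) printf '%s\n' "$1" ;;          # already has a scheme, leave as is
    *)     printf 'http://%s\n' "$1" ;;   # assume http, as in the Python version
  esac
}

# After normalising, the host is always the third '/'-separated field.
with_scheme 'www.example.com/index.html' | cut -d'/' -f3
with_scheme 'https://www.example.com/index.html' | cut -d'/' -f3
```

Both calls print www.example.com.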

Answer 8

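There is very little information about how you obtain these URLs; please show more next time. Do the URLs contain parameters, and so on? In the meantime, simple string manipulation is enough for your sample URL.

For example: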

$ s="http://example.com/index.php"
$ s=${s%/*}            # remove the last "/" and everything after it
$ echo "$s"
http://example.com
$ echo "${s#http://}"  # remove the leading http://
example.com

Another way, using sed (GNU):

$ echo $s | sed 's/http:\/\///;s|\/.*||'
example.com

Using awk:

$ echo $s| awk '{gsub("http://|/.*","")}1'
example.com
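If the URLs turn out to be less uniform than the sample, the string-manipulation idea can be stretched a little further using only bash parameter expansion. This is a rough sketch rather than a full URL parser; the function name url_to_host is made up here, and it does not handle IPv6 literals such as [::1]:

```shell
#!/usr/bin/env bash
# Hypothetical helper: extract the host using only parameter expansion.
url_to_host() {
  local rest=$1
  rest=${rest#*://}      # drop the scheme, if any (shortest prefix up to "://")
  rest=${rest%%[/?#]*}   # drop path, query and fragment
  rest=${rest##*@}       # drop user:password@, if any
  rest=${rest%%:*}       # drop :port, if any
  printf '%s\n' "$rest"
}

url_to_host 'http://example.com/index.php'
url_to_host 'https://user:pw@www.example.co.uk:8443/a/b?x=1#top'
url_to_host 'example.org/'
```

No external processes are started, which can matter when this runs in a loop over many URLs.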

Answer 9

The following will output "example.com":

URI="http://user@example.com/foo/bar/baz/?lala=foo" 
ruby -ruri -e "p URI.parse('$URI').host"

For more on what you can do with Ruby's URI class, consult the docs.

Answer 10

A solution that covers more cases, based on sed regular expressions:

echo http://example.com/index.php | sed -e 's#^https://\|^http://##' -e 's#:.*##' -e 's#/.*##'

This works for URLs such as: http://example.com/index.php, http://example.com:4040/index.php, https://example.com/index.php

Answer 11

With Ruby, you can use the Domainatrix library/gem:

http://www.pauldix.net/2009/12/parse-domains-from-urls-easily-with-domainatrix.html

require 'rubygems'
require 'domainatrix'
s = 'http://www.champa.kku.ac.th/dir1/dir2/file?option1&option2'
url = Domainatrix.parse(s)
url.domain
=> "kku"

Great tool! :-)

Answer 12

Here is the node.js way; it works with or without ports and deep paths:

//get-hostname.js
'use strict';

const url = require('url');
const parts = url.parse(process.argv[2]);

console.log(parts.hostname);

It can be called like this:

node get-hostname.js http://foo.example.com:8080/test/1/2/3.html
//foo.example.com

Docs: https://nodejs.org/api/url.html
