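I need to trim URLs in Microsoft Excel down to the root domain and to the subdomain.
A1 contains https://blog.example.com/page/
B1 should result in example.com
C1 should result in blog.example.com
That means two formulas that remove http, https, www. and the path. The first version (B1) should also remove the subdomain.
Right now I have only one formula: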
=MID(SUBSTITUTE(A2;"www.";"");SEARCH(":";A2)+3;SEARCH("/";SUBSTITUTE(A2;"www.";"");9)-SEARCH(":";A2)-3)
https://example.com/page/page
gives example.com
http://www.example.com/page/page
gives example.com
http://blog.example.com/page/
gives blog.example.com
example.com/page
gives #VALUE!
www.example.com/page
gives #VALUE!
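As the examples above show, I get good results when the URL starts with http or https, but the formula fails without it. This version also keeps the subdomain.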
Answer 0 (score: 9)
Try this in B1,
=SUBSTITUTE(TRIM(RIGHT(SUBSTITUTE(REPLACE(REPLACE(A1, 1, IFERROR(FIND("//", A1)+1, 0), TEXT(,))&"/", FIND("/", REPLACE(A1, 1, IFERROR(FIND("//", A1)+1, 0), TEXT(,))&"/"), LEN(A1), TEXT(,)), CHAR(46), REPT(CHAR(32), LEN(A1))), LEN(A1)*2)), CHAR(32), CHAR(46))
... and this in C1,
=SUBSTITUTE(REPLACE(REPLACE(A1, 1, IFERROR(FIND("//", A1)+1, 0), TEXT(,))&"/", FIND("/", REPLACE(A1, 1, IFERROR(FIND("//", A1)+1, 0), TEXT(,))&"/"), LEN(A1), TEXT(,)), "www.", TEXT(,))
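To make the nesting easier to follow, here is a minimal Python sketch of what the two formulas above appear to do: drop everything up to and including "//", cut at the first "/", then either keep the last two dot-separated labels (B1) or remove "www." (C1). The function names are hypothetical and exist only for illustration; this is a reading of the formulas, not a replacement for them.

```python
def strip_to_host(url: str) -> str:
    """Drop everything up to and including '//', then cut at the first '/'."""
    marker = url.find("//")
    rest = url[marker + 2:] if marker != -1 else url
    return rest.split("/", 1)[0]


def subdomain(url: str) -> str:
    """Mirrors the C1 formula: the host with 'www.' removed."""
    return strip_to_host(url).replace("www.", "")


def root_domain(url: str) -> str:
    """Mirrors the B1 formula: the last two dot-separated labels of the host."""
    labels = strip_to_host(url).split(".")
    return ".".join(labels[-2:])


print(subdomain("https://blog.example.com/page/"))
print(root_domain("https://blog.example.com/page/"))
```

Because the B1 logic always keeps exactly two labels, a host such as test.co.uk comes out as co.uk in this sketch.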
Answer 1 (score: 2)
Subdomain: this is Jeeped's answer, but I have added support for blank cells, because the original version returned "/" for them:
=IF(ISBLANK(A1), "", SUBSTITUTE(REPLACE(REPLACE(A1, 1, IFERROR(FIND("//", A1)+1, 0), TEXT(,))&"/", FIND("/", REPLACE(A1, 1, IFERROR(FIND("//", A1)+1, 0), TEXT(,))&"/"), LEN(A1), TEXT(,)), "www.", TEXT(,)))
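Domain: a version from MrExcel that supports international domains (for example this.co.uk). Unlike Jeeped's version, however, it does not handle one-part top-level domains combined with a subdomain, such as www.this.co or test.this.co. Does anyone know how to fix that? For now I at least use a helper column that strips a leading "www.":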
=IF(LEFT(a1,LEN("www."))="www.",RIGHT(a1,LEN(a1)-LEN("www.")), a1)
=SUBSTITUTE(TRIM(RIGHT(SUBSTITUTE(TRIM(TRIM(LEFT(SUBSTITUTE(TRIM(IFERROR(MID(b1,FIND("://",b1)+3,99),b1))&"/","/",REPT(" ",99)),99))),".",REPT(" ",99)),99*(2+(LEN(TRIM(RIGHT(SUBSTITUTE(TRIM(TRIM(LEFT(SUBSTITUTE(TRIM(IFERROR(MID(b1,FIND("://",b1)+3,99),b1))&"/","/",REPT(" ",99)),99)))&".",".",REPT(" ",99)),198)))=2))))," ",".")
It works:
A | B | C
(blank) | "" | ""
blog.test.com | blog.test.com | test.com
http://blog.test.com | blog.test.com | test.com
test.com | test.com | test.com
http://test.com | test.com | test.com
https://test.com | test.com | test.com
www.test.com | test.com | test.com
http://www.test.com | test.com | test.com
https://www.test.com | test.com | test.com
test.co.uk | test.co.uk | test.co.uk
http://test.co.uk | test.co.uk | test.co.uk
https://test.co.uk | test.co.uk | test.co.uk
www.test.co.uk | test.co.uk | test.co.uk
http://www.test.co.uk | test.co.uk | test.co.uk
https://www.test.co.uk | test.co.uk | test.co.uk
example.test.co.uk | example.test.co.uk | test.co.uk
http://example.test.co.uk | example.test.co.uk | test.co.uk
https://example.test.co.uk | example.test.co.uk | test.co.uk
example.com/test | example.com | example.com
http://example.com/test | example.com | example.com
https://example.com/test | example.com | example.com
http://blog.example.com/page/ | blog.example.com | example.com
example.com/page | example.com | example.com
www.example.com/page | example.com | example.com
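The domain rule in the long formula above can be read as: take the host, then keep the last three labels if the final label is exactly two characters long, otherwise keep the last two. The Python sketch below follows that reading, including the helper step that strips a leading "www."; the function names are hypothetical.

```python
def host_of(url: str) -> str:
    """Helper step plus host extraction, as in the two formulas above."""
    if url.startswith("www."):          # helper column: drop a leading 'www.'
        url = url[len("www."):]
    marker = url.find("://")
    rest = url[marker + 3:] if marker != -1 else url
    return rest.split("/", 1)[0]


def domain(url: str) -> str:
    """Keep three labels when the last label has two characters, else two."""
    labels = host_of(url).split(".")
    keep = 3 if len(labels[-1]) == 2 else 2
    return ".".join(labels[-keep:])


print(domain("http://example.test.co.uk"))
print(domain("test.this.co"))
```

The second call shows the limitation mentioned above: because "co" is two characters long, test.this.co is returned unchanged instead of being reduced to this.co.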
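Answer 2 (score: 1)
This works if your version of Excel has the FILTERXML function (available in Excel 365, Excel 2019, Excel 2016, and Excel 2013).
Assume your URLs are in the range A2:A29.
To find the subdomain, enter the following formula in cell B2 and drag it down: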
=SUBSTITUTE(FILTERXML("<t><s>"&SUBSTITUTE(IFERROR(MID(A2,FIND("//",A2)+2,LEN(A2)),A2),"/","</s><s>")&"</s></t>","t/s[1]"),"www.","")
For the logic behind this formula, see the article Extract Words with FILTERXML.
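As a rough illustration of the same idea outside Excel, the Python sketch below builds the same kind of XML string (each "/" becomes a boundary between s elements) and reads the first s node, which corresponds to the XPath t/s[1]. The function name is hypothetical, and the sketch assumes the URL contains no characters that are special in XML, such as & or <.

```python
import xml.etree.ElementTree as ET


def subdomain_via_xml(url: str) -> str:
    """Split the URL on '/' by wrapping the pieces in <s> elements."""
    marker = url.find("//")
    rest = url[marker + 2:] if marker != -1 else url   # drop the scheme
    xml = "<t><s>" + rest.replace("/", "</s><s>") + "</s></t>"
    first = ET.fromstring(xml).find("s")               # same node as t/s[1]
    return (first.text or "").replace("www.", "")


print(subdomain_via_xml("http://www.example.com/page/page"))
```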
To find the root domain, enter the following formula in cell C2 and drag it down:
=IF((SUMPRODUCT(--(MID(B2,ROW($1:$100),1)="."))-IF(SUMPRODUCT(--(MID(RIGHT(B2,8),ROW($1:$8),1)="."))=3,2,SUMPRODUCT(--(MID(RIGHT(B2,8),ROW($1:$8),1)="."))))>0,RIGHT(B2,LEN(B2)-FIND(".",B2)),B2)
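I use the subdomain from the first formula to find the root domain. The trick is to work out whether the part of the URL before the first dot (.) belongs to the root domain or is a subdomain, and to act accordingly.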
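Read step by step, the root-domain formula counts all dots in the host, counts the dots in its last eight characters (treating three as two), and drops the first label only when the difference is positive. The Python sketch below follows that reading; the function name is hypothetical, and the input is assumed to be the host already produced by the subdomain formula.

```python
def root_from_host(host: str) -> str:
    """Drop the first label when the dot counts suggest it is a subdomain."""
    total_dots = host[:100].count(".")      # the formula scans 100 characters
    tail_dots = host[-8:].count(".")        # dots in the last 8 characters
    suffix_dots = 2 if tail_dots == 3 else tail_dots
    if total_dots - suffix_dots > 0:
        return host.split(".", 1)[1]        # everything after the first dot
    return host


print(root_from_host("blog.example.com"))
print(root_from_host("test.co.uk"))
```

Since the rule depends on an eight-character window, very short hosts behave differently: in this sketch a.b.com is returned unchanged.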
Sample data
| URL | Sub | Root |
|----------------------------------|---------------------|----------------|
| https://example.com/page/page | example.com | example.com |
| http://www.example.com/page/page | example.com | example.com |
| http://blog.example.com/page/ | blog.example.com | example.com |
| example.com/page | example.com | example.com |
| www.example.com/page | example.com | example.com |
| blog.test.com | blog.test.com | test.com |
| http://blog.test.com | blog.test.com | test.com |
| test.com | test.com | test.com |
| http://blog.test.uk.net/ | blog.test.uk.net | test.uk.net |
| https://test.cn | test.cn | test.cn |
| www.test.com | test.com | test.com |
| http://www.test.com | test.com | test.com |
| https://www.test.com | test.com | test.com |
| test.co.uk | test.co.uk | test.co.uk |
| https://test.co.uk | test.co.uk | test.co.uk |
| www.test.co.uk | test.co.uk | test.co.uk |
| http://www.test.co.uk | test.co.uk | test.co.uk |
| https://www.test.co.uk | test.co.uk | test.co.uk |
| blog.123.firm.in | blog.123.firm.in | 123.firm.in |
| http://example.test.co.uk | example.test.co.uk | test.co.uk |
| https://test.7.org.au | test.7.org.au | 7.org.au |
| test.example.org.nz/page | test.example.org.nz | example.org.nz |
| http://example.com/test | example.com | example.com |
| https://example.com/test | example.com | example.com |
| http://blog.example.com/page/ | blog.example.com | example.com |
| example.com/page | example.com | example.com |
| www.example.com/page | example.com | example.com |
| http://blog.1.co.uk | blog.1.co.uk | 1.co.uk |
Answer 3 (score: 0)
For B1 (extracting the root domain), if A1 is a full URL:
=SUBSTITUTE(SUBSTITUTE(REPLACE(A1,1,FIND(".",$A1),""),REPLACE(REPLACE(A1,1,FIND(".",$A1),""),1,FIND("/",REPLACE(A1,1,FIND(".",$A1),"")),""),""),"/","")
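The formula above seems to work in three steps: remove everything up to the first dot, remove the text that follows the first slash, then remove the remaining slashes. The Python sketch below mirrors those steps; the function name is hypothetical, and the ValueError stands in for the error Excel's FIND raises when the dot or slash is missing.

```python
def root_after_first_dot(url: str) -> str:
    """Mirror of the nested REPLACE/SUBSTITUTE steps in the formula."""
    dot = url.find(".")
    if dot == -1:
        raise ValueError("no '.' found")
    after_dot = url[dot + 1:]               # text after the first dot
    slash = after_dot.find("/")
    if slash == -1:
        raise ValueError("no '/' found after the first dot")
    path = after_dot[slash + 1:]            # text after the first slash
    # Remove the path text (every occurrence), then any leftover slashes.
    without_path = after_dot.replace(path, "") if path else after_dot
    return without_path.replace("/", "")


print(root_after_first_dot("https://blog.example.com/page/"))
```

This approach relies on the URL having exactly one label (such as blog or www) in front of the root domain. For https://example.com/page the sketch returns com, because the first dot already belongs to the root domain.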