Question

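I want to use awk to filter lines in a large file.

Basically, I need to check whether a URL belongs to a particular domain. For example, http://example.com/test is in the example.com domain.

I think I need to split the URL string on "//" and "." and then compare the resulting pieces.

How can I use awk to extract the domain name from a URL and compare it case-insensitively?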

Answer 1

This might do what you want:

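echo "http://example.com/test
https://foo.com/test/index.html" | awk '
{
    gsub(".*://","");    # strip the scheme, e.g. "http://"
    gsub("/.*$","");     # strip the path
    # keep only the last two labels of the host name
    while(split($0,p,".")>2) sub("^[^.]+[.]",""); # comment out if you want to leave subdomains
    name=tolower($1);
    printf("name=%s : ",name);
    if(name ~ "(^|[.])example[.]com$")   # the domain itself or one of its subdomains
        printf("match !\n");
    else
        printf("Does not match !\n");
}'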
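
The question mentions cutting the URL on "//". As a minimal sketch of that idea, the host can also be pulled out with a field separator instead of gsub. The function name url_host is hypothetical, and the sketch assumes one URL per line of the form scheme://host/path, with no port or user information in the host part:

```shell
# Hypothetical helper: print the lower-cased host of each URL read from stdin.
# With "/" as the field separator, "http://example.com/test" splits into
# "http:", "", "example.com", "test", so the host is the third field.
url_host() {
    awk -F/ '{ print tolower($3) }'
}

printf '%s\n' 'http://Example.COM/test' 'https://foo.com/test/index.html' | url_host
```

This only isolates the host name. Subdomains are left in place, so www.example.com is printed as is.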

In response to Johnathan Leffler's comment, here is an enhanced version that strips subdomains (if any) and detects unqualified names:

echo "http://example.com/test
http://www.example.com/test
ftp://localhost/test
https://foo.com/test/index.html" | awk '
{
  gsub(".*://","",$1)
  gsub("/.*$","",$1)
  name=tolower($1)
  c=split(name,dc,".")
  if(c>=2)
    domain=dc[c-1]"."dc[c]
  else
    domain=""
  printf("name=%16s : ",name)
  printf("domain=%16s : ",domain)
  if(domain == "example.com")   # exact comparison; with ~ the "." would match any character
    printf("match !\n")
  else
    printf("Does not match !\n")
}'

Output:

name=     example.com : domain=     example.com : match !
name= www.example.com : domain=     example.com : match !
name=       localhost : domain=                 : Does not match !
name=         foo.com : domain=         foo.com : Does not match !
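
For the original goal of filtering a large file, the same steps could be wrapped so that only matching lines are printed. This is a sketch, not a definitive implementation: the function name filter_domain is hypothetical, and it assumes the URL is the first whitespace-separated field of each line.

```shell
# Hypothetical helper: keep only the lines whose URL is in the given domain.
# Usage: filter_domain DOMAIN [FILE...]   (reads stdin when no FILE is given)
filter_domain() {
    d=$1
    shift
    awk -v domain="$d" '
    BEGIN { domain = tolower(domain) }
    {
        host = tolower($1)            # work on a copy so the line prints unchanged
        sub(/^[^:]*:\/\//, "", host)  # drop the scheme
        sub(/\/.*$/, "", host)        # drop the path
        n = length(host) - length(domain)
        # accept the domain itself, or any host that ends in "." followed by the domain
        if (host == domain || (n > 0 && substr(host, n) == "." domain))
            print
    }' "$@"
}

printf '%s\n' 'http://Example.com/test' 'https://foo.com/test/index.html' | filter_domain example.com
```

Comparing with == and substr rather than ~ keeps the dots in the domain literal, so a host such as notexample.com is not accepted.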
