Writing a C function that uses a regular expression to validate URLs, IPv4 addresses, IPv6 addresses and FQDNs

Time: 2018-02-03 08:54:03

Tags: c regex url ipv6 ipv4

The C function below does a good job of validating any combination of URL and FQDN, but it fails to validate IPv4 addresses, the shorthand notation of IPv6, and certain other IPv6 address formats.

Can the regex below be improved so that it also validates IPv4 and IPv6 addresses?

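#include <stdio.h>
#include <regex.h>

/* Returns 0 if url matches the pattern, -1 otherwise. */
int validateURLPhase2(const char *url)
{
    int    status;
    regex_t    re;
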
    char *regexp = "^((ftp|http|https)://)?([a-z0-9]([-a-z0-9]*[a-z0-9])?\\.)|([0-9].[0-9].[0-9].[0-9])|(([0-9a-fA-F]{1,4}:){7,7}[0-9a-fA-F]{1,4}|([0-9a-fA-F]{1,4}:){1,7}:|([0-9a-fA-F]{1,4}:){1,6}:[0-9a-fA-F]{1,4}|([0-9a-fA-F]{1,4}:){1,5}(:[0-9a-fA-F]{1,4}){1,2}|([0-9a-fA-F]{1,4}:){1,4}(:[0-9a-fA-F]{1,4}){1,3}|([0-9a-fA-F]{1,4}:){1,3}(:[0-9a-fA-F]{1,4}){1,4}|([0-9a-fA-F]{1,4}:){1,2}(:[0-9a-fA-F]{1,4}){1,5}|[0-9a-fA-F]{1,4}:((:[0-9a-fA-F]{1,4}){1,6})|:((:[0-9a-fA-F]{1,4}){1,7}|:)|fe80:(:[0-9a-fA-F]{0,4}){0,4}%[0-9a-zA-Z]{1,}|::(ffff(:0{1,4}){0,1}:){0,1}((25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9])\\.){3,3}(25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9])|([0-9a-fA-F]{1,4}:){1,4}:((25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9])\\.){3,3}(25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9]))+((a[cdefgilmnoqrstuwxz]|aero|arpa)|(b[abdefghijmnorstvwyz]|biz)|(c[acdfghiklmnorsuvxyz]|cat|com|coop)|d[ejkmoz]|(e[ceghrstu]|edu)|f[ijkmor]|(g[abdefghilmnpqrstuwy]|gov)|h[kmnrtu]|(i[delmnoqrst]|info|int)|(j[emop]|jobs)|k[eghimnprwyz]|l[abcikrstuvy]|(m[acdghklmnopqrstuvwxyz]|mil|mobi|museum)|(n[acefgilopruz]|name|net)|(om|org)|(p[aefghklmnrstwy]|pro)|qa|r[eouw]|s[abcdeghijklmnortvyz]|(t[cdfghjklmnoprtvwz]|travel)|u[agkmsyz]|v[aceginu]|w[fs]|y[etu]|z[amw])$";

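    /* Compile as a case-insensitive extended regex; no submatches are needed. */
    if ( regcomp(&re, regexp, REG_EXTENDED|REG_NOSUB|REG_ICASE) != 0 )
    {
        printf( "Regex has invalidated FQDN 1\n");
        return -1;
    }
    /* regexec returns 0 on a match and REG_NOMATCH otherwise. */
    status = regexec(&re, url, (size_t) 0, NULL, 0);
    regfree(&re);
    if ( status != 0 )
    {
        printf("Regex has invalidated FQDN 2\n");
        return -1;
    }
    return 0;
}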

A valid URL that should be accepted but is rejected: http://[2001::1]/abc (prints "Regex has invalidated FQDN 2", validation failed)

An invalid URL that should be rejected but is accepted: http://10.192.1 (validation success)

Other cases that pass as expected: http://10.2.1.1/abc and http://www.example.com/abc

1 answer:

Answer 0 (score: 0)

The part of your regexp that matches numeric addresses only allows a single digit in each component. It also doesn't escape the ., so each . matches any character instead of a literal dot. It should be:

([0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3})

Note that this will allow invalid IPs like 123.456.789.0. It just checks that each number is 1-3 digits, not that it's between 0 and 255.
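
If the range check matters, one option is to spell out the 0-255 range in the pattern itself. The following is only a sketch: isValidIPv4 is a hypothetical helper, not part of the function in the question, and it checks a bare address rather than a full URL. It also assumes that octets with leading zeros (such as 01) should be rejected.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper (not from the original code).
 * Returns 1 if s is a dotted-quad IPv4 address with every octet in 0-255,
 * 0 if it is not, and -1 if the pattern fails to compile. */
int isValidIPv4(const char *s)
{
    /* One octet: 250-255 | 200-249 | 100-199 | 0-99 (no leading zeros). */
    static const char *pattern =
        "^((25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])\\.){3}"
        "(25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])$";
    regex_t re;
    int status;

    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    status = regexec(&re, s, (size_t) 0, NULL, 0);
    regfree(&re);
    return status == 0 ? 1 : 0;
}
```

With this pattern, isValidIPv4("10.2.1.1") should return 1, while isValidIPv4("10.192.1") and isValidIPv4("123.456.789.0") should both return 0. The same octet alternation could be substituted into the larger URL regex in place of [0-9]{1,3}.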