Extracting a web address from a string in C

Date: 2016-02-28 07:03:35

Tags: c parsing pointers

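I'm having trouble with my code and need your help! I have to write a function that extracts, from an input string, the web address that starts with www. and ends with .edu. The input string contains no spaces, so scanf() should work fine here.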

For example, from
http://www.school.edu/admission the extracted address should be www.school.edu.

This is what I've come up with so far. It obviously doesn't work, and I can't think of anything else.

void extract(char *s1, char *s2) {
    int size = 0;
    char *p, *j;

    p = s1; 
    j = s2;
    size = strlen(s1);

    for(p = s1; p < (s1 + size); p++) {
        if(*p == 'w' && *(p+1) == 'w' && *(p+2) == 'w' && *(p+3) == '.'){
            for(p; p < (p+4); p++)
                strcat(*j, *p);
        }
        else if(*p=='.' && *(p+1)=='e' && *(p+2)=='d' && *(p+3)=='u'){
            for(p; (p+1) < (p+4); p++)
                strcat(*j, *p);                    
        }   
    }
    size = strlen(j);
    *(j+size+1) = '\0';
}

The function must use pointer arithmetic. The errors I get are about incompatible types and conversions. Thank you!
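Since the assignment requires pointer arithmetic, here is a minimal sketch of how the asker's extract(char *s1, char *s2) could walk the strings with pointers only. It assumes s2 is large enough, that the result should keep the ".edu" suffix, and (an assumption, since the question does not say) that s2 is left empty when either marker is missing. strncmp stops at a '\0', so the comparisons never read past the end of s1.

```c
#include <string.h>

/* Copies the part of s1 from "www." through ".edu" into s2, moving through
 * the strings with pointer arithmetic.
 * Assumes s2 is large enough; leaves s2 empty if either marker is missing. */
void extract(char *s1, char *s2)
{
    char *start = s1;
    char *end;
    char *out = s2;

    /* strncmp stops at '\0', so this never reads past the end of s1 */
    while (*start != '\0' && strncmp(start, "www.", 4) != 0)
        start++;
    if (*start == '\0') {
        *out = '\0';
        return;
    }

    /* look for ".edu" only after the "www." prefix */
    end = start + 4;
    while (*end != '\0' && strncmp(end, ".edu", 4) != 0)
        end++;
    if (*end == '\0') {
        *out = '\0';
        return;
    }
    end += 4; /* keep ".edu" in the result */

    /* copy [start, end) one character at a time, then terminate */
    while (start < end)
        *out++ = *start++;
    *out = '\0';
}
```

The end pointer is fixed before the copy loop starts, which is what keeps the loop bounded.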

3 answers:

Answer 0 (score: 1)

So the simplest way is probably:

#include <stdio.h>

int main(void)
{
    char str[1000];
    /* skip "http:", skip the two slashes, then read up to the next '/' (at most 999 chars) */
    sscanf("http://www.school.edu/admission", "%*[^/]%*c%*c%999[^/]", str);
    puts(str);
}
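The sscanf call reads whatever sits between "//" and the next "/", so it does not itself check for "www." or ".edu". A minimal sketch that checks the conversion count and then the two markers, assuming the input starts with a scheme such as "http://" and that a 999-character host limit is acceptable (host_is_edu is a hypothetical name):

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: extracts the host part of url into host (which must
 * hold at least 1000 bytes) and returns 1 only if it starts with "www."
 * and ends with ".edu"; returns 0 otherwise. */
int host_is_edu(const char *url, char *host)
{
    size_t len;

    /* suppressed (%*) conversions are not counted, so success means 1 */
    if (sscanf(url, "%*[^/]%*c%*c%999[^/]", host) != 1)
        return 0;

    len = strlen(host);
    return len >= 8
        && strncmp(host, "www.", 4) == 0
        && strcmp(host + len - 4, ".edu") == 0;
}
```

The len >= 8 test rejects hosts such as "www.edu", where the prefix and suffix would overlap.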

Now, here is the fixed code:

#include <stdio.h>
#include <string.h>

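void extract(char *s1, char *s2) {
    size_t size = strlen(s1), i = 0;
    char *out = s2;
    /* advance to "www." without comparing past the end of s1 */
    while(i + 4 <= size && memcmp(s1 + i, "www.", 4)){
        i++;
    }
    /* copy characters until ".edu" is reached */
    while(i + 4 <= size && memcmp(s1 + i, ".edu", 4)){
        *out++ = *(s1 + i);
        i++;
    }
    if(i + 4 > size){
        *s2 = '\0';   /* "www." or ".edu" missing: return an empty string */
        return;
    }
    *out = '\0';
    strcat(out, ".edu");
}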

int main(void)
{
    char str1[1000] = "http://www.school.edu/admission", str2[1000];
    extract(str1, str2);
    puts(str2);
}

Note that s2 must be large enough to hold the extracted address; otherwise you may get a segfault.
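One way to avoid relying on the caller's buffer being big enough is to pass its capacity in. A minimal sketch, assuming a hypothetical extract_n(const char *s1, char *s2, size_t cap) that returns 0 on success and -1 if a marker is missing or the result (plus its terminator) would not fit:

```c
#include <string.h>

/* Hypothetical bounded variant: writes "www....edu" from s1 into s2 only if
 * it fits in cap bytes including '\0'. Returns 0 on success, -1 otherwise. */
int extract_n(const char *s1, char *s2, size_t cap)
{
    const char *start = strstr(s1, "www.");
    const char *end;
    size_t len;

    if (start == NULL)
        return -1;
    end = strstr(start + 4, ".edu");  /* search only after "www." */
    if (end == NULL)
        return -1;

    len = (size_t)(end - start) + 4;  /* include ".edu" */
    if (len + 1 > cap)                /* +1 for the terminator */
        return -1;

    memcpy(s2, start, len);
    s2[len] = '\0';
    return 0;
}
```

Callers would typically pass sizeof of a local array as cap, so the check travels with the buffer.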

Answer 1 (score: 0)

Here is a simple solution to your problem:

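#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Returns a malloc'd copy of "www....edu" (caller must free it), or NULL */
char* extract(const char *s1) {
    const char* ptr_www;
    const char* ptr_edu;
    size_t len;
    char* s2;

    ptr_www = strstr(s1, "www.");
    if (ptr_www == NULL)
        return NULL;
    ptr_edu = strstr(ptr_www, ".edu");   /* only look after "www." */
    if (ptr_edu == NULL)
        return NULL;

    len = ptr_edu - ptr_www + 4;         /* + 4 keeps ".edu" */

    s2 = malloc(len + 1);                /* + 1 for the terminator */
    if (s2 == NULL)
        return NULL;
    strncpy(s2, ptr_www, len);
    s2[len] = '\0';
    printf("%s", s2);

    return s2;
}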
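The length computation len = ptr_edu - ptr_www + 4 relies on pointer subtraction: two pointers into the same array differ by the number of elements between them. In "http://www.school.edu/admission", "www." starts at offset 7 and ".edu" at offset 17, so the length is 17 - 7 + 4 = 14, which is the length of "www.school.edu". A minimal sketch isolating that arithmetic (edu_span_length is a hypothetical name):

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns the length of "www....edu" in s,
 * or -1 if either marker is missing. */
ptrdiff_t edu_span_length(const char *s)
{
    const char *ptr_www = strstr(s, "www.");
    const char *ptr_edu;

    if (ptr_www == NULL)
        return -1;
    ptr_edu = strstr(ptr_www, ".edu");
    if (ptr_edu == NULL)
        return -1;

    /* elements between the two pointers, plus the 4 chars of ".edu" */
    return ptr_edu - ptr_www + 4;
}
```

Since extract returns memory from malloc, the caller should free() the result when done with it.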

Answer 2 (score: -1)

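Unfortunately, there are quite a few mistakes. Your compilation fails because you pass a char to strcat where it expects a char *. Even if it did compile, it would crash.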
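To append a single character you either pass pointers (not dereferenced chars) to strncat with a count of 1, or write the character through the destination pointer yourself. A minimal sketch of both, with hypothetical function names:

```c
#include <string.h>

/* Appends the single character *p to the string dst using strncat:
 * both arguments are char *, and the count 1 limits the copy to one char.
 * strncat always writes a terminating '\0'. */
void append_char_strncat(char *dst, const char *p)
{
    strncat(dst, p, 1);
}

/* Same effect without strncat: move to the terminator, overwrite it with
 * the character, and write a new terminator after it. */
void append_char_ptr(char *dst, char c)
{
    dst += strlen(dst);
    *dst++ = c;
    *dst = '\0';
}
```

Both rescan dst to find its end on every call, so for copying many characters a separate write pointer is cheaper.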

for(p = s1; p < (s1 + size); p++) {
    // This if statement will reference beyond s1+size when p=s1+size-2. Consequently it may segfault
    if(*p=='w' && *(p+1)=='w' && *(p+2)=='w' && *(p+3)=='.') {
       for(p; p < (p+4); p++) // This is an infinite loop
           // strcat concatenates one string onto another.
           // Dereferencing the pointer makes no sense.
           // This is likely what causes your compilation error.
           // If this compiled it would almost certainly segfault.
           strcat(*j, *p);
    }
    // This will also reference beyond s1+size. Consequently it may segfault
    else if(*p=='.' && *(p+1)=='e' && *(p+2)=='d' && *(p+3)=='u') {
        for(p; (p+1) < (p+4); p++) // This is also an infinite loop
            // Again strcat expects 2x char* (aka. strings) not 2x char
            // This will also almost certainly segfault.
            strcat(*j, *p); 
    }
}

// strlen() counts the number of chars until the first '\0' occurrence
// It is never correct to call strlen() to determine where to add a '\0' string termination character.
// If the character were actually absent this would almost certainly result in a segfault.
// Even when it is present, j+size already points at that '\0', so j+size+1 writes one char past it.
// As it is, strcat() (when called correctly) will add the terminator anyway.
size = strlen(j);
*(j+size+1) = '\0';
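Both loops above loop forever because the bound p + 4 is recomputed from the p that is being incremented. A minimal sketch of a bounded copy (copy_n_chars is a hypothetical name): the end pointer is computed once before the loop, and the returned write cursor marks exactly where the '\0' belongs, so strlen() is not needed to find it.

```c
#include <stddef.h>

/* Hypothetical helper: copies n characters starting at p to out and
 * returns the advanced out pointer (one past the last char written).
 * end is fixed before the loop, unlike `p < (p+4)`, whose bound moves with p. */
char *copy_n_chars(const char *p, size_t n, char *out)
{
    const char *end = p + n;
    while (p < end)
        *out++ = *p++;
    return out;
}
```

A caller would then terminate at the returned pointer, for example char *w = copy_n_chars(p, 4, j); *w = '\0';.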

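Edit: This looks like a homework problem, so I thought it would be more constructive to point out where the current code goes wrong, so that you can revisit your understanding of those areas.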

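The answer to your exact question is that it does not compile because you dereference the strings, and therefore pass two chars instead of two char * arguments to strcat().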