Question

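I am trying to parse a simple HTML page retrieved over a socket. I have been using C for two weeks and am still getting familiar with the language, especially memory management.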

Here is the parser I wrote; it is very simple:

char *parser(const char *buf, int size, char end_token) {
    char *content = malloc((size_t) size);
    memset(content, 0, sizeof content);

    for (int i = 0; i < size; i++) {
        char next = *(buf + i);
        if (next == end_token) break;
        *(content + i) = next;
    }
    return content;
}

This is how I use it: whenever I receive a new page, I run the following loop to extract the URLs:

while (1) {
    buf = strstr(buf, "<a href=");
    if (buf == NULL) {
        break;
    } else {
        buf = buf + strlen("<a href=");
        char *url = parser(buf, 100, '>');
        url = strstr(url, "9780") + strlen("9780");
        char *url_page = parser(url, 50, '\"');
    }
}

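Suppose this is the content I want to parse (the contents of buf). When I parse it repeatedly, the parser's return value becomes corrupted after a few rounds. What causes this corruption, and how can I prevent it?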

<html>
<body>
This is page ##
Other pages here include 
<a href="http://32.65.194.40:9780/100/20.html">This one</a>
and probably a few others</a>
</body>
</html>

The URLs I get back:

1st round: "http://32.65.194.40:9780/100/20.html"
2nd round: "http://32.65.194.40:9780/100/20.html"
...
n round: "http://32.65.194.40:9780/100/20.html" IPv6 capable hosts
::1     ip6-localhost ip6-loopback
fe00::0�
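
One plausible reading of the output above: in parser, `sizeof content` is the size of a pointer (typically 8 bytes), not the size of the allocation, so most of the malloc'd buffer stays uninitialized, and the loop never writes a terminating '\0'. The result only looks correct while the leftover heap bytes happen to be zero. Below is a minimal sketch of a bounded variant, not a definitive fix. The names `parse_until` and `list_paths` are hypothetical, and the sketch assumes the input is a NUL-terminated string.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical replacement for parser(): copies bytes from buf until
   end_token, a NUL, or max_len bytes, whichever comes first.
   Always NUL-terminates. The caller must free() the result. */
char *parse_until(const char *buf, size_t max_len, char end_token) {
    assert(buf != NULL);
    char *content = malloc(max_len + 1);  /* +1 leaves room for '\0' */
    if (content == NULL) return NULL;

    size_t i = 0;
    while (i < max_len && buf[i] != '\0' && buf[i] != end_token) {
        content[i] = buf[i];
        i++;
    }
    content[i] = '\0';  /* explicit terminator, no reliance on memset */
    return content;
}

/* Hypothetical driver mirroring the loop in the question: prints the
   path after "9780" for every <a href= link and returns how many
   paths were printed. */
int list_paths(const char *page) {
    int found = 0;
    const char *cursor = page;
    while ((cursor = strstr(cursor, "<a href=")) != NULL) {
        cursor += strlen("<a href=");
        char *url = parse_until(cursor, 100, '>');
        if (url == NULL) break;
        /* strstr may return NULL, so check before doing pointer arithmetic */
        const char *port = strstr(url, "9780");
        if (port != NULL) {
            char *url_page = parse_until(port + strlen("9780"), 50, '"');
            if (url_page != NULL) {
                printf("%s\n", url_page);
                found++;
                free(url_page);
            }
        }
        free(url);  /* free the original pointer, not an offset into it */
    }
    return found;
}
```

Keeping `url` unchanged and using a separate `port` pointer for the offset means the allocation can still be freed on every iteration, which the original loop cannot do once `url` has been overwritten.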

Simple HTTP parser in C stops working after a few rounds

0 answers: