Question

I have a buffer

char buffer[size];

that I use to store the contents of a file read from a stream (assume it is pStream here):

HRESULT hr = pStream->Read(buffer, size, &cbRead );

Now the entire content of this stream is in buffer, which has length size (assume size here). I know that two strings

"<!doctortype html" and ".html>"

occur somewhere in the content stored in this buffer (we don't know their positions), and I want to store only the part of the buffer from

"<!doctortype html" to another string ".html>"

into another buffer2[SizeWeDontKnow].

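How can I do that? (The content between these two positions is actually the content of an HTML file, and I want to store only the HTML content that exists in this buffer.) Any ideas?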

Answer 1

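You can use the strnstr function to find the right positions in the buffer (note that strnstr is a BSD extension, not standard C). Once you have found the start tag and the end tag, you can use strncpy to extract the text in between, or use the text in place if performance is a concern.
You can calculate the required size from the positions of the tags and the length of the first tag: nLength = nPosEnd - nPosStart - nStartTagLength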
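
A minimal sketch of that length calculation, assuming the data is not guaranteed to be null-terminated (a buffer filled by Read usually is not). Because strnstr is not available everywhere, the sketch swaps in std::search over an explicit byte range. The function name extract_between and its parameter names are made up for illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical helper: returns the text between startTag and endTag,
// or an empty string if either tag is missing.
// Only the first len bytes of data are examined, so data does not
// need to be null-terminated.
std::string extract_between(const char* data, std::size_t len,
                            const char* startTag, const char* endTag)
{
    const char* dataEnd = data + len;
    const std::size_t nStartTagLength = std::strlen(startTag);
    const std::size_t nEndTagLength = std::strlen(endTag);

    // Locate the start tag.
    const char* start = std::search(data, dataEnd,
                                    startTag, startTag + nStartTagLength);
    if (start == dataEnd) return std::string();

    // Locate the end tag, searching only after the start tag.
    const char* textBegin = start + nStartTagLength;
    const char* end = std::search(textBegin, dataEnd,
                                  endTag, endTag + nEndTagLength);
    if (end == dataEnd) return std::string();

    // nLength = nPosEnd - nPosStart - nStartTagLength
    const std::size_t nPosStart = static_cast<std::size_t>(start - data);
    const std::size_t nPosEnd = static_cast<std::size_t>(end - data);
    const std::size_t nLength = nPosEnd - nPosStart - nStartTagLength;

    return std::string(textBegin, nLength);
}
```

With the variables from the question, the call would look like extract_between(buffer, cbRead, "<!doctortype html", ".html>"). Returning a std::string also sidesteps the unknown size of buffer2, since the string allocates whatever it needs.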

Answer 2

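Look for an HTML parser for C/C++.

Another approach is to start a char pointer at the beginning of the buffer and check each char that follows, to see whether it matches what you are looking for.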
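
The pointer-walking idea could be sketched roughly as follows. This is only an illustration under the assumption that the buffer length is known and the marker is a null-terminated string; find_marker is a hypothetical name, not a library function.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical helper: walks a pointer through the first len bytes of buf
// and returns the first position where marker matches, or nullptr if the
// marker is empty, longer than the buffer, or not present.
const char* find_marker(const char* buf, std::size_t len, const char* marker)
{
    const std::size_t markerLen = std::strlen(marker);
    if (markerLen == 0 || markerLen > len) return nullptr;

    // Stop early enough that a full marker still fits in the buffer.
    const char* last = buf + (len - markerLen);
    for (const char* p = buf; p <= last; ++p) {
        // Cheap first-character check before comparing the whole marker.
        if (*p == marker[0] && std::memcmp(p, marker, markerLen) == 0)
            return p;
    }
    return nullptr;
}
```

Calling it once for "<!doctortype html" and once for ".html>" (starting the second search after the first match) gives the two positions needed to copy the text in between.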

Answer 3

Are you limited to C, or can you use C++?

The C library reference lists many useful functions for tokenizing strings and comparing matches (string.h):

http://www.cplusplus.com/reference/cstring/

With C++ I would do the following (using the buffer and size variables from your code):

    // copy char array to std::string
    std::string text(buffer, buffer + size);

    // define what we're looking for
    std::string begin_text("<!doctortype html");
    std::string end_text(".html>");

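    // find the start marker, then look for the end marker only after it
    size_t begin_pos = text.find(begin_text);
    size_t end_pos = (begin_pos == std::string::npos)
        ? std::string::npos
        : text.find(end_text, begin_pos + begin_text.length());

    // create a substring from the positions; the second argument of
    // substr is a length, not an end position
    std::string extract;
    if (end_pos != std::string::npos) {
        begin_pos += begin_text.length();
        extract = text.substr(begin_pos, end_pos - begin_pos);
    }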

    // test that we got the extract
    std::cout << extract << std::endl;

If you need C string compatibility, you can use:

const char* tmp = extract.c_str();

Answer 4

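If this is the only operation your application performs on HTML code, you can use the solution I've provided below (you can also test it online - here). However, if you are going to do more complex parsing, I'd suggest using an external library.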

#include <iostream>
#include <cstdio>
#include <cstring>

using namespace std;

int main()
{
    const char* beforePrefix = "asdfasdfasdfasdf";
    const char* prefix = "<!doctortype html";
    const char* suffix = ".html>";
    const char* postSuffix = "asdasdasd";

    const unsigned size = 1024;  // const, so the arrays below are not variable-length arrays
    char buf[size];
    snprintf(buf, size, "%s%sTHE STRING YOU WANT TO GET%s%s", beforePrefix, prefix, suffix, postSuffix);

    cout << "Before: " << buf << endl;

    // find the prefix first, then look for the suffix only after it
    const char* firstOccurenceOfPrefixPtr = strstr(buf, prefix);
    const char* textStart = firstOccurenceOfPrefixPtr ? firstOccurenceOfPrefixPtr + strlen(prefix) : NULL;
    const char* firstOccurenceOfSuffixPtr = textStart ? strstr(textStart, suffix) : NULL;

    if (firstOccurenceOfSuffixPtr)
    {
        unsigned textLen = (unsigned)(firstOccurenceOfSuffixPtr - textStart);
        char newBuf[size];
        strncpy(newBuf, textStart, textLen);
        newBuf[textLen] = 0;  // strncpy does not add the terminator here

        cout << "After: " << newBuf << endl;
    }

    return 0;
}

Edit: I get it now :) You should use strstr to find the first occurrence of prefix. I've edited the code above and updated the link.

How to read a specific string from a buffer
