In C, how do I convert an HTML string to a C string?

Time: 2010-12-12 23:14:53

Tags: html c

Is there a common routine or library for this?

E.g., &#39; has to become '

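3 answers:

Answer 0 (score: 1)

I would try parsing the number out of the string, converting it to an integer with atoi, and then converting that to a char.

This is something I wrote in about 20 seconds, so it is completely contrived: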

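  char html[] = "&#39;";
  char* pch = &html[2];   /* skip the leading "&#" */
  int n = 0;
  char c = 0;

  pch[2] = '\0';          /* overwrite ';' so atoi sees just "39" */
  n = atoi(pch);          /* atoi is declared in <stdlib.h> */
  c = (char)n;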

Now c is '. I also don't really know HTML strings, so I may be missing something.
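The snippet above overwrites the ';' in its input and trusts that the text really has the expected shape. A minimal sketch of the same idea with some validation follows, assuming only decimal entities of the form &#NN; whose value fits in one byte. The name parse_decimal_entity is hypothetical, and strtol stands in for atoi so that the character after the number can be checked.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: parse one decimal entity such as "&#39;".
 * Returns the character value (1..255), or -1 if the text is not a
 * well-formed decimal entity that fits in one byte.
 * The input string is not modified. */
int parse_decimal_entity(const char *s)
{
    char *end;
    long n;

    if (strncmp(s, "&#", 2) != 0)
        return -1;
    /* Require a digit here: strtol on its own would also accept
     * leading whitespace or a sign. */
    if (s[2] < '0' || s[2] > '9')
        return -1;
    n = strtol(s + 2, &end, 10);
    if (*end != ';' || n < 1 || n > 255)
        return -1;
    return (int)n;
}
```

Under these assumptions, parse_decimal_entity("&#39;") returns 39, which is the character ', and malformed input such as "&#39" (no semicolon) returns -1.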

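Answer 1 (score: 1)

Assuming you only care about &#xx; style entities, this isn't particularly hard. Here is the simple, let-everyone-else-worry-about-memory-management, mechanical, who-needs-a-regex way: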

int hex_to_value(char hex) {
    if (hex >= '0' && hex <= '9') { return hex - '0'; }
    if (hex >= 'A' && hex <= 'F') { return hex - 'A' + 10; }
    if (hex >= 'a' && hex <= 'f') { return hex - 'a' + 10; }
    return -1;
}

void unescape(char* dst, const char* src) {
    // Write the translated version of the text at 'src', to 'dst'.
    // All sequences of '&#xx;', where x is a hex digit, are replaced
    // with the corresponding single byte.
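    typedef enum { NONE, AND, AND_HASH, AND_HASH_EX, AND_HASH_EX_EX } mode;
    char first_hex = 0, second_hex = 0;
    // int rather than char, so the -1 "not a hex digit" result is still
    // recognizable on platforms where plain char is unsigned.
    int translated = 0;
    mode m = NONE;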
    while (*src) {
        char c = *src++;
        switch (m) {
            case NONE:
            if (c == '&') { m = AND; }
            else { *dst++ = c; m = NONE; }
            break;

            case AND:
            if (c == '#') { m = AND_HASH; }
            else { *dst++ = '&'; *dst++ = c; m = NONE; }
            break;

            case AND_HASH:
            translated = hex_to_value(c);
            if (translated != -1) { first_hex = c; m = AND_HASH_EX; }
            else { *dst++ = '&'; *dst++ = '#'; *dst++ = c; m = NONE; }
            break;

            case AND_HASH_EX:
            translated = hex_to_value(c);
            if (translated != -1) {
                second_hex = c;
                translated = hex_to_value(first_hex) << 4 | translated;
                m = AND_HASH_EX_EX;
            } else {
                *dst++ = '&'; *dst++ = '#'; *dst++ = first_hex; *dst++ = c;
                m = NONE;
            }
            break;

            case AND_HASH_EX_EX:
            if (c == ';') { *dst++ = translated; }
            else { 
                *dst++ = '&'; *dst++ = '#';
                *dst++ = first_hex; *dst++ = second_hex; *dst++ = c;
            }
            m = NONE;
            break;
        }
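    }
    // Terminate the output; the loop never copies the terminator.
    // Note: an entity left unfinished at the end of 'src' is dropped.
    *dst = '\0';
}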

Tedious, and more code than seems reasonable, but not hard :)
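One caveat: in HTML itself, &#NN; is a decimal reference and hexadecimal references are written &#xNN;, so &#39; is the apostrophe (decimal 39, hex 27). Below is a minimal sketch of a decoder that follows that convention, assuming only numeric references whose value fits in a single byte. The names unescape_numeric and digit_value are hypothetical, and it scans digits directly instead of using the state machine above. Anything it does not recognize, including named entities such as &amp;, is copied through unchanged.

```c
/* Value of 'ch' as a digit in the given base (10 or 16), or -1. */
static int digit_value(char ch, int base)
{
    int v;
    if (ch >= '0' && ch <= '9') v = ch - '0';
    else if (ch >= 'a' && ch <= 'f') v = ch - 'a' + 10;
    else if (ch >= 'A' && ch <= 'F') v = ch - 'A' + 10;
    else return -1;
    return v < base ? v : -1;
}

/* Hypothetical sketch: copy 'src' to 'dst', replacing "&#NN;" (decimal)
 * and "&#xNN;" (hex) with the corresponding single byte (1..255).
 * 'dst' needs room for strlen(src) + 1 bytes, since the output is
 * never longer than the input. */
void unescape_numeric(char *dst, const char *src)
{
    while (*src) {
        if (src[0] == '&' && src[1] == '#') {
            const char *p = src + 2;
            int base = 10, digits = 0, d;
            long n = 0;

            if (*p == 'x' || *p == 'X') { base = 16; p++; }
            /* Stop accumulating once the value no longer fits a byte. */
            while ((d = digit_value(*p, base)) >= 0 && n <= 255) {
                n = n * base + d;
                digits++;
                p++;
            }
            if (digits > 0 && *p == ';' && n >= 1 && n <= 255) {
                *dst++ = (char)n;
                src = p + 1;      /* skip past the ';' */
                continue;
            }
        }
        *dst++ = *src++;          /* not a valid reference: copy as-is */
    }
    *dst = '\0';
}
```

For example, with a sufficiently large buffer out, calling unescape_numeric(out, "It&#39;s &#x41; &amp; B") leaves "It's A &amp; B" in out. Code points above 255 would need a real output encoding such as UTF-8, which this sketch deliberately leaves out.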

Answer 2 (score: 1)

There is "GNU recode", which is both a command-line program and a library: http://recode.progiciels-bpi.ca/index.html

Among other things, it can encode and decode HTML characters.