Question

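I'm reading a large amount of numbers through stdin, and for each one I need to find out whether it is in the range [0, 2^32] (probably 2^32 - 1, I'm not sure), so negative numbers must also be rejected. In some cases a number starts with hundreds of zeros, which I need to ignore. I'm not sure whether I'm doing something wrong with the data types: I'm using "long" because I thought it always has a maximum of 2^32, so on overflow I would get a negative number and could check whether the long is smaller than 0. But now I've realized that the size of long also depends on the system.

Can anyone tell me which data type and operation I should choose to check this? The starting data type is just a char pointer.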

Answer 1

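It's a bit trickier than it looks at first sight. The reason is that once you allow arbitrarily long input sequences, the value may exceed even the widest data type available; 64 bits might still be too few.

Whether an overflow of the data type is detected depends on the method you use to read the numbers. For example, scanf("%lu",...) may lead to undefined behaviour if the processed integral value does not fit into an unsigned long (cf., for example, this online C11 draft concerning scanf):

10 ... If this object does not have an appropriate type, or if the result of the conversion cannot be represented in the object, the behavior is undefined.

So don't use scanf for arbitrary input.

In contrast, the function strtoul has defined overflow behaviour (again from the online C11 draft, concerning strtol):

8) The strtol, strtoll, strtoul, and strtoull functions return the converted value, if any. If no conversion could be performed, zero is returned. If the correct value is outside the range of representable values, LONG_MIN, LONG_MAX, LLONG_MIN, LLONG_MAX, ULONG_MAX, or ULLONG_MAX is returned (according to the return type and sign of the value, if any), and the value of the macro ERANGE is stored in errno.

You can use strtol, as it gives you a number of at least 32 bits and tells you about overflows. If long is 64 bits, you can (and need to) distinguish a general overflow from a 32-bit overflow. Note that where long is only 32 bits wide, strtol tops out at 2^31 - 1, so values between 2^31 and 2^32 - 1 are reported as out of long's range; strtoul or strtoll avoids that. The following code illustrates the idea: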

#include <errno.h>
#include <inttypes.h>   /* PRIu32 */
#include <limits.h>
#include <stdint.h>     /* UINT32_MAX */
#include <stdio.h>
#include <stdlib.h>

void convert(const char* numStr) {

    /* strtol() only sets errno on failure, so clear it first. */
    errno = 0;
    long num = strtol(numStr, NULL, 10);

    if (errno == ERANGE) {
        printf("numstr %s is out of long's range, which is %ld..%ld\n", numStr, LONG_MIN, LONG_MAX);
    } else if (num < 0) {
        printf("numstr %s is negative.\n", numStr);
    } else if ((unsigned long)num > UINT32_MAX) {
        /* Only reachable where long is wider than 32 bits. */
        printf("numstr %s is out of 32 bit range, which is 0..%" PRIu32 "\n", numStr, UINT32_MAX);
    } else {
        printf("OK; numstr %s is in 32 bit range, which is 0..%" PRIu32 "\n", numStr, UINT32_MAX);
    }

}


int main() {

    convert("123456789012345678901234567890");
    convert("-123");
    convert("1234567890123567");
    convert("32452345");
    convert("0000000000000000000000032452345");
}

Output (on a system where long is 64 bits):

numstr 123456789012345678901234567890 is out of long's range, which is -9223372036854775808..9223372036854775807
numstr -123 is negative.
numstr 1234567890123567 is out of 32 bit range, which is 0..4294967295
OK; numstr 32452345 is in 32 bit range, which is 0..4294967295
OK; numstr 0000000000000000000000032452345 is in 32 bit range, which is 0..4294967295

Answer 2

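It depends on your exact situation. If your application is reading completely untrusted input, you'll want to do some preprocessing on the incoming text. At that point you can detect a negative sign (and reject the input), and you can strip the leading zeros.

After that, you can easily check the length of the digit string and reject the input if it is longer than 10 digits (2^32 = 4294967296, which has 10 digits). If it has fewer than 10 digits, you know it's in the range [0..2^32-1].

If it has exactly 10 digits, you have at least a couple of options:

Parse it into an integer wider than 32 bits (e.g. a 64-bit type) and directly check whether it is < 2^32
Read it character by character and compare it against the digits of 2^32. This is easily done with a string comparison (strcmp), because for digit strings of equal length, lexicographic order is numeric order.

Which one you use depends on your constraints.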
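The length check and string comparison described above can be sketched as follows. This is only an illustration under stated assumptions: the function name fits_u32_text is hypothetical, the input is assumed to be a single NUL-terminated token with no surrounding whitespace, and a leading '+' is treated as invalid.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: returns true if s is a non-empty string of decimal
   digits whose value lies in [0, 2^32 - 1]. No integer conversion is done. */
bool fits_u32_text(const char *s)
{
    static const char max_u32[] = "4294967295";   /* 2^32 - 1, 10 digits */
    size_t len;

    if (s == NULL || *s == '\0')
        return false;

    /* Strip leading zeros, but keep one digit so "0" and "000" stay valid. */
    while (s[0] == '0' && s[1] != '\0')
        s++;

    /* Everything left must be a decimal digit; this also rejects '-'. */
    len = strspn(s, "0123456789");
    if (s[len] != '\0')
        return false;

    /* Shorter than 10 digits always fits, longer never does. */
    if (len != 10)
        return len < 10;

    /* Same length: lexicographic order equals numeric order. */
    return strcmp(s, max_u32) <= 0;
}
```

Because nothing is converted to an integer, the check works the same way regardless of how wide long is, and hundreds of leading zeros cost only a linear scan.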

Answer 3

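You can use the standard strtoul() function for the range [0, 2³² - 1] (inclusive).

I recommend checking the end pointer that strtoul() sets (ends below) to detect cases like "O" (the letter O, not zero) and "1O" (the digit one followed by the letter O, not ten); that is, to verify that the input really does contain a decimal number and that it ends with a suitable separator or the end-of-string mark.

For example: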

#include <stdlib.h>
#include <inttypes.h>
#include <errno.h>

const char *next_u32(const char *from, uint32_t *to)
{
    const char    *ends = from;
    unsigned long  value;

    if (!from) {
        errno = EINVAL; /* NULL from pointer */
        return NULL;
    }

    /* Parse the number to 'value' */
    errno = 0;
    value = strtoul(from, (char **)&ends, 10); /* 10 = decimal input */
    if (errno)
        return NULL;
    if (ends == from) {
        errno = EDOM; /* Not a valid number */
        return NULL;
    }

    /* Is it too large? */
    if (value > 4294967295uL) {
        errno = ERANGE; /* Out of valid range */
        return NULL;
    }

    /* Verify the separator is a space or end-of-string. */
    if (*ends != '\0' && *ends != '\t' && *ends != '\n' &&
        *ends != '\v' && *ends != '\f' && *ends != '\r' &&
        *ends != ' ') {
        errno = EDOM; /* The number was immediately followed by garbage. */
        return NULL;
    }

    /* Accepted. Save value, and return the pointer to the
       first unparsed character. */
    if (to)
        *to = (uint32_t)value;

    return ends;
}

To parse all the 32-bit unsigned integers on a line, you can use a loop:

const char *line;  /* Contains the line with decimal integers to parse */
uint32_t    value;

while ((line = next_u32(line, &value))) {
    /* Use 'value' */
}
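One caveat worth keeping in mind with strtoul(): it accepts a leading minus sign and negates the value in the unsigned type, without setting errno. Where unsigned long is 32 bits, "-1" therefore comes back as 4294967295 and would pass a plain range check. Below is a minimal sketch of a stricter wrapper; the name parse_u32_strict is hypothetical, and rejecting a leading '+' as well is a design choice of this sketch, not something the answer prescribes.

```c
#include <ctype.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical wrapper: parses one whole string as a uint32_t.
   Returns 0 on success, -1 on any rejection. */
int parse_u32_strict(const char *s, uint32_t *out)
{
    char *end;
    unsigned long value;

    if (s == NULL || out == NULL)
        return -1;

    /* Skip the same leading whitespace strtoul() would skip. */
    while (isspace((unsigned char)*s))
        s++;

    /* Require a digit here, so "-1", "+1" and "" are all rejected. */
    if (!isdigit((unsigned char)*s))
        return -1;

    errno = 0;
    value = strtoul(s, &end, 10);
    if (errno == ERANGE || *end != '\0')
        return -1;   /* overflowed unsigned long, or trailing garbage */
    if (value > UINT32_MAX)
        return -1;   /* fits unsigned long but not 32 bits */

    *out = (uint32_t)value;
    return 0;
}
```

Unlike the function above, this sketch expects the number to be the only thing in the string, so it suits input that has already been split into tokens.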

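If you are parsing a very large number of decimal unsigned integers, the standard library's strtoul() is not the fastest option.

Also, it is sometimes easier to just memory-map the input file and let the OS handle the paging. In that case, however, there is no end-of-string NUL byte (\0) at the end, so a different interface is needed.

For these cases, the following function can be used. It takes and updates a pointer to the current position in the contents, but never goes past the end pointer. It returns zero if the next item in the input is a decimal 32-bit unsigned integer, and a nonzero error code otherwise.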

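#include <stdint.h>

#define  NEXT_OK    0   /* A number was parsed */
#define  NEXT_END   1   /* No more input */
#define  NEXT_BAD  -1   /* Not a decimal number */
#define  NEXT_OVER -2   /* Does not fit in 32 bits */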

static int next_u32(const char **fromptr, const char *end,
                    uint32_t *to)
{
    const char *from;
    uint32_t    value = 0;

    if (!fromptr)
        return NEXT_END;

    from = *fromptr;

    /* Skip whitespace characters, including NUL bytes. */
    while (from < end && (*from == '\0' || *from == '\t' ||
                          *from == '\n' || *from == '\v' ||
                          *from == '\f' || *from == '\r' ||
                          *from == ' '))
        from++;

    /* At end? */
    if (from >= end)
        return NEXT_END;

    /* Skip a leading + sign. */
    if (*from == '+' && end - from > 1)
        from++;

    /* Must be a decimal digit now. */
    if (!(*from >= '0' && *from <= '9'))
        return NEXT_BAD;

    /* Skip leading zeroes. */
    while (from < end && *from == '0')
        from++;

    /* Parse the rest of the decimal number, if any. */
    while (from < end && *from >= '0' && *from <= '9') {
        if ((value > 429496729) || (value == 429496729 && *from >= '6'))
            return NEXT_OVER;
        value = (10 * value) + (*(from++) - '0');
    }

    /* If not at end, check the character. */
    if (from < end && *from != '\0' && *from != '\t' &&
                      *from != '\n' && *from != '\v' &&
                      *from != '\f' && *from != '\r' &&
                      *from != ' ')
        return NEXT_BAD;

    /* Parsed correctly. */
    *fromptr = from;
    if (to)
        *to = value;

    return NEXT_OK;
}
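The overflow guard in the digit loop above relies on 4294967295 = 10 × 429496729 + 5: appending another digit overflows exactly when the accumulated value is already greater than 429496729, or equals 429496729 and the next digit is 6 or more. A small sketch that isolates this step and derives both constants from UINT32_MAX instead of hard-coding them (the helper name append_digit_u32 is hypothetical):

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: appends one decimal digit (0..9) to *acc.
   Returns false, leaving *acc unchanged, if the result would exceed
   UINT32_MAX (4294967295). */
bool append_digit_u32(uint32_t *acc, unsigned digit)
{
    const uint32_t limit = UINT32_MAX / 10;   /* 429496729 */
    const unsigned last  = UINT32_MAX % 10;   /* 5 */

    /* Test before multiplying, so the wrap-around never happens. */
    if (*acc > limit || (*acc == limit && digit > last))
        return false;

    *acc = 10 * *acc + digit;
    return true;
}
```

The important design point is that the check happens before the multiplication; checking afterwards would not work, because unsigned arithmetic wraps around silently.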

Answer 4

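I suggest using int64_t, so that your number has the same size on every architecture. It covers all 2^32 "positive or zero" values you need, and anything below zero is either an overflow or a negative number (only a really huge negative number would wrap around to positive again). You can sidestep that ambiguity by reading the sign as a character instead of as part of the int64_t number: since the '-' sign has already been caught, any negative value that still turns up must be an overflow.

Here is some pseudo-code to illustrate the process: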

char sign = '\0'
int64_t number = 0
read(%c, &sign)
if (sign == '-')
    // number is "truly" negative: reject it

else if (sign == '+' or isdigit(sign))

    if isdigit(sign)
        unget(sign)          // the digit belongs to the number
    read(%lld, &number)      // the sign, if any, is already consumed
    if number < 0 or number >= 2^32
        // the number is too big

    else
        // number is in [0, 2^32 - 1]

else
    // not a number at all

Answer 5

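You cannot detect the overflow of a 32-bit number using a 32-bit number. If it is unsigned, it will always be understood as a number between 0 and 2^32 - 1; overflowing input still produces a valid-looking output. A signed 32-bit value would be "valid" in the range 0 to 2^31 - 1, and negative values would be treated as invalid. Overflowing or underflowing input between -2^31 and 0, or between 2^31 and 2^32 - 1, would produce an invalid negative number. However, you may find that numbers above 2^32 show up as valid again.

I suggest you use a signed 64-bit number and treat negative or too-large numbers as invalid input. This gives you a much larger range of input values that can be filtered correctly. There can still be problems if, for example, the input is comma-delimited and a comma is missing. I would therefore also suggest limiting the length of the input number as a string. The length filter should allow strings that represent numbers larger than 2^32, but filter out numbers larger than 2^63; somewhere in between.

To test the size of a type, use sizeof(), for example sizeof(long), sizeof(long long), and so on. However, your platform usually provides explicitly sized integers. For portability, use your own types and typedefs, and confine the platform-dependent code to an include file dedicated to platform dependencies.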
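A minimal sketch of the signed 64-bit approach, assuming each number arrives as its own NUL-terminated string. It uses strtoll(), since long long is guaranteed to be at least 64 bits wide; the enum and the function name classify_u32 are hypothetical names for this illustration.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical classification codes. */
enum u32_status { U32_OK, U32_NEGATIVE, U32_TOO_BIG, U32_NOT_A_NUMBER };

enum u32_status classify_u32(const char *s, uint32_t *out)
{
    char *end;
    long long wide;

    errno = 0;
    wide = strtoll(s, &end, 10);   /* long long is at least 64 bits */

    /* No digits consumed, or something follows the number. */
    if (end == s || *end != '\0')
        return U32_NOT_A_NUMBER;

    /* On overflow strtoll() returns LLONG_MIN or LLONG_MAX and sets errno. */
    if (errno == ERANGE)
        return wide < 0 ? U32_NEGATIVE : U32_TOO_BIG;

    if (wide < 0)
        return U32_NEGATIVE;
    if (wide > (long long)UINT32_MAX)
        return U32_TOO_BIG;

    if (out)
        *out = (uint32_t)wide;
    return U32_OK;
}
```

Because the wide type covers far more than 32 bits, a negative result can only come from a minus sign in the input, and input too long even for 64 bits is still caught through ERANGE.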

See if a number is smaller than 2^32
