Question

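I'm writing part of a program that parses and validates some user input from the program's console arguments. I chose stringstream for this purpose, but ran into a problem with reading unsigned types.

The following template is used to read a value of the requested type from a given string: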

#include <iostream>
#include <sstream>
#include <string>

using std::string;
using std::stringstream;
using std::cout;
using std::endl;

template<typename ValueType>
ValueType read_value(string s)
{   
    stringstream ss(s);
    ValueType res;
    ss >> res;
    if (ss.fail() or not ss.eof())
        throw string("Bad argument: ") + s;
    return res;
}
// +template specializations for strings, etc. 

int main(void)
{   
    cout << read_value<unsigned int>("-10") << endl;
}

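If the type is unsigned and the input string contains a negative number, I expect to see an exception thrown (caused by ss.fail() == true). But stringstream produces the value converted to the unsigned type (4294967286 in the sample as written).

How can I fix this sample to get the desired behaviour (preferably without falling back to C functions)? I understand that it can be done with a simple check of the first symbol, but the input may contain leading whitespace. I could write my own parser, but I don't believe the problem is so unusual that the standard library can't solve it.

The functions hidden deep under the stringstream operators for unsigned types are strtoull and strtoul. They work in the described manner, but the mentioned functions are low-level. Why doesn't stringstream provide some level of validation? (I just hope I'm wrong and it does, but some action is required to enable it.)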

Answer 1

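Version disclaimer: the answer is different for C++03. The following deals with C++11.

First, let's analyze what's happening.

ss >> res; calls std::istream::operator>>(unsigned). In [istream.formatted.arithmetic]/1, the effects are defined as follows:

These extractors behave as formatted input functions (as described in 27.7.2.2.1). After a sentry object is constructed, the conversion occurs as if performed by the following code fragment: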
typedef num_get< charT,istreambuf_iterator<charT,traits> > numget;
iostate err = iostate::goodbit;
use_facet< numget >(loc).get(*this, 0, *this, err, val);
setstate(err);
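In the above fragment, loc stands for the private member of the basic_ios class.

Following the formatted input functions to [istream::sentry], the main effect of the sentry object here is to consume leading white-space characters. It also prevents the code shown above from being executed in case of an error (the stream is in a failed / eof state).

The locale used is the "C" locale. Reasoning:

For a stringstream constructed via stringstream ss(s);, the locale of that iostream is the current global locale at the time of construction (that's guaranteed deep in the rabbit hole at [ios.base.locales]/4). As the global locale hasn't been changed in the OP's program, [locale.cons]/2 specifies the "classic" locale, i.e. the "C" locale.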
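The locale claim above can be checked directly. The following is a minimal sketch; the helper name stream_uses_classic_locale is made up for illustration, and the result is only meaningful as long as the program has not called std::locale::global.

```cpp
#include <locale>
#include <sstream>
#include <string>

// Hypothetical helper (not part of the question's code): reports whether a
// freshly constructed stringstream carries the classic "C" locale.
// Assumes the global locale has not been changed via std::locale::global.
bool stream_uses_classic_locale(const std::string& s)
{
    std::stringstream ss(s);
    // getloc() returns the locale the stream was imbued with at construction,
    // which is a copy of the global locale at that time.
    return ss.getloc() == std::locale::classic();
}
```

Calling stream_uses_classic_locale("-10") in the OP's program should return true, because nothing there changes the global locale.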

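use_facet< numget >(loc).get uses the member function num_get<char>::get(iter_type in, iter_type end, ios_base&, ios_base::iostate& err, unsigned int& v) const; specified in [locale.num.get] (note the unsigned int, everything's still OK). The details of the string -> unsigned int conversion for the "C" locale are lengthy and described in [facet.num.get.virtuals]. Some interesting details:

For an unsigned integer value, the function strtoull is used.
If the conversion fails, ios_base::failbit is assigned to err. Specifically: "The numeric value to be stored can be one of: [...] the most negative representable value or zero for an unsigned integer type, if the field represents a value too large negative to be represented in val. ios_base::failbit is assigned to err."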

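We need to go to C99, 7.20.1.4 for the definition of strtoull, under paragraph 5:

If the subject sequence begins with a minus sign, the value resulting from the conversion is negated (in the return type).

and under paragraph 8:

If the correct value is outside the range of representable values, LONG_MIN, LONG_MAX, LLONG_MIN, LLONG_MAX, ULONG_MAX, or ULLONG_MAX is returned (according to the return type and sign of the value, if any), and the value of the macro ERANGE is stored in errno.

It seems to have been disputed in the past whether negative values should be considered valid input for the strtoul family of functions. In any case, the problem lies with this function. A quick check on gcc says that a negative value is considered valid input, hence the behaviour you observed.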
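The negation rule from paragraph 5 can be observed by calling strtoull directly. This is a small sketch, not the library's internal code; StrtoullResult and call_strtoull are names invented for the example.

```cpp
#include <cerrno>
#include <climits>
#include <cstdlib>

// Hypothetical result type: the converted value plus whether ERANGE was set.
struct StrtoullResult {
    unsigned long long value;
    bool range_error;
};

// Hypothetical wrapper around std::strtoull (base 10).
StrtoullResult call_strtoull(const char* text)
{
    errno = 0;                       // clear any earlier error
    char* end = nullptr;
    unsigned long long v = std::strtoull(text, &end, 10);
    return { v, errno == ERANGE };
}
```

For the input "-10", the digits are converted to 10 and then negated in the return type, so the value is ULLONG_MAX - 9 and no range error is reported. That is the wrap-around seen in the question, only at unsigned long long width.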

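Historic note: C++03

C++03 used scanf inside the num_get conversion. Unfortunately, I'm not quite sure (yet) how the conversion for scanf is specified, and under which circumstances errors occur.

An explicit error check:

We can insert that check manually, either by using a signed value for the conversion and testing for <0, or by looking for the - character (which is not a good idea because of possible localization issues).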
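The first option (convert into a signed type, then test for <0) might look like the sketch below. The function name read_unsigned and the choice of exception types are assumptions made for this example. It also assumes that long long is wider than unsigned int, so it is not a drop-in solution for unsigned long long.

```cpp
#include <limits>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical helper: parse into a wider signed type first, then reject
// negative values and values that do not fit into unsigned int.
unsigned int read_unsigned(const std::string& s)
{
    std::stringstream ss(s);
    long long wide = 0;
    ss >> wide;                      // operator>> skips leading whitespace
    if (ss.fail() || !ss.eof())      // same validation as in the question
        throw std::invalid_argument("Bad argument: " + s);
    if (wide < 0 ||
        static_cast<unsigned long long>(wide) >
            std::numeric_limits<unsigned int>::max())
        throw std::out_of_range("Out of range for unsigned int: " + s);
    return static_cast<unsigned int>(wide);
}
```

With this helper, read_unsigned("  7") returns 7 and read_unsigned("-10") throws std::out_of_range. The sign test runs on the parsed value rather than on the characters, which sidesteps the localization concern.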

Answer 2

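A num_get facet that supports explicit checking of signedness. For unsigned types, it rejects any non-zero number beginning with a '-' (after white space) and uses the default C locale's num_get to do the actual conversion.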

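#include <locale>
#include <istream>
#include <ios>
#include <algorithm>
#include <cstddef>      // std::size_t
#include <iterator>     // std::istreambuf_iterator
#include <type_traits>  // std::is_unsigned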

template <class charT, class InputIterator = std::istreambuf_iterator<charT> >
class num_get_strictsignedness : public std::num_get <charT, InputIterator>
{
public:
    typedef charT char_type;
    typedef InputIterator iter_type;

    explicit num_get_strictsignedness(std::size_t refs = 0)
        : std::num_get<charT, InputIterator>(refs)
    {}
    ~num_get_strictsignedness()
    {}

private:
    #define DEFINE_DO_GET(TYPE) \
        virtual iter_type do_get(iter_type in, iter_type end,      \
            std::ios_base& str, std::ios_base::iostate& err,       \
            TYPE& val) const override                              \
        {  return do_get_templ(in, end, str, err, val);  }         // MACRO END

    DEFINE_DO_GET(unsigned short)
    DEFINE_DO_GET(unsigned int)
    DEFINE_DO_GET(unsigned long)
    DEFINE_DO_GET(unsigned long long)

    // No static locale::id is declared here: the facet reuses std::num_get's id,
    // so installing it in a locale replaces the standard num_get facet.

    template <class T>
    iter_type do_get_templ(iter_type in, iter_type end, std::ios_base& str,
                           std::ios_base::iostate& err, T& val) const
    {
        using namespace std;

        if(in == end)
        {
            err |= ios_base::eofbit;
            return in;
        }

        // leading white spaces have already been discarded by the
        // formatted input function (via sentry's constructor)

        // (assuming that) the sign, if present, has to be the first character
        // for the formatting required by the locale used for conversion

        // use the "C" locale; could use any locale, e.g. as a data member

        // note: the signedness check isn't actually required
        //       (because we only overload the unsigned versions)
        bool do_check = false;
        if(std::is_unsigned<T>{} && *in == '-')
        {
            ++in;  // not required
            do_check = true;
        }

        in = use_facet< num_get<charT, InputIterator> >(locale::classic())
                 .get(in, end, str, err, val);

        if(do_check && 0 != val)
        {
            err |= ios_base::failbit;
            val = 0;
        }

        return in;
    }
};

Usage example:

#include <sstream>
#include <iostream>
int main()
{
    std::locale loc( std::locale::classic(),
                     new num_get_strictsignedness<char>() );
    std::stringstream ss("-10");
    ss.imbue(loc);
    unsigned int ui = 42;
    ss >> ui;
    std::cout << "ui = "<<ui << std::endl;
    if(ss)
    {
        std::cout << "extraction succeeded" << std::endl;
    }else
    {
        std::cout << "extraction failed" << std::endl;
    }
}

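Notes:

The allocation on the free store is not required; you could use e.g. a (static) local variable, initializing the ref counter with 1 in the ctor.

For every character type you want to support (e.g. char, wchar_t, charXY_t), you need to add a facet of your own (these can be different instantiations of the num_get_strictsignedness template).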

stringstream unsigned input validation
