How do I parse an ISO 8601 date (with optional milliseconds) into a struct tm in C++?

Asked: 2014-11-12 19:55:50

Tags: c++ parsing posix datetime-format

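I have a string that should specify a date and time in ISO 8601 format, which may or may not include milliseconds. I want to get a struct tm from it, along with any millisecond value that was specified (which can be assumed to be zero if it is not present in the string).

What would be involved in detecting whether the string is in the correct format, and in converting a user-specified string into the struct tm and millisecond values?

If it weren't for the milliseconds issue, I would probably just use the C function strptime(), but I do not know what the defined behavior of that function is supposed to be when the seconds contain a decimal point.

One final caveat: if at all possible, I would greatly prefer a solution that does not depend on functions found only in Boost (but I am happy to accept C++11 as a prerequisite).

The input will look something like: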

2014-11-12T19:12:14.505Z

2014-11-12T12:12:14.505-5:00
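
Z, in this case, indicates UTC, but any time zone might be used, and it will be expressed as a + or - hours/minutes offset from GMT. The decimal portion of the seconds field is optional, but the fact that it may be there at all is why I cannot simply use strptime() or std::get_time(), which do not describe any particular defined behavior if such a character is found in the seconds portion of the string.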

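6 answers:

Answer 0 (score: 18)

New answer for an old question. Rationale: updated tools.

Using this free, open source library (the date.h header included below), one can parse into a std::chrono::time_point<system_clock, milliseconds>, which has the advantage over tm of being able to hold millisecond precision. If you really need to, you can continue on to the C API via system_clock::to_time_t (losing the milliseconds along the way).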

#include "date.h"
#include <iostream>
#include <sstream>

date::sys_time<std::chrono::milliseconds>
parse8601(std::istream&& is)
{
    std::string save;
    is >> save;
    std::istringstream in{save};
    date::sys_time<std::chrono::milliseconds> tp;
    in >> date::parse("%FT%TZ", tp);
    if (in.fail())
    {
        in.clear();
        in.exceptions(std::ios::failbit);
        in.str(save);
        in >> date::parse("%FT%T%Ez", tp);
    }
    return tp;
}

int
main()
{
    using namespace date;
    using namespace std;
    cout << parse8601(istringstream{"2014-11-12T19:12:14.505Z"}) << '\n';
    cout << parse8601(istringstream{"2014-11-12T12:12:14.505-5:00"}) << '\n';
}

Output:

2014-11-12 19:12:14.505
2014-11-12 17:12:14.505

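Note that both outputs are UTC. The parse converted the local time to UTC using the -5:00 offset. If you actually want the local time, there is also a way to parse into a type called date::local_time<milliseconds>, which would then parse but ignore the offset. One can even parse the offset into a chrono::minutes if desired (using a parse overload taking minutes&).

The precision of the parse is controlled by the precision of the chrono::time_point you pass in, not by flags in the format string. The offset can be either of the style +/-hhmm with %z, or +/-[h]h:mm with %Ez.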
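
The hand-off to the C API mentioned above can be sketched with the standard library alone. This is only an illustration under stated assumptions: the alias sys_millis and the helper to_tm_and_millis are hypothetical names, not part of the library, and the code assumes system_clock counts from the Unix epoch (guaranteed only since C++20, though true on common platforms before that).

```cpp
#include <chrono>
#include <ctime>
#include <utility>

// Hypothetical alias for the time_point type the answer parses into.
using sys_millis = std::chrono::time_point<std::chrono::system_clock,
                                           std::chrono::milliseconds>;

// Hypothetical helper: split a millisecond time_point into UTC struct tm
// fields plus the leftover milliseconds that struct tm cannot hold.
std::pair<std::tm, int> to_tm_and_millis(sys_millis tp)
{
    using namespace std::chrono;

    // time_point_cast truncates toward zero, so step back one second for
    // instants before the epoch to keep the millisecond remainder in [0, 999].
    auto secs = time_point_cast<seconds>(tp);
    if (secs > tp)
        secs -= seconds{1};
    const int millis = static_cast<int>((tp - secs).count());

    const std::time_t t = system_clock::to_time_t(secs);
    std::tm fields{};
    // std::gmtime returns a pointer to shared static storage, so copy the
    // result immediately; it is not thread-safe.
    if (const std::tm* utc = std::gmtime(&t))
        fields = *utc;
    return {fields, millis};
}
```

Calling to_tm_and_millis on the value returned by parse8601 would give the struct tm the question asks for, with the milliseconds carried separately.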

Answer 1 (score: 12)

You can use C's sscanf (http://www.cplusplus.com/reference/cstdio/sscanf/) to parse it:

const char *dateStr = "2014-11-12T19:12:14.505Z";
int y,M,d,h,m;
float s;
sscanf(dateStr, "%d-%d-%dT%d:%d:%fZ", &y, &M, &d, &h, &m, &s);

If you have a std::string, it can be called like this (http://www.cplusplus.com/reference/string/string/c_str/):

std::string dateStr = "2014-11-12T19:12:14.505Z";
sscanf(dateStr.c_str(), "%d-%d-%dT%d:%d:%fZ", &y, &M, &d, &h, &m, &s);

If it should handle different time zones, you need to use the sscanf return value, which is the number of parsed arguments:

int tzh = 0, tzm = 0;
if (6 < sscanf(dateStr.c_str(), "%d-%d-%dT%d:%d:%f%d:%dZ", &y, &M, &d, &h, &m, &s, &tzh, &tzm)) {
    if (tzh < 0) {
        tzm = -tzm;    // Fix the sign on minutes.
    }
}

Then you can fill in the tm (http://www.cplusplus.com/reference/ctime/tm/) struct:

tm time = {};            // Zero the fields that are not set below
time.tm_year = y - 1900; // Year since 1900
time.tm_mon = M - 1;     // 0-11
time.tm_mday = d;        // 1-31
time.tm_hour = h;        // 0-23
time.tm_min = m;         // 0-59
time.tm_sec = (int)s;    // 0-61 (0-60 in C++11)

It can also be done with std::get_time (http://en.cppreference.com/w/cpp/io/manip/get_time) since C++11, as @Barry mentioned in a comment.
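
The sscanf steps above can be combined into one function that also rejects malformed input and keeps the milliseconds as an integer. This is a minimal sketch rather than a full ISO 8601 parser: IsoTime and parse_with_sscanf are hypothetical names, only the extended format from the question is handled, and the day is range-checked but not validated against the month.

```cpp
#include <cstdio>
#include <ctime>

// Hypothetical result type for this sketch (not part of the answer above).
struct IsoTime {
    std::tm tm{};               // fields exactly as written in the string
    int millis = 0;             // 0 when no fraction is present
    int utc_offset_minutes = 0; // 0 for "Z", negative west of UTC
};

// Returns true and fills 'out' only if the whole string matches
// YYYY-MM-DDThh:mm:ss[.f...][Z | +h:mm | -h:mm].
bool parse_with_sscanf(const char* text, IsoTime& out)
{
    int y, M, d, h, m, s, pos = 0;
    // %n records how many characters were consumed so parsing can continue.
    if (std::sscanf(text, "%4d-%2d-%2dT%2d:%2d:%2d%n",
                    &y, &M, &d, &h, &m, &s, &pos) != 6)
        return false;
    if (M < 1 || M > 12 || d < 1 || d > 31 || h < 0 || h > 23 ||
        m < 0 || m > 59 || s < 0 || s > 60)
        return false;

    const char* p = text + pos;
    int millis = 0;
    if (*p == '.') {                       // optional fractional seconds
        ++p;
        int digits = 0;
        while (*p >= '0' && *p <= '9') {
            if (digits < 3)                // keep millisecond precision only
                millis = millis * 10 + (*p - '0');
            ++digits;
            ++p;
        }
        if (digits == 0)
            return false;
        for (; digits < 3; ++digits)       // ".5" means 500 ms, not 5 ms
            millis *= 10;
    }

    int offset = 0;
    if (*p == 'Z') {
        ++p;
    } else if (*p == '+' || *p == '-') {
        int oh, om, used = 0;
        if (std::sscanf(p + 1, "%2d:%2d%n", &oh, &om, &used) != 2)
            return false;
        if (oh < 0 || oh > 23 || om < 0 || om > 59)
            return false;
        offset = (oh * 60 + om) * (*p == '-' ? -1 : 1);
        p += 1 + used;
    }
    if (*p != '\0')                        // trailing garbage is an error
        return false;

    out = IsoTime{};
    out.tm.tm_year = y - 1900;
    out.tm.tm_mon = M - 1;
    out.tm.tm_mday = d;
    out.tm.tm_hour = h;
    out.tm.tm_min = m;
    out.tm.tm_sec = s;
    out.millis = millis;
    out.utc_offset_minutes = offset;
    return true;
}
```

Reading the seconds as an integer and the fraction digit by digit avoids the rounding that a float conversion could introduce. Note that %d skips leading whitespace and accepts a sign, so the sketch is slightly more lenient than the format strictly allows.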

Answer 2 (score: 2)

A modern C++ version of an ISO 8601 parsing function:

#include <cstdlib>
#include <ctime>
#include <string>

#ifdef _WIN32
#define timegm _mkgmtime
#endif

inline int ParseInt(const char* value)
{
    return std::strtol(value, nullptr, 10);
}

std::time_t ParseISO8601(const std::string& input)
{
    constexpr const size_t expectedLength = sizeof("1234-12-12T12:12:12Z") - 1;
    static_assert(expectedLength == 20, "Unexpected ISO 8601 date/time length");

    if (input.length() < expectedLength)
    {
        return 0;
    }

    std::tm time = { 0 };
    time.tm_year = ParseInt(&input[0]) - 1900;
    time.tm_mon = ParseInt(&input[5]) - 1;
    time.tm_mday = ParseInt(&input[8]);
    time.tm_hour = ParseInt(&input[11]);
    time.tm_min = ParseInt(&input[14]);
    time.tm_sec = ParseInt(&input[17]);
    time.tm_isdst = 0;
    const int millis = input.length() > 20 ? ParseInt(&input[20]) : 0;
    return timegm(&time) * 1000 + millis;
}
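
Because timegm and _mkgmtime are platform extensions rather than standard C++, a portable alternative is to compute the day count arithmetically. The sketch below uses the widely published civil-date-to-day-number arithmetic for the proleptic Gregorian calendar; days_from_civil and utc_epoch_millis are hypothetical names, and the UTC offset parameter is an addition that the answer's function does not handle.

```cpp
#include <ctime>

// Days between 1970-01-01 and the given civil date (proleptic Gregorian).
// The year is shifted to start in March so the leap day falls at year end.
long long days_from_civil(long long y, unsigned m, unsigned d)
{
    y -= m <= 2;
    const long long era = (y >= 0 ? y : y - 399) / 400;
    const unsigned yoe = static_cast<unsigned>(y - era * 400);            // [0, 399]
    const unsigned doy = (153 * (m > 2 ? m - 3 : m + 9) + 2) / 5 + d - 1; // [0, 365]
    const unsigned doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;           // [0, 146096]
    return era * 146097 + static_cast<long long>(doe) - 719468;
}

// Hypothetical helper: milliseconds since the Unix epoch for fields written
// with the given offset from UTC (in minutes, negative west of UTC).
long long utc_epoch_millis(const std::tm& fields, int millis, int utc_offset_minutes)
{
    const long long days =
        days_from_civil(fields.tm_year + 1900, fields.tm_mon + 1, fields.tm_mday);
    const long long secs = days * 86400LL + fields.tm_hour * 3600LL +
                           fields.tm_min * 60LL + fields.tm_sec -
                           utc_offset_minutes * 60LL;   // local minus offset is UTC
    return secs * 1000LL + millis;
}
```

Returning a long long also avoids storing milliseconds in a std::time_t, which can overflow where time_t is 32 bits wide.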

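Answer 3 (score: 1)

Old question, and I have some old code to contribute ;). I was using the date library mentioned here. While it works great, it comes at a performance cost. For most common cases this is not really relevant. However, if you have, for example, a service parsing data like mine does, it really matters.

I was profiling my server application for performance optimization and found that parsing an ISO timestamp using the date library was 3 times slower than parsing the whole (roughly 500-byte) JSON document. Altogether, parsing the timestamps accounted for roughly 4.8% of the total CPU time.

On my quest to optimize this part, I did not find much C++ code that I would consider for a live product. And the code I did consider mostly had some dependencies (e.g. the ISO parser in CEPH looks okay and seems well tested).

In the end, I turned to good old C and stripped some code from the SQLite date.c to make it work standalone. The difference:

date library: 872 ms

SQLite date.c: 54 ms

(Profiled function weights of a real-life service application)

Here it is (all credits to SQLite):

The header file date_util.h: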

#include <stdint.h>
#include <stdbool.h>

#ifdef __cplusplus
extern "C" {
#endif

    // Calculates time since epoch including milliseconds
    uint64_t ParseTimeToEpochMillis(const char *str, bool *error);

    // Creates an ISO timestamp with milliseconds from epoch with millis.
    // The buffer size (resultLen) for result must be at least 100 bytes.
    void TimeFromEpochMillis(uint64_t epochMillis, char *result, int resultLen, bool *error);

#ifdef __cplusplus
}
#endif

And here is the C file date_util.c:

#include "date_util.h"
#include <ctype.h>
#include <stdarg.h>
#include <assert.h>
#include <stdio.h>
#include <string.h>


/*
 ** A structure for holding a single date and time.
 */
typedef struct DateTime DateTime;
struct DateTime {
    int64_t iJD;        /* The julian day number times 86400000 */
    int Y, M, D;        /* Year, month, and day */
    int h, m;           /* Hour and minutes */
    int tz;             /* Timezone offset in minutes */
    double s;           /* Seconds */
    char validJD;       /* True (1) if iJD is valid */
    char rawS;          /* Raw numeric value stored in s */
    char validYMD;      /* True (1) if Y,M,D are valid */
    char validHMS;      /* True (1) if h,m,s are valid */
    char validTZ;       /* True (1) if tz is valid */
    char tzSet;         /* Timezone was set explicitly */
    char isError;       /* An overflow has occurred */
};

/*
 ** Convert zDate into one or more integers according to the conversion
 ** specifier zFormat.
 **
 ** zFormat[] contains 4 characters for each integer converted, except for
 ** the last integer which is specified by three characters.  The meaning
 ** of a four-character format specifiers ABCD is:
 **
 **    A:   number of digits to convert.  Always "2" or "4".
 **    B:   minimum value.  Always "0" or "1".
 **    C:   maximum value, decoded as:
 **           a:  12
 **           b:  14
 **           c:  24
 **           d:  31
 **           e:  59
 **           f:  9999
 **    D:   the separator character, or \000 to indicate this is the
 **         last number to convert.
 **
 ** Example:  To translate an ISO-8601 date YYYY-MM-DD, the format would
 ** be "40f-21a-20c".  The "40f-" indicates the 4-digit year followed by "-".
 ** The "21a-" indicates the 2-digit month followed by "-".  The "20c" indicates
 ** the 2-digit day which is the last integer in the set.
 **
 ** The function returns the number of successful conversions.
 */
static int GetDigits(const char *zDate, const char *zFormat, ...){
    /* The aMx[] array translates the 3rd character of each format
     ** spec into a max size:    a   b   c   d   e     f */
    static const uint16_t aMx[] = { 12, 14, 24, 31, 59, 9999 };
    va_list ap;
    int cnt = 0;
    char nextC;
    va_start(ap, zFormat);
    do{
        char N = zFormat[0] - '0';
        char min = zFormat[1] - '0';
        int val = 0;
        uint16_t max;

        assert( zFormat[2]>='a' && zFormat[2]<='f' );
        max = aMx[zFormat[2] - 'a'];
        nextC = zFormat[3];
        val = 0;
        while( N-- ){
            if( !isdigit(*zDate) ){
                goto end_getDigits;
            }
            val = val*10 + *zDate - '0';
            zDate++;
        }
        if( val<(int)min || val>(int)max || (nextC!=0 && nextC!=*zDate) ){
            goto end_getDigits;
        }
        *va_arg(ap,int*) = val;
        zDate++;
        cnt++;
        zFormat += 4;
    }while( nextC );
end_getDigits:
    va_end(ap);
    return cnt;
}

/*
 ** Parse a timezone extension on the end of a date-time.
 ** The extension is of the form:
 **
 **        (+/-)HH:MM
 **
 ** Or the "zulu" notation:
 **
 **        Z
 **
 ** If the parse is successful, write the number of minutes
 ** of change in p->tz and return 0.  If a parser error occurs,
 ** return non-zero.
 **
 ** A missing specifier is not considered an error.
 */
static int ParseTimezone(const char *zDate, DateTime *p){
    int sgn = 0;
    int nHr, nMn;
    int c;
    while( isspace(*zDate) ){ zDate++; }
    p->tz = 0;
    c = *zDate;
    if( c=='-' ){
        sgn = -1;
    }else if( c=='+' ){
        sgn = +1;
    }else if( c=='Z' || c=='z' ){
        zDate++;
        goto zulu_time;
    }else{
        return c!=0;
    }
    zDate++;
    if( GetDigits(zDate, "20b:20e", &nHr, &nMn)!=2 ){
        return 1;
    }
    zDate += 5;
    p->tz = sgn*(nMn + nHr*60);
zulu_time:
    while( isspace(*zDate) ){ zDate++; }
    p->tzSet = 1;
    return *zDate!=0;
}

/*
 ** Parse times of the form HH:MM or HH:MM:SS or HH:MM:SS.FFFF.
 ** The HH, MM, and SS must each be exactly 2 digits.  The
 ** fractional seconds FFFF can be one or more digits.
 **
 ** Return 1 if there is a parsing error and 0 on success.
 */
static int ParseHhMmSs(const char *zDate, DateTime *p){
    int h, m, s;
    double ms = 0.0;
    if( GetDigits(zDate, "20c:20e", &h, &m)!=2 ){
        return 1;
    }
    zDate += 5;
    if( *zDate==':' ){
        zDate++;
        if( GetDigits(zDate, "20e", &s)!=1 ){
            return 1;
        }
        zDate += 2;
        if( *zDate=='.' && isdigit(zDate[1]) ){
            double rScale = 1.0;
            zDate++;
            while( isdigit(*zDate) ){
                ms = ms*10.0 + *zDate - '0';
                rScale *= 10.0;
                zDate++;
            }
            ms /= rScale;
        }
    }else{
        s = 0;
    }
    p->validJD = 0;
    p->rawS = 0;
    p->validHMS = 1;
    p->h = h;
    p->m = m;
    p->s = s + ms;
    if( ParseTimezone(zDate, p) ) return 1;
    p->validTZ = (p->tz!=0)?1:0;
    return 0;
}

/*
 ** Put the DateTime object into its error state.
 */
static void DatetimeError(DateTime *p){
    memset(p, 0, sizeof(*p));
    p->isError = 1;
}

/*
 ** Convert from YYYY-MM-DD HH:MM:SS to julian day.  We always assume
 ** that the YYYY-MM-DD is according to the Gregorian calendar.
 **
 ** Reference:  Meeus page 61
 */
static void ComputeJD(DateTime *p){
    int Y, M, D, A, B, X1, X2;

    if( p->validJD ) return;
    if( p->validYMD ){
        Y = p->Y;
        M = p->M;
        D = p->D;
    }else{
        Y = 2000;  /* If no YMD specified, assume 2000-Jan-01 */
        M = 1;
        D = 1;
    }
    if( Y<-4713 || Y>9999 || p->rawS ){
        DatetimeError(p);
        return;
    }
    if( M<=2 ){
        Y--;
        M += 12;
    }
    A = Y/100;
    B = 2 - A + (A/4);
    X1 = 36525*(Y+4716)/100;
    X2 = 306001*(M+1)/10000;
    p->iJD = (int64_t)((X1 + X2 + D + B - 1524.5 ) * 86400000);
    p->validJD = 1;
    if( p->validHMS ){
        p->iJD += p->h*3600000 + p->m*60000 + (int64_t)(p->s*1000);
        if( p->validTZ ){
            p->iJD -= p->tz*60000;
            p->validYMD = 0;
            p->validHMS = 0;
            p->validTZ = 0;
        }
    }
}

/*
 ** Parse dates of the form
 **
 **     YYYY-MM-DD HH:MM:SS.FFF
 **     YYYY-MM-DD HH:MM:SS
 **     YYYY-MM-DD HH:MM
 **     YYYY-MM-DD
 **
 ** Write the result into the DateTime structure and return 0
 ** on success and 1 if the input string is not a well-formed
 ** date.
 */
static int ParseYyyyMmDd(const char *zDate, DateTime *p){
    int Y, M, D, neg;

    if( zDate[0]=='-' ){
        zDate++;
        neg = 1;
    }else{
        neg = 0;
    }
    if( GetDigits(zDate, "40f-21a-21d", &Y, &M, &D)!=3 ){
        return 1;
    }
    zDate += 10;
    while( isspace(*zDate) || 'T'==*(uint8_t*)zDate ){ zDate++; }
    if( ParseHhMmSs(zDate, p)==0 ){
        /* We got the time */
    }else if( *zDate==0 ){
        p->validHMS = 0;
    }else{
        return 1;
    }
    p->validJD = 0;
    p->validYMD = 1;
    p->Y = neg ? -Y : Y;
    p->M = M;
    p->D = D;
    if( p->validTZ ){
        ComputeJD(p);
    }
    return 0;
}

/* The julian day number for 9999-12-31 23:59:59.999 is 5373484.4999999.
 ** Multiplying this by 86400000 gives 464269060799999 as the maximum value
 ** for DateTime.iJD.
 **
 ** But some older compilers (ex: gcc 4.2.1 on older Macs) cannot deal with
 ** such a large integer literal, so we have to encode it.
 */
#define INT_464269060799999  ((((int64_t)0x1a640)<<32)|0x1072fdff)

/*
 ** Return TRUE if the given julian day number is within range.
 **
 ** The input is the JulianDay times 86400000.
 */
static int ValidJulianDay(int64_t iJD){
    return iJD>=0 && iJD<=INT_464269060799999;
}

/*
 ** Compute the Year, Month, and Day from the julian day number.
 */
static void ComputeYMD(DateTime *p){
    int Z, A, B, C, D, E, X1;
    if( p->validYMD ) return;
    if( !p->validJD ){
        p->Y = 2000;
        p->M = 1;
        p->D = 1;
    }else if( !ValidJulianDay(p->iJD) ){
        DatetimeError(p);
        return;
    }else{
        Z = (int)((p->iJD + 43200000)/86400000);
        A = (int)((Z - 1867216.25)/36524.25);
        A = Z + 1 + A - (A/4);
        B = A + 1524;
        C = (int)((B - 122.1)/365.25);
        D = (36525*(C&32767))/100;
        E = (int)((B-D)/30.6001);
        X1 = (int)(30.6001*E);
        p->D = B - D - X1;
        p->M = E<14 ? E-1 : E-13;
        p->Y = p->M>2 ? C - 4716 : C - 4715;
    }
    p->validYMD = 1;
}

/*
 ** Compute the Hour, Minute, and Seconds from the julian day number.
 */
static void ComputeHMS(DateTime *p){
    int s;
    if( p->validHMS ) return;
    ComputeJD(p);
    s = (int)((p->iJD + 43200000) % 86400000);
    p->s = s/1000.0;
    s = (int)p->s;
    p->s -= s;
    p->h = s/3600;
    s -= p->h*3600;
    p->m = s/60;
    p->s += s - p->m*60;
    p->rawS = 0;
    p->validHMS = 1;
}

/*
 ** Compute both YMD and HMS
 */
static void ComputeYMD_HMS(DateTime *p){
    ComputeYMD(p);
    ComputeHMS(p);
}

/*
 ** Input "r" is a numeric quantity which might be a julian day number,
 ** or the number of seconds since 1970.  If the value of r is within
 ** range of a julian day number, install it as such and set validJD.
 ** If the value is a valid unix timestamp, put it in p->s and set p->rawS.
 */
static void SetRawDateNumber(DateTime *p, double r){
    p->s = r;
    p->rawS = 1;
    if( r>=0.0 && r<5373484.5 ){
        p->iJD = (int64_t)(r*86400000.0 + 0.5);
        p->validJD = 1;
    }
}

/*
 ** Clear the YMD and HMS and the TZ
 */
static void ClearYMD_HMS_TZ(DateTime *p){
    p->validYMD = 0;
    p->validHMS = 0;
    p->validTZ = 0;
}

// modified methods to only calculate for and back between epoch and iso timestamp with millis

uint64_t ParseTimeToEpochMillis(const char *str, bool *error) {
    assert(str);
    assert(error);
    *error = false;
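    DateTime dateTime;
    // Zero every field first: the parse helpers read flags such as validTZ
    // and rawS that are only assigned on some code paths.
    memset(&dateTime, 0, sizeof(dateTime));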

    int res = ParseYyyyMmDd(str, &dateTime);
    if (res) {
        *error = true;
        return 0;
    }

    ComputeJD(&dateTime);
    ComputeYMD_HMS(&dateTime);

    // get fraction (millis of a full second): 24.355 => 355
    int millis = (dateTime.s - (int)(dateTime.s)) * 1000;
    uint64_t epoch = (int64_t)(dateTime.iJD/1000 - 21086676*(int64_t)10000) * 1000 + millis;

    return epoch;
}

void TimeFromEpochMillis(uint64_t epochMillis, char *result, int resultLen, bool *error) {
    assert(resultLen >= 100);
    assert(result);
    assert(error);

    int64_t seconds = epochMillis / 1000;
    int millis = epochMillis - seconds * 1000;
    DateTime x;

    *error = false;
    memset(&x, 0, sizeof(x));
    SetRawDateNumber(&x, seconds);

    /*
     **    unixepoch
     **
     ** Treat the current value of p->s as the number of
     ** seconds since 1970.  Convert to a real julian day number.
     */
    {
        double r = x.s*1000.0 + 210866760000000.0;
        if( r>=0.0 && r<464269060800000.0 ){
            ClearYMD_HMS_TZ(&x);
            x.iJD = (int64_t)r;
            x.validJD = 1;
            x.rawS = 0;
        }

        ComputeJD(&x);
        if( x.isError || !ValidJulianDay(x.iJD) ) {
            *error = true;
        }
    }

    ComputeYMD_HMS(&x);
    snprintf(result, resultLen, "%04d-%02d-%02dT%02d:%02d:%02d.%03dZ",
             x.Y, x.M, x.D, x.h, x.m, (int)(x.s), millis);
}

These two helper functions only convert to and from a timestamp with milliseconds. Setting a tm struct from DateTime should be obvious.
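
The constants in the code above (21086676*10000 seconds and 210866760000000.0 milliseconds) are the same offset: the Julian day number of the Unix epoch, 2440587.5, expressed in milliseconds. A small sketch makes that relationship explicit; the constant and function names here are hypothetical and do not appear in SQLite.

```cpp
#include <cstdint>

// 1970-01-01T00:00:00Z is Julian day 2440587.5.
// 2440587.5 days * 86400000 ms/day = 210866760000000 ms.
constexpr std::int64_t kUnixEpochAsJulianMillis = 210866760000000LL;

// Convert a DateTime.iJD style value (Julian day in milliseconds)
// to milliseconds since the Unix epoch.
constexpr std::int64_t julian_millis_to_epoch_millis(std::int64_t julian_millis)
{
    return julian_millis - kUnixEpochAsJulianMillis;
}

// The inverse conversion.
constexpr std::int64_t epoch_millis_to_julian_millis(std::int64_t epoch_millis)
{
    return epoch_millis + kUnixEpochAsJulianMillis;
}
```

Since iJD already holds milliseconds, a single subtraction like this would also sidestep the floating-point step used to recover the fraction from the seconds field.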

Usage example:

// Calculate milliseconds since epoch
std::string timeStamp = "2019-09-02T22:02:24.355Z";
bool error;
uint64_t time = ParseTimeToEpochMillis(timeStamp.c_str(), &error);

// Get ISO timestamp with milliseconds component from epoch in milliseconds.
// (Multiply by 1000 if you have a standard epoch in seconds.)
uint64_t epochMillis = 1567461744355; // == "2019-09-02T22:02:24.355Z"
char result[100] = {0};
TimeFromEpochMillis(epochMillis, result, sizeof(result), &error);
std::string resultStr(result); // == "2019-09-02T22:02:24.355Z"

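Answer 4 (score: 0)

When I first went down the sscanf() route, after switching my IDE to CLion, it suggested using the std::strtol() function to replace sscanf().

Keep in mind that this is just an example of achieving the same result as the sscanf() version. It is not meant to be shorter, more universal, or more correct in every way, but rather to point everyone in the direction of a "pure C++ solution". It is based on the timestamp strings I receive from an API and is not universal yet (my case needs to handle the YYYY-MM-DDTHH:mm:ss.sssZ format), but it can easily be modified to handle different ones.

Before posting the code, there is one thing that needs to be done before using std::strtol(): cleaning the string itself, that is, removing any non-numeric markers ("-", ":", "T", "Z", "."), because without that std::strtol() will parse the numbers the wrong way (you might end up with negative month or day values).

This little snippet takes an ISO 8601 string (in the format I needed, as mentioned above) and converts it into a std::time_t result representing the epoch time in milliseconds. From here it is quite easy to go to std::chrono-type objects.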

std::time_t parseISO8601(const std::string &input)
{
    // prepare the data output placeholders
    struct std::tm time = {0};
    int millis;

    // string cleaning for strtol() - this could be made cleaner, but for the sake of the example itself...
    // 'input' is const, so copy it first and blank out the separators in the copy
    std::string cleanInput = input;
    cleanInput
        .replace(4, 1, 1, ' ')
        .replace(7, 1, 1, ' ')
        .replace(10, 1, 1, ' ')
        .replace(13, 1, 1, ' ')
        .replace(16, 1, 1, ' ')
        .replace(19, 1, 1, ' ');

    // pointers for std::strtol()
    const char* timestamp = cleanInput.c_str();
    // last parsing end position - it's where strtol finished parsing the last number found
    char* endPointer;
    // the casts aren't necessary, but I just wanted CLion to be quiet ;)
    // first parse - start with the timestamp string, give endPointer the position after the found number
    time.tm_year = (int) std::strtol(timestamp, &endPointer, 10) - 1900;
    // next parses - use endPointer instead of timestamp (skip the part, that's already parsed)
    time.tm_mon = (int) std::strtol(endPointer, &endPointer, 10) - 1;
    time.tm_mday = (int) std::strtol(endPointer, &endPointer, 10);
    time.tm_hour = (int) std::strtol(endPointer, &endPointer, 10);
    time.tm_min = (int) std::strtol(endPointer, &endPointer, 10);
    time.tm_sec = (int) std::strtol(endPointer, &endPointer, 10);
    millis = (int) std::strtol(endPointer, &endPointer, 10);

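    // convert the tm struct into time_t and then from seconds to milliseconds
    // (std::mktime() interprets the fields as local time, so a "Z" timestamp
    // is only converted correctly when the local time zone is UTC)
    return std::mktime(&time) * 1000 + millis;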
}

Not the cleanest or most universal approach, but it gets the job done without resorting to C-style functions like sscanf().
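
The step from epoch milliseconds to a std::chrono object mentioned above might look like the following. It is a sketch with hypothetical helper names, and it assumes std::chrono::system_clock measures time from the Unix epoch, which the standard only guarantees from C++20 on (common implementations behave this way earlier).

```cpp
#include <chrono>
#include <cstdint>

// Hypothetical helper: build a system_clock time_point from epoch milliseconds.
std::chrono::system_clock::time_point from_epoch_millis(std::int64_t epoch_millis)
{
    using namespace std::chrono;
    // duration_cast converts to the clock's native tick, whatever that is.
    return system_clock::time_point{
        duration_cast<system_clock::duration>(milliseconds{epoch_millis})};
}

// Hypothetical helper: the inverse, truncating to whole milliseconds.
std::int64_t to_epoch_millis(std::chrono::system_clock::time_point tp)
{
    using namespace std::chrono;
    return duration_cast<milliseconds>(tp.time_since_epoch()).count();
}
```

Keeping the value in a 64-bit integer or a chrono duration, rather than in a std::time_t, avoids overflow on platforms where time_t is 32 bits wide.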

Answer 5 (score: -1)

The Boost::DateTime library has from_iso_string and from_iso_extended_string.