Iterating efficiently over a large table

Posted: 2017-09-08 16:51:26

Tags: c++ postgresql libpqxx

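I'm currently running PostgreSQL 9.6 (Debian Stretch). My table receives about 24 million records per day and holds about 7 days of data in total. It is a temporary data store, because the data cannot be processed in real time: once I have processed the data it is deleted, and only the results are kept.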

My SQL is:

// build sql string

m_sql = "select dt, adc0, adc1, adc2, adc3, id from " + to_string( table ) +
        " WHERE date(dt) = '" +
        boost::gregorian::to_iso_extended_string( day ) + "'" +
        " order by dt";

The table looks like this:

                                   Table "public.data"
 Column |            Type             |                       Modifiers
--------+-----------------------------+-------------------------------------------------------
 dt     | timestamp without time zone | not null
 adc0   | smallint                    |
 adc1   | smallint                    |
 adc2   | smallint                    |
 adc3   | smallint                    |
 id     | integer                     | not null default nextval('data_id_seq'::regclass)
Indexes:
    "data_pkey" PRIMARY KEY, btree (id)
    "date_idx" btree (date(dt))
    "dt_idx" btree (dt)
    "id_idx" btree (id)

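My current attempt is to walk through the rows in batches of 1 million per step, using OFFSET / LIMIT with the rows ordered by time. The problem is that every iteration searches and sorts the whole data set again until it reaches the offset. When I fetch all the data in one go, my process dies with a segfault because the OOM killer kills it.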

This runs inside a transaction.

// get the values
auto add_limit_offset = []( std::string sql,
                            size_t &offset ) -> std::string {
    static const size_t delta = 1000000;
    auto ret = sql + " LIMIT " + std::to_string( delta ) + " OFFSET " +
               std::to_string( offset ) + ";";
    offset += delta;
    return ret;

};

bool run = true;
size_t offset = 0;
int nr_of_sinks = static_cast<int>( m_next_sinks.size() );
while ( run )
{
    std::cout << "adc_source: at " << offset << std::endl;
    auto sql = add_limit_offset( m_sql, offset );
    auto dbvalues = T.exec( sql.c_str() );
    run = dbvalues.size() > 0;

    // fill dbvalues into stl container
    for ( pqxx::result::const_iterator d = dbvalues.begin();
          d != dbvalues.end(); ++d )
    {
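        // timepoint: keep only "YYYY-MM-DD HH:MM:SS", dropping any fractional
        // seconds, before handing the string to boost
        std::string s_dt = ( *d )[0].c_str();
        s_dt.erase( s_dt.begin() + 19, s_dt.end() );
        auto dt = boost::posix_time::time_from_string( s_dt );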

        for ( int adc = 0; adc < nr_of_sinks; ++adc )
        {
            std::string s_adc = ( *d )[1 + adc].c_str();
            double v = std::atoi( s_adc.c_str() );
            m_next_sinks[adc]->push( dt, v );
        }
    }
}
