I'm currently running PostgreSQL 9.6 (Debian Stretch). My table receives about 24 million records per day and holds roughly 7 days of data in total. It is only a temporary data store, because the data cannot be processed in real time; once I have processed the data, the rows are deleted and only the results are kept.
My SQL is:
// build sql string
m_sql = "select dt, adc0, adc1, adc2, adc3, id from " + to_string( table ) +
" WHERE date(dt) = '" +
boost::gregorian::to_iso_extended_string( day ) + "'" +
" order by dt";
The table looks like this:
                      Table "public.data"
 Column |            Type             |                     Modifiers
--------+-----------------------------+----------------------------------------------------
 dt     | timestamp without time zone | not null
 adc0   | smallint                    |
 adc1   | smallint                    |
 adc2   | smallint                    |
 adc3   | smallint                    |
 id     | integer                     | not null default nextval('data_id_seq'::regclass)
Indexes:
    "data_pkey" PRIMARY KEY, btree (id)
    "date_idx" btree (date(dt))
    "dt_idx" btree (dt)
    "id_idx" btree (id)
My current attempt is to fetch the rows in batches of 1 million, using OFFSET/LIMIT with the results ordered by time. The problem is that every iteration scans and sorts the whole result set again, only to skip everything up to the offset. When I instead fetch all the data in one go, my process dies because the OOM killer terminates it.
This all runs inside one transaction.
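For reference, T below is the libpqxx transaction object; stripped down, it is set up roughly like this (real connection string omitted):

#include <pqxx/pqxx>

pqxx::connection conn( "dbname=..." ); // actual connection details elided
pqxx::work T( conn );                  // all batches run inside this one transaction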
// get the values in batches, appending LIMIT/OFFSET to the base query
auto add_limit_offset = []( std::string sql,
                            size_t &offset ) -> std::string {
    static const size_t delta = 1000000; // batch size: 1 million rows
    auto ret = sql + " LIMIT " + std::to_string( delta ) + " OFFSET " +
               std::to_string( offset ) + ";";
    offset += delta;
    return ret;
};
bool run = true;
size_t offset = 0;
int nr_of_sinks = static_cast<int>( m_next_sinks.size() );
while ( run )
{
    std::cout << "adc_source: at " << offset << std::endl;
    auto sql = add_limit_offset( m_sql, offset );
    auto dbvalues = T.exec( sql );
    run = !dbvalues.empty(); // stop once a batch comes back empty
    // push each row's values into the STL containers / sinks
    for ( pqxx::result::const_iterator d = dbvalues.begin();
          d != dbvalues.end(); ++d )
    {
        // timepoint: strip fractional seconds so that
        // time_from_string() can parse "YYYY-MM-DD HH:MM:SS"
        std::string s_dt = ( *d )[0].c_str();
        s_dt.erase( s_dt.begin() + 19, s_dt.end() );
        auto dt = boost::posix_time::time_from_string( s_dt );
        for ( int adc = 0; adc < nr_of_sinks; ++adc )
        {
            // columns 1..4 hold adc0..adc3
            double v = std::atoi( ( *d )[1 + adc].c_str() );
            m_next_sinks[adc]->push( dt, v );
        }
    }
}
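So the first iterations effectively send (day abbreviated):

select dt, adc0, adc1, adc2, adc3, id from data WHERE date(dt) = '...' order by dt LIMIT 1000000 OFFSET 0;
select dt, adc0, adc1, adc2, adc3, id from data WHERE date(dt) = '...' order by dt LIMIT 1000000 OFFSET 1000000;
...

and each one forces the server to produce and discard all the rows before the OFFSET, which is why every batch gets slower than the previous one.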