Question

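I have a working C++ daemon. The problem is that the daemon crashes once or twice a month. As you can see from the GDB output below, it happens when the daemon searches for an unsigned int sessionID in a std::map<unsigned int const, SessionID*> container. I cannot reproduce the problem, and I suspect something is wrong with the data coming from users (maybe the std::string cookie_ssid contains unexpected data and the conversion with strtoul goes wrong; as far as I know, that is the right way to get an unsigned int).

After the daemon crashes I only have a .core file, and I see the problem at if (!_M_impl._M_key_compare(_S_key(__x), __k)). Any ideas how to solve this? Many thanks.

GDB output: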

#0  std::_Rb_tree<unsigned int, std::pair<unsigned int const, SessionID*>, std::_Select1st<std::pair<unsigned int const, SessionID*> >, std::less<unsigned int>, std::allocator<std::pair<unsigned int const, SessionID*> > >::find (this=0x529df15c, __k=@0x7f5fab7c)
    at stl_tree.h:1376
##########
1376            if (!_M_impl._M_key_compare(_S_key(__x), __k))
##########
#0  std::_Rb_tree<unsigned int, std::pair<unsigned int const, SessionID*>, std::_Select1st<std::pair<unsigned int const, SessionID*> >, std::less<unsigned int>, std::allocator<std::pair<unsigned int const, SessionID*> > >::find (
    this=0x529df15c, __k=@0x7f5fab7c) at stl_tree.h:1376
#1  0x0805e6be in TR::find_session (this=0x529df110,
    cookie_ssid=@0x47ef3614, ptr_to_ptr_session=0x7f5fac7c)
    at stl_map.h:542

The function TR::find_session is posted below:

bool TR::find_session ( const std::string &cookie_ssid, SessionID **ptr_to_ptr_session )
{
    unsigned int uint_sessionid = std::strtoul ( cookie_ssid.c_str(),NULL,0);

    MUTEX_map_sessionids.lock_reading();
    std::map<unsigned int, SessionID*>::iterator it_sessionids = map_sessionids.find( uint_sessionid );

    if ( it_sessionids != map_sessionids.end() )
    { // exists
        *ptr_to_ptr_session = it_sessionids->second;
        MUTEX_map_sessionids.unlock();
        return true;
    }

    MUTEX_map_sessionids.unlock();
    return false;
}

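Edit: here is my cleanup function, which runs in a separate thread (once every minute or every 5 minutes), as requested in the comments. I'm not sure about this function. Maybe it's buggy...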

void TR::cleanup_sessions () // not thread-safe! called from one thread only
{
std::list<SessionID*> list_to_clean; // temporarily stores the sessions to delete

MUTEX_map_sessionids.lock_reading();
std::map<unsigned int, SessionID*>::iterator it_sessionids = map_sessionids.begin();
MUTEX_map_sessionids.unlock();

while ( true )
{
    MUTEX_map_sessionids.lock_writing();
    if (it_sessionids == map_sessionids.end() )
    {
        MUTEX_map_sessionids.unlock();
        break;
    }

    SessionID *ptr_sessionid = it_sessionids->second;

    time_t secondsnow = time (NULL);

    ptr_sessionid->MUTEX_all_session.lock_reading();
    time_t lastaccesstime = ptr_sessionid->last_access_time;
    size_t total_showed = ptr_sessionid->map_showed.size(); 
    ptr_sessionid->MUTEX_all_session.unlock();


    if ( lastaccesstime and secondsnow - lastaccesstime > LOCALSESSION_LIFETIME_SEC ) // lifetime end!
    {
        // delete session from map
        map_sessionids.erase( it_sessionids++ ); // Increments the iterator but returns the original value for use by erase
        MUTEX_map_sessionids.unlock();              


        list_to_clean.push_back ( ptr_sessionid ); // at the end
    }
    else if ( total_showed == 0 and secondsnow - lastaccesstime > 36000 ) // not active for N seconds
    {
        map_sessionids.erase( it_sessionids++ ); // Increments the iterator but returns the original value for use by erase
        MUTEX_map_sessionids.unlock();

        // add pointer to list to delete it later
        list_to_clean.push_back ( ptr_sessionid ); // at the end            
    }
    else
    {
        ++it_sessionids; // next
        MUTEX_map_sessionids.unlock();              
    }

}

// used? pause
if ( !list_to_clean.empty() ) 
{
    //sleep(1);
}

// cleanup session deleted from working map
while ( !list_to_clean.empty() )
{
    SessionID *ptr_sessionid_to_delete = list_to_clean.front();
    list_to_clean.pop_front();

    ptr_sessionid_to_delete->MUTEX_all_session.lock_writing(); // take the session's write lock first: the session cannot be deleted while it is still locked elsewhere (additional protection)
    ptr_sessionid_to_delete->cleanup();
    delete ptr_sessionid_to_delete;
}

}

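Note that, as you can see, I lock and unlock map_sessionids on every iteration, because other threads look up and insert sessions at the same time. That is critical, because users cannot wait.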

Answer 1

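Note that modifying the map can invalidate iterators into it (for std::map, erasing an element invalidates every iterator that points to that element). You have: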

MUTEX_map_sessionids.lock_reading();
std::map<unsigned int, SessionID*>::iterator it_sessionids = map_sessionids.begin();
MUTEX_map_sessionids.unlock();

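Now, immediately after that unlock, some other thread may acquire the lock and perform an operation that invalidates it_sessionids. The subsequent code can then corrupt the map, which causes a crash later on.

You need to acquire AND HOLD the lock for the entire lifetime of the iterator. It looks like you have a reader/writer lock, so you just need to hold the read lock the whole time, upgrade it to a write lock when you need to modify the map, and downgrade it back to a read lock right after the modification. Holding the read lock for a long time only blocks other threads that want the write lock, not other threads that only need the read lock.

To answer your question from the comments:

If you cannot hold the lock for a long time, you cannot keep an iterator valid for a long time. One thing you can do is occasionally remember where you are in the map, release the lock (giving other threads a chance to run, and possibly invalidating your iterator), then re-acquire the lock and create a new iterator at about the same position. You could add something like this in the middle of the loop: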
```
if (++count > limit) { // only do this every Nth iteration
    unsigned int now_at = it_sessionids->first;
    MUTEX_map_sessionids.unlock();
    // give others a chance
    MUTEX_map_sessionids.lock_reading();
    it_sessionids = map_sessionids.lower_bound(now_at);
    count = 0;
}
```
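The re-seek in the snippet above works because lower_bound returns the first element whose key is not less than the remembered key, so it still lands on a sensible position when the remembered element was erased while the lock was released. Here is a minimal single-threaded sketch of that behaviour. The names SessionMap, resume_at and demo_resume_after_erase are hypothetical, and an int stands in for the SessionID* values:

```cpp
#include <cassert>
#include <map>

typedef std::map<unsigned int, int> SessionMap;  // int stands in for SessionID*

// Hypothetical helper: after re-acquiring the lock, return an iterator to the
// first element whose key is >= the remembered key (or end() if there is none).
SessionMap::iterator resume_at(SessionMap& m, unsigned int remembered_key) {
    return m.lower_bound(remembered_key);
}

// Single-threaded walk-through of what may happen while the lock is released.
unsigned int demo_resume_after_erase() {
    SessionMap sessions;
    sessions[10] = 1;
    sessions[20] = 2;
    sessions[30] = 3;

    unsigned int now_at = 20;  // position remembered before unlocking
    sessions.erase(20);        // "another thread" erases that very element
    sessions[15] = 4;          // ... and inserts a key we have already passed

    SessionMap::iterator it = resume_at(sessions, now_at);
    assert(it != sessions.end());
    return it->first;          // first surviving key at or after 20
}
```

Keys inserted behind the remembered position (15 here) are skipped until the next cleanup pass, which should be acceptable for a periodic job like this one.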
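Upgrading a lock from read-only to read/write is a basic reader/writer lock operation that your implementation might not support. If it doesn't, you are out of luck and need to hold the writer lock the whole time.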
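As a rough sketch of that fallback, the cleanup pass could take the write lock once, keep it for the whole iteration, and only collect the expired sessions, leaving the slow deletion to the caller after the lock is released. This assumes the lock exposes lock_writing() and unlock() as in the question. Session, WriteGuard, StubLock and remove_expired are hypothetical names introduced for illustration, and the second expiry rule from the question is left out for brevity:

```cpp
#include <cassert>
#include <ctime>
#include <list>
#include <map>

struct Session { std::time_t last_access_time; };  // stand-in for SessionID

// Stand-in lock with the question's interface; it only counts calls.
struct StubLock {
    int write_locks, unlocks;
    StubLock() : write_locks(0), unlocks(0) {}
    void lock_writing() { ++write_locks; }
    void unlock() { ++unlocks; }
};

// Hypothetical RAII guard: takes the write lock, releases it on scope exit.
template <class RWLock>
class WriteGuard {
public:
    explicit WriteGuard(RWLock& l) : lock_(l) { lock_.lock_writing(); }
    ~WriteGuard() { lock_.unlock(); }
private:
    WriteGuard(const WriteGuard&);             // non-copyable
    WriteGuard& operator=(const WriteGuard&);
    RWLock& lock_;
};

// Removes expired sessions from the map and returns them for later deletion.
template <class RWLock>
std::list<Session*> remove_expired(std::map<unsigned int, Session*>& sessions,
                                   RWLock& lock, std::time_t now,
                                   std::time_t lifetime) {
    assert(lifetime >= 0);
    std::list<Session*> expired;
    WriteGuard<RWLock> guard(lock);  // held for the iterator's whole lifetime
    std::map<unsigned int, Session*>::iterator it = sessions.begin();
    while (it != sessions.end()) {
        if (now - it->second->last_access_time > lifetime) {
            expired.push_back(it->second);
            sessions.erase(it++);  // advance first, then erase the old node
        } else {
            ++it;
        }
    }
    return expired;  // the caller cleans these up without holding the map lock
}
```

The trade-off is that lookups in find_session are blocked for the duration of one pass, so this only fits if a pass over the map is short compared to what users can tolerate.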
