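How can I optimize a C++ key-value program for faster run time?

Asked: 2018-07-31 13:15:02

Tags: c++ vector c++14 key-value

This is a2.hpp, the file I am allowed to edit. As far as I can tell the code is correct, but it is too slow. Honestly, I'm lost here. I suspect my for loops are what is slowing things down so much; should I perhaps use iterators?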

#ifndef A2_HPP
#define A2_HPP

// <algorithm>, <list>, <vector>
// YOU CAN CHANGE/EDIT ANY CODE IN THIS FILE AS LONG AS SEMANTICS IS UNCHANGED

#include <algorithm>
#include <list>
#include <vector>

class key_value_sequences {

private:
  std::list<std::vector<int>> seq;
  std::vector<std::vector<int>> keyref;

public:
    // YOU SHOULD USE C++ CONTAINERS TO AVOID RAW POINTERS
    // IF YOU DECIDE TO USE POINTERS, MAKE SURE THAT YOU MANAGE MEMORY PROPERLY
    // IMPLEMENT ME: SHOULD RETURN SIZE OF A SEQUENCE FOR GIVEN KEY
    // IF NO SEQUENCE EXISTS FOR A GIVEN KEY RETURN 0
    int size(int key) const;

    // IMPLEMENT ME: SHOULD RETURN POINTER TO A SEQUENCE FOR GIVEN KEY
    // IF NO SEQUENCE EXISTS FOR A GIVEN KEY RETURN nullptr
    const int* data(int key) const;

    // IMPLEMENT ME: INSERT VALUE INTO A SEQUENCE IDENTIFIED BY GIVEN KEY
    void insert(int key, int value);
}; // class key_value_sequences


int key_value_sequences::size(int key) const {
    //checks if the key is invalid or the count vector is empty.
  if(key<0 || keyref[key].empty()) return 0;
    // subtract 1 because the first element is the key, not a value
  return keyref[key].size() -1;
}

const int* key_value_sequences::data(int key) const {
      //checks if key index or ref vector is invalid
    if(key<0 || keyref.size() < static_cast<unsigned int>(key+1)) {
      return nullptr;
    }
      // ->at(1) accesses the count (skipping the key) with a pointer
    return &keyref[key].at(1);
}

void key_value_sequences::insert(int key, int value) {
      //checks if key is valid and if the count vector needs to be resized
    if(key>=0 && keyref.size() < static_cast<unsigned int>(key+1)) {
      keyref.resize(key+1);
      std::vector<int> val;
      seq.push_back(val);
      seq.back().push_back(key);
      seq.back().push_back(value);
      keyref[key] = seq.back();
    }
      //the index is already valid
    else if(key >=0) keyref[key].push_back(value);
}

#endif // A2_HPP

This is a2.cpp. It only tests the functionality of a2.hpp, and this code cannot be changed.

// DO NOT EDIT THIS FILE !!!
// YOUR CODE MUST BE CONTAINED IN a2.hpp ONLY

#include <iostream>
#include "a2.hpp"


int main(int argc, char* argv[]) {
    key_value_sequences A;

    {
        key_value_sequences T;
        // k will be our key
        for (int k = 0; k < 10; ++k) {  //the actual tests will have way more than 10 sequences.
            // v is our value
            // here we are creating 10 sequences:
            // key = 0, sequence = (0)
            // key = 1, sequence = (0 1)
            // key = 2, sequence = (0 1 2)
            // ...
            // key = 9, sequence = (0 1 2 3 4 5 6 7 8 9)
            for (int v = 0; v < k + 1; ++v) T.insert(k, v);
        }

        T = T;
        key_value_sequences V = T;
        A = V;
    }
    std::vector<int> ref;

    if (A.size(-1) != 0) {
        std::cout << "fail" << std::endl;
        return -1;
    }

    for (int k = 0; k < 10; ++k) {
        if (A.size(k) != k + 1) {
            std::cout << "fail";
            return -1;
        } else {
            ref.clear();
            for (int v = 0; v < k + 1; ++v) ref.push_back(v);
            if (!std::equal(ref.begin(), ref.end(), A.data(k))) {
                std::cout << "fail 3 " << A.data(k) << " " << ref[k];
                return -1;
            }
        }
    }

    std::cout << "pass" << std::endl;

    return 0;
} // main

I would be very grateful if anyone could help me make my code more efficient. Thanks.

2 answers:

Answer 0 (score: 1)

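First, I'm not convinced your code is correct. In insert, you create a new vector and add it to the sequence list only when the key is valid and lies beyond the current size of keyref. That sounds wrong, because it should happen whenever you see a new key, but if your tests pass it might be fine.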
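
One way to probe this concern is to insert keys out of order. The sketch below reproduces the question's insert and size logic in a stand-alone struct. The names asker_sequences and out_of_order_size are made up for this illustration, the unused seq list is left out, and a bounds check is added to size so the demo itself stays well-defined. When a larger key is inserted first, a later insert for a smaller key takes the else branch, so the value lands in slot 0 where the key was supposed to be stored.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-alone copy of the question's logic, for illustration only.
struct asker_sequences {
    std::vector<std::vector<int>> keyref;

    int size(int key) const {
        // Bounds check added here (not in the question) to avoid reading past the end.
        if (key < 0 || static_cast<std::size_t>(key) >= keyref.size()) return 0;
        if (keyref[key].empty()) return 0;
        return static_cast<int>(keyref[key].size()) - 1;
    }

    void insert(int key, int value) {
        if (key >= 0 && keyref.size() < static_cast<std::size_t>(key) + 1) {
            keyref.resize(static_cast<std::size_t>(key) + 1);
            keyref[key] = {key, value};   // slot 0 holds the key
        } else if (key >= 0) {
            keyref[key].push_back(value); // assumes slot 0 already holds the key
        }
    }
};

// Inserts key 5 first, then key 3, and reports the size seen for key 3.
inline int out_of_order_size() {
    asker_sequences s;
    s.insert(5, 1); // grows keyref to 6 entries; entries 0..4 stay empty
    assert(s.size(5) == 1); // the first insert behaves as intended
    s.insert(3, 7); // key 3 is "already valid", so 7 is stored in slot 0
    return s.size(3);
}
```

With the keys in this order, out_of_order_size() returns 0 even though one value was inserted for key 3. The test in a2.cpp only inserts keys in increasing order, so it would not catch this.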

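Performance-wise:

  • Avoid std::list. Linked lists perform terribly on today's hardware because they defeat pipelining, caching, and prefetching. Always use std::vector instead. If the payload is really large and you are worried about copies, use std::vector<std::unique_ptr<T>>.
  • Try to avoid copying vectors. In your code, keyref[key] = seq.back() copies the vector, but that should be fine because the vector is tiny at that point (just the key and one value).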
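
As a rough illustration of the first bullet, the class from the question can be reduced to a single std::vector<std::vector<int>> indexed by key, with no std::list and no key stored in slot 0. This is only a sketch: it assumes keys are small non-negative integers, as in the test program, and the names vec_key_value_sequences and vec_demo are made up here.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sketch: one vector of sequences, indexed directly by key.
class vec_key_value_sequences {
    std::vector<std::vector<int>> seqs; // seqs[key] holds the values for key

    // True if at least one value has been stored for key.
    bool has(int key) const {
        return key >= 0 && static_cast<std::size_t>(key) < seqs.size()
               && !seqs[key].empty();
    }

public:
    // Number of values stored for key, or 0 if there are none.
    int size(int key) const {
        return has(key) ? static_cast<int>(seqs[key].size()) : 0;
    }

    // Pointer to the first value for key, or nullptr if there are none.
    const int* data(int key) const {
        return has(key) ? seqs[key].data() : nullptr;
    }

    // Appends value to the sequence for key; negative keys are ignored,
    // matching the behaviour of the code in the question.
    void insert(int key, int value) {
        if (key < 0) return;
        const auto index = static_cast<std::size_t>(key);
        if (index >= seqs.size()) seqs.resize(index + 1);
        seqs[index].push_back(value);
    }
};

// Small usage check mirroring the first steps of the test program.
inline void vec_demo() {
    vec_key_value_sequences t;
    for (int v = 0; v < 3; ++v) t.insert(2, v);
    assert(t.size(2) == 3 && t.data(2)[2] == 2);
    assert(t.size(-1) == 0 && t.data(7) == nullptr);
}
```

The implicitly generated copy constructor and assignment operator copy the vectors, so the T = T, V = T and A = V lines in a2.cpp need no extra code.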

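Otherwise, there are no obvious performance problems. Try benchmarking and profiling your program to see where the slowest parts are. Usually there are only one or two places you need to optimize to get great performance. If it is still too slow, ask another question and post the results there, so we can understand the problem better.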
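
A minimal way to start measuring, before reaching for a real profiler, is to time a piece of work with std::chrono. The helper names elapsed_ms and time_push_backs below are hypothetical, and wall-clock timing like this is only a coarse first look.

```cpp
#include <cassert>
#include <chrono>
#include <utility>
#include <vector>

// Runs work() once and returns the elapsed wall-clock time in milliseconds.
template <typename F>
double elapsed_ms(F&& work) {
    const auto start = std::chrono::steady_clock::now();
    std::forward<F>(work)();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}

// Example use: time n push_backs (n >= 0; the workload itself is arbitrary).
inline double time_push_backs(int n) {
    std::vector<int> v;
    const double ms = elapsed_ms([&] {
        for (int i = 0; i < n; ++i) v.push_back(i);
    });
    assert(static_cast<int>(v.size()) == n); // the work really ran
    return ms;
}
```

The same helper can wrap the insert loop from a2.cpp with a much larger key count. Build with optimizations enabled (for example -O2) so the numbers reflect what the grader is likely to see.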

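Answer 1 (score: 0)

I'll second Sorin: don't use std::list if you can avoid it.

So you use key as a direct index. Where does it say that key is non-negative? Where does it say that it is less than 100 million?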

void key_value_sequences::insert(int key, int value) {
  //checks if key is valid and if the count vector needs to be resized
  if(key>=0 && keyref.size() < static_cast<unsigned int>(key+1)) {
    keyref.resize(key+1); // could be large
    std::vector<int> val; // don't need this temporary.
    seq.push_back(val); // seq is useless?
    seq.back().push_back(key);
    seq.back().push_back(value);
    keyref[key] = seq.back(); // we now have 100000000-1 empty indexes 
  }
  //the index is already valid
  else if(key >=0) keyref[key].push_back(value);
}
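
To put a rough number on the "could be large" comment: with direct indexing, keyref.resize(key+1) creates one empty std::vector<int> per index up to the key, and each empty vector still occupies sizeof(std::vector<int>) bytes. That size is implementation-defined (24 bytes is common on 64-bit platforms). The helper name below is made up for this sketch.

```cpp
#include <cstddef>
#include <vector>

// Bytes taken by the outer vector's elements alone when the largest key is
// max_key, before a single value has been stored.
constexpr std::size_t direct_index_overhead(std::size_t max_key) {
    return (max_key + 1) * sizeof(std::vector<int>);
}

// Key 0 alone costs exactly one empty inner vector.
static_assert(direct_index_overhead(0) == sizeof(std::vector<int>),
              "one slot per key index");
```

On an implementation where sizeof(std::vector<int>) is 24, a single insert with key 99,999,999 would need about 2.4 GB just for empty slots.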

Can it be done faster? Yes, depending on your key range. You would need to implement a flat_map or a hash_map.
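
For the hash_map route, a minimal sketch using the standard std::unordered_map could look like the following. The names hashed_key_value_sequences and hashed_demo are hypothetical. Note one assumption that differs from the question's code: negative keys are accepted here rather than ignored, so add a check in insert if they must be rejected.

```cpp
#include <cassert>
#include <unordered_map>
#include <vector>

// Sketch: sequences stored in a hash map keyed by the integer key.
class hashed_key_value_sequences {
    std::unordered_map<int, std::vector<int>> seqs;

public:
    // Number of values stored for key, or 0 if the key was never inserted.
    int size(int key) const {
        const auto it = seqs.find(key);
        return it == seqs.end() ? 0 : static_cast<int>(it->second.size());
    }

    // Pointer to the first value for key, or nullptr if the key is unknown.
    const int* data(int key) const {
        const auto it = seqs.find(key);
        return it == seqs.end() ? nullptr : it->second.data();
    }

    // operator[] creates an empty sequence the first time a key is seen.
    void insert(int key, int value) {
        seqs[key].push_back(value);
    }
};

// Usage check: a huge key costs one map entry, not a 100-million-slot resize.
inline void hashed_demo() {
    hashed_key_value_sequences s;
    s.insert(100000000, 1);
    s.insert(100000000, 2);
    assert(s.size(100000000) == 2 && s.data(100000000)[1] == 2);
    assert(s.size(3) == 0 && s.data(3) == nullptr);
}
```

Memory use now grows with the number of distinct keys instead of with the largest key, at the price of a hash lookup on every call.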

C++14 concept code for the flat_map version.

// effectively a binary search; keyref must stay sorted by key (element 0)
std::vector<std::vector<int>>::const_iterator
key_value_sequences::find_it(int key) const { // needs a matching declaration in the class
  return std::lower_bound(keyref.begin(), keyref.end(), key,
    [](const auto& check, int k) {
      return check[0] < k; // key is 0-element
    });
}


void key_value_sequences::insert(int key, int value) {
  auto pos = find_it(key);
  // same position, as a mutable iterator
  auto found = keyref.begin() + (pos - keyref.cbegin());

  // at the end or not found
  if (found == keyref.end() || found->front() != key) {
    found = keyref.insert(found, std::vector<int>{key}); // add entry holding just the key
  }
  found->emplace_back(value); // update entry, whether new or old.
}

const int* key_value_sequences::data(int key) const {
  auto found = find_it(key);
  // lower_bound may land on a different key, so compare before using the entry
  if (found == keyref.end() || found->front() != key)
    return nullptr;
  // skip the key stored in element 0 and point at the first value
  return found->data() + 1;
}

(Hope I got that right...)
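
For readers who want something they can compile, here is one way the flat_map idea could be packaged as a self-contained class. It is a sketch rather than a drop-in a2.hpp: the names flat_key_value_sequences and flat_demo are made up, and it stores each key next to its sequence in a std::pair instead of in element 0, which lets data return the sequence's own buffer.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

// Sketch: entries are kept sorted by key so lookups can use binary search.
class flat_key_value_sequences {
    using entry = std::pair<int, std::vector<int>>;
    std::vector<entry> entries;

    // First entry whose key is not less than key (or end()).
    std::vector<entry>::const_iterator lower(int key) const {
        return std::lower_bound(entries.begin(), entries.end(), key,
            [](const entry& e, int k) { return e.first < k; });
    }

public:
    int size(int key) const {
        const auto it = lower(key);
        if (it == entries.end() || it->first != key) return 0;
        return static_cast<int>(it->second.size());
    }

    const int* data(int key) const {
        const auto it = lower(key);
        if (it == entries.end() || it->first != key) return nullptr;
        return it->second.data();
    }

    void insert(int key, int value) {
        const auto pos = lower(key);
        // Same position, converted to a mutable iterator.
        auto it = entries.begin() + (pos - entries.cbegin());
        if (it == entries.end() || it->first != key)
            it = entries.emplace(it, key, std::vector<int>{}); // new key
        it->second.push_back(value);
    }
};

// Usage check: out-of-order and negative keys both work.
inline void flat_demo() {
    flat_key_value_sequences s;
    s.insert(5, 1);
    s.insert(-3, 7);
    s.insert(5, 2);
    assert(s.size(5) == 2 && s.data(5)[1] == 2);
    assert(s.size(-3) == 1 && s.data(4) == nullptr);
}
```

Lookups take a logarithmic number of comparisons, while inserting a previously unseen key in the middle shifts every later entry. That trade-off suits workloads like the test program, where keys arrive in increasing order and new entries are appended at the end.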