Finding unique strings in C++ and generating an associated lookup vector

Time: 2012-02-10 16:38:50

Tags: c++ string vector unique

I have a vector of strings in C++:

vector<string> myVect = {"A", "A", "A", "B", "B", "A", "C", "C", "foo", "A", "foo"};

How can I convert this into a vector of integers, so that each integer uniquely corresponds to a string in myVect? That is, I want a vector

out = {0, 0, 0, 1, 1, 0, 2, 2, 3, 0, 3}

In addition, I want a vector of the unique strings, where each position corresponds to the number in out:

uniqueStrings = {"A", "B", "C", "foo"}

So far I have the following:

  vector<string> uniqueStrings;   // stores list of all unique strings
  vector<int> out(myVect.size());

  for (int i = 0; i < myVect.size(); ++i)
  {

    // seeing if this string has been encountered before
    bool assigned = false;
    for (int j = 0; j < uniqueStrings.size(); ++j)
      if (!myVect.at(i).compare( uniqueStrings.at(j) ))
      {
        out.at(i) = j;
        assigned = true;
        break;
      }

    // if not, add new example to uniqueStrings
    if (!assigned)
    {
      uniqueStrings.push_back(myVect.at(i));
      out.at(i) = uniqueStrings.size() - 1;   // index of the string just added
    }

  }

This works, but surely there is a better way?

3 Answers:

Answer 0: (score: 2)

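Insert the strings into a map, where the string is the key and the value is that string's ID. The map's values then correspond uniquely to the strings, and its keys are the unique strings.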
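The map idea above could be sketched roughly as follows. This is only an illustration, not the answerer's code: the `Encoded` struct and the `encodeStrings` function are hypothetical names introduced here, and the sketch assumes IDs should be assigned in order of first appearance, as in the question's expected output.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper type: holds both results the question asks for.
struct Encoded {
    std::vector<int> out;                    // one ID per input string
    std::vector<std::string> uniqueStrings;  // uniqueStrings[id] is the string with that ID
};

// Hypothetical helper: assigns IDs in order of first appearance,
// using a std::map for the string -> ID lookup.
Encoded encodeStrings(const std::vector<std::string>& input) {
    Encoded result;
    result.out.reserve(input.size());
    std::map<std::string, int> ids;

    for (std::size_t i = 0; i < input.size(); ++i) {
        // An unseen string would get the current number of unique strings as its ID.
        const int nextId = static_cast<int>(result.uniqueStrings.size());
        // insert() leaves the map unchanged if the key already exists;
        // .second reports whether a new entry was created.
        std::pair<std::map<std::string, int>::iterator, bool> r =
            ids.insert(std::make_pair(input[i], nextId));
        if (r.second) {
            result.uniqueStrings.push_back(input[i]);
        }
        result.out.push_back(r.first->second);
    }

    // Sanity check: exactly one ID per input string.
    assert(result.out.size() == input.size());
    return result;
}
```

Called on the myVect from the question, this should yield out = {0, 0, 0, 1, 1, 0, 2, 2, 3, 0, 3} and uniqueStrings = {"A", "B", "C", "foo"}. Each lookup costs O(log n) in the map, instead of the linear scan over uniqueStrings in the original loop.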

Answer 1: (score: 2)

Use a set:

#include <set>
...
set <string> uniqueStrings;
...
for (int i = 0; i < myVect.size(); ++i)
{
    uniqueStrings.insert(myVect[i]);
}
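The set on its own only collects the unique strings; it does not produce the integer vector. One possible way to complete it, sketched here under the assumption that sorted order is acceptable for the IDs, is to copy the set into a vector and binary-search each input string. The function name `indicesFromSet` is hypothetical and not part of the answer.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical helper: fills uniqueStrings (in sorted order) and returns,
// for each input string, its position in uniqueStrings.
std::vector<int> indicesFromSet(const std::vector<std::string>& input,
                                std::vector<std::string>& uniqueStrings) {
    // A set keeps exactly one copy of each string, in sorted order.
    std::set<std::string> seen(input.begin(), input.end());
    uniqueStrings.assign(seen.begin(), seen.end());

    std::vector<int> out;
    out.reserve(input.size());
    for (std::size_t i = 0; i < input.size(); ++i) {
        // Binary search in the sorted vector; the position found is the ID.
        std::vector<std::string>::iterator pos =
            std::lower_bound(uniqueStrings.begin(), uniqueStrings.end(), input[i]);
        // Every input string was inserted into the set, so it must be found.
        assert(pos != uniqueStrings.end() && *pos == input[i]);
        out.push_back(static_cast<int>(pos - uniqueStrings.begin()));
    }
    return out;
}
```

Note that the IDs here follow sorted order, not order of first appearance. For the data in the question the two happen to coincide, so the result should again be {0, 0, 0, 1, 1, 0, 2, 2, 3, 0, 3}, but an input such as {"foo", "A", "B", "A"} would give {2, 0, 1, 0}.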

Answer 2: (score: 1)

Here is a more or less complete example of how to use std::map<> to maintain a mapping from unique strings to integer IDs:

#include <algorithm>
#include <iostream>
#include <map>
#include <string>
#include <vector>

using namespace std;


// a simple functor type that makes it easier to dump the contents of a 
//  container of simple values or a container of std::pair
struct dump
{
    template <typename K, typename V>
    void operator()( typename std::pair<K,V> const& x)
    {
        cout << x.first << " ==> " << x.second << endl;
    }

    template <typename T>
    void operator()( T const& x)
    {
        cout << x << endl;
    }
};



#define NUM_ELEM(x) (sizeof(x)/sizeof(x[0]))

char const* data[] = {"A", "A", "A", "B", "B", "A", "C", "C", "foo", "A", "foo"};

int main() {
    // initialize the data set
    vector<string> myVect( data, data + NUM_ELEM(data));

    cout << "dump of initial data set" << endl << endl;
    for_each( myVect.begin(), myVect.end(), dump());

    map<string,size_t> uniqueStrings;   // stores collection of all unique strings

    for (vector<string>::iterator i = myVect.begin(); i != myVect.end(); ++i) {
        // I'm using uniqueStrings.size() as a convenience here...
        // I just needed something to generate  unique ID's easily,
        // it might not be appropriate to use size() for your ID's in real life

        // this will insert the new mapping if there's not already one 
        uniqueStrings.insert( make_pair(*i, uniqueStrings.size()));
    }


    cout << endl << endl<< "dump of uniqueStrings" << endl << endl;
    for_each( uniqueStrings.begin(), uniqueStrings.end(), dump());

    // I'm not sure if you'd need this `out` vector anymore - you can probably just
    //  use the `uniqueStrings` map directly for this information (but that would
    //  depend on your specific needs)

    vector<int> out;
    for (vector<string>::iterator i = myVect.begin(); i != myVect.end(); ++i) {
        out.push_back( uniqueStrings[*i]);
    }

    cout << endl << endl << "dump of `out` vector" << endl << endl;
    for_each( out.begin(), out.end(), dump());

    return 0;
}