Question

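I am working with a TreeMap of Strings, TreeMap<String, String>, and using it to implement a dictionary of words.

I then have a collection of files, and would like to create a representation of each file in the vector space (space of words) defined by the dictionary.

Each file should have a vector representing it, with the following properties:

the vector should have the same size as the dictionary
for each word contained in the file, the vector should have a 1 in the position corresponding to the word's position in the dictionary
for each word not contained in the file, the vector should have a -1 in the position corresponding to the word's position in the dictionary

So my idea is to use a Vector<Boolean> to implement these vectors. (This way of representing the documents in a collection is called the Boolean model - http://www.site.uottawa.ca/~diana/csi4107/L3.pdf)

The problem I am facing in the procedure to create this vector is that I need a way to find the position of a word in the dictionary, something like this: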

String key;
int i = get_position_of_key_in_Treemap(key); <--- purely invented method...

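1) Is there any method like this that I can use on a TreeMap? If not, could you provide some code to help me implement it myself?

2) Is there an iterator on TreeMap (which is alphabetically ordered by key) from which I can get a position?

3) Ultimately, should I use another class to implement the dictionary (if you think I can't do what I need with a TreeMap)? If yes, which one?

Thanks in advance.

Added part:

The solution proposed by dasblinkenlight looks fine, but it has a complexity problem (linear in the size of the dictionary, due to copying the keys into an array), and the idea of doing that for each file is not acceptable.

Any other ideas for my questions?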

Answer 1

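Once your tree map has been constructed, copy its sorted keys into an array, and use Arrays.binarySearch to look up the index in O(log N) time. If you need the value, do a lookup on the original map as well.

Edit: this is how you copy the keys into an array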

String[] mapKeys = new String[treeMap.size()];
int pos = 0;
for (String key : treeMap.keySet()) {
    mapKeys[pos++] = key;
}
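
To round out the idea, here is a minimal sketch of the lookup step that follows the copy. The class name KeyIndexLookup and its two helper methods are made up for illustration, and the sketch assumes the TreeMap uses natural String ordering, so that Arrays.binarySearch compares keys the same way the map does.

```java
import java.util.Arrays;
import java.util.TreeMap;

// Hypothetical helper: copy the keys once, then binary-search per word.
class KeyIndexLookup {

    // The key set of a TreeMap iterates in sorted order, so the array is sorted.
    static String[] sortedKeys(TreeMap<String, String> treeMap) {
        return treeMap.keySet().toArray(new String[0]);
    }

    // Returns the position of the key in the sorted array, or -1 if it is absent.
    static int indexOf(String[] sortedKeys, String key) {
        int pos = Arrays.binarySearch(sortedKeys, key);
        return pos >= 0 ? pos : -1;
    }

    public static void main(String[] args) {
        TreeMap<String, String> treeMap = new TreeMap<>();
        treeMap.put("quick", "one");
        treeMap.put("brown", "two");
        treeMap.put("fox", "three");

        String[] keys = sortedKeys(treeMap); // done once, O(N)
        System.out.println(indexOf(keys, "fox"));   // O(log N) per lookup
        System.out.println(indexOf(keys, "zebra")); // not in the dictionary
    }
}
```

The copy only has to be made once, after the dictionary is complete; it does not need to be repeated for each file.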

Answer 2

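An alternative solution is to use TreeMap's headMap method. If the word exists in the TreeMap, then the size() of its head map is equal to the index of the word in the dictionary. It may be a bit wasteful compared to my other answer, though.

Here is how you code it in Java: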

import java.util.*;

class Test {
    public static void main(String[] args) {
        TreeMap<String,String> tm = new TreeMap<String,String>();
        tm.put("quick", "one");
        tm.put("brown", "two");
        tm.put("fox", "three");
        tm.put("jumps", "four");
        tm.put("over", "five");
        tm.put("the", "six");
        tm.put("lazy", "seven");
        tm.put("dog", "eight");
        for (String s : new String[] {
            "quick", "brown", "fox", "jumps", "over",
            "the", "lazy", "dog", "before", "way_after"}
        ) {
            if (tm.containsKey(s)) {
                // Here is the operation you are looking for.
                // It does not work for items not in the dictionary.
                int pos = tm.headMap(s).size();
                System.out.println("Key '"+s+"' is at the position "+pos);
            } else {
                System.out.println("Key '"+s+"' is not found");
            }
        }
    }
}

Here is the output produced by the program:

Key 'quick' is at the position 6
Key 'brown' is at the position 0
Key 'fox' is at the position 2
Key 'jumps' is at the position 3
Key 'over' is at the position 5
Key 'the' is at the position 7
Key 'lazy' is at the position 4
Key 'dog' is at the position 1
Key 'before' is not found
Key 'way_after' is not found

Answer 3

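There's no such implementation in the JDK itself. Although TreeMap iterates in natural key ordering, its internal data structures are all based on trees and not arrays (remember that Maps do not order keys by definition, even though that is a very common use case).

That said, you have to make a choice, because with your comparison criterion it is not possible to have O(1) computation time for both insertion into the Map and the indexOf(key) calculation. This is because lexicographical order is not stable in a mutable data structure (as opposed to insertion order, for instance). An example: once you insert the first key-value pair (entry) into the map, its position will always be one. However, depending on the second key inserted, that position might change, since the new key may be "greater" or "lower" than the one already in the Map. You can certainly implement this by maintaining and updating an indexed list of keys during the insertion operation, but then your insert operations will cost O(n log(n)) (because the array needs to be re-sorted). That may or may not be desirable, depending on your data access patterns.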
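
As an illustration of keeping an indexed list of keys up to date during insertion, here is a minimal sketch. The class SortedKeyList is hypothetical, and it uses a slightly different technique from the one described above: instead of re-sorting after each insert, it inserts at the position found by binary search, so an insert costs an O(n) element shift and a lookup costs O(log n).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch: a list of keys that is kept sorted on every insert.
class SortedKeyList {
    private final List<String> keys = new ArrayList<>();

    // Inserts the key at its sorted position; returns false if already present.
    boolean add(String key) {
        int pos = Collections.binarySearch(keys, key);
        if (pos >= 0) {
            return false; // duplicate, nothing to do
        }
        // binarySearch returns -(insertionPoint) - 1 for a missing key.
        keys.add(-pos - 1, key); // shifts the tail of the list: O(n)
        return true;
    }

    // Returns the current sorted position of the key, or -1 if absent.
    int indexOf(String key) {
        int pos = Collections.binarySearch(keys, key);
        return pos >= 0 ? pos : -1;
    }
}
```

Note how the position of an existing key moves when a smaller key is added later, which is the instability described above.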

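ListOrderedMap and LinkedMap in Apache Commons both come close to what you need, but they rely on insertion order. I believe you could study their implementation and develop your own solution to the problem with little to moderate effort (it should just be a matter of replacing ListOrderedMap's internal backing array with a sorted list, for instance one from Apache Commons).

You can also calculate the index yourself by counting the number of elements that are lower than the given key (in the most frequent case this should be faster than iterating through the list searching for your element, since you are not comparing anything).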

Answer 4

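I'd like to thank all of you for the effort you put into answering my question. The answers were all very useful, and taking the best from each of them led me to the solution I actually implemented in my project.

What I believe to be the best answers to my individual questions are:

2) There is no Iterator defined on TreeMaps, as @Isoliveira says: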

There's no such implementation in the JDK itself. 
Although TreeMap iterates in natural key ordering,
its internal data structures are all based on trees and not arrays
(remember that Maps do not order keys, by definition, 
in spite of that the very common use case).

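And as I found in the answer "How to iterate over a TreeMap?", the only way to iterate over the elements of a Map is to use map.entrySet() and use the Iterator defined on Set (or some other class with Iterators).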
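
For illustration, a position can be derived while iterating over entrySet() simply by counting. The sketch below is an assumption-laden example rather than part of any answer: the class TreeMapPositions and the method positionsOf are invented names, and the positions are only valid as long as the map is not modified afterwards.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: record each key's position during one ordered pass.
class TreeMapPositions {

    // Walks the entries in key order and numbers them 0, 1, 2, ...
    static Map<String, Integer> positionsOf(TreeMap<String, String> dictionary) {
        Map<String, Integer> positions = new HashMap<>();
        int pos = 0;
        for (Map.Entry<String, String> entry : dictionary.entrySet()) {
            positions.put(entry.getKey(), pos++);
        }
        return positions;
    }
}
```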

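3) It's possible to use a TreeMap to implement the dictionary, but finding the index of a contained word will then cost O(log N) (the cost of a lookup in a tree data structure).

Using a HashMap with the same procedure will instead cost O(1).

1) No such method exists. The only solution is to implement it yourself.

As @Paul stated: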

Assumes that once getPosition() has been called, the dictionary is not changed.

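The assumption of the solution is that once the dictionary is created it will not be changed afterwards: this way the position of a word will always be the same.

Given this assumption, I found a solution that builds the dictionary once (O(N) expected time for the insertions, plus a one-time O(N log N) sort) and afterwards guarantees that the index of a contained word can be obtained with a constant-time O(1) lookup.

I defined the dictionary as a HashMap like this: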

public HashMap<String, WordStruct> dictionary = new HashMap<String, WordStruct>();

key --> the String representing the word contained in the dictionary

value --> an Object of the created class WordStruct

where the WordStruct class is defined as follows:

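public class WordStruct {

    // Position of the word in the dictionary once it is alphabetically ordered.
    private int DictionaryPosition;

    public WordStruct() {
    }

    // Called once, after the dictionary keys have been sorted.
    public void SetWordPosition(int pos) {
        this.DictionaryPosition = pos;
    }

    // Read access to the stored position, since the field is private.
    public int GetWordPosition() {
        return this.DictionaryPosition;
    }

}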

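and allows me to keep track of any kind of attribute I would like to have coupled with the word entry in the dictionary.

Now I fill the dictionary by iterating over all the words contained in all the files of my collection: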

THE FOLLOWING IS PSEUDOCODE

for(int i = 0; i < number_of_files ; i++){

        get_file(i);

        while (file_contains_words){

            dictionary.put( word(j) , new WordStruct());

        }

}

Once the HashMap is filled, in whatever order, I use the procedure indicated by @dasblinkenlight to sort it once and for all, with complexity O(N log N):

    Object[] dictionaryArray = dictionary.keySet().toArray();
    Arrays.sort(dictionaryArray);

    for(int i = 0; i < dictionaryArray.length; i++){

        String word = (String) dictionaryArray[i];
        dictionary.get(word).SetWordPosition(i);

    }

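From now on, to get the index position of a word in the alphabetically ordered dictionary, the only thing needed is to access its variable DictionaryPosition:

since the word is known, you just need to look it up, and that has constant cost in a HashMap.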
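
Putting the pieces together, the following is a minimal runnable sketch of the whole procedure, including the 1 / -1 file vector from the question. It is a simplification under stated assumptions: positions are stored directly as Integer values instead of in a WordStruct, and the names BooleanModelSketch, buildPositions and vectorFor are invented for this example.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of the "sort once, then O(1) lookups" approach.
class BooleanModelSketch {

    // Sorts all dictionary words once and records each word's position.
    static Map<String, Integer> buildPositions(Set<String> allWords) {
        String[] sorted = allWords.toArray(new String[0]);
        Arrays.sort(sorted); // one-time O(N log N) cost
        Map<String, Integer> positions = new HashMap<>();
        for (int i = 0; i < sorted.length; i++) {
            positions.put(sorted[i], i);
        }
        return positions;
    }

    // Builds the file vector: 1 where the word occurs in the file, -1 elsewhere.
    static int[] vectorFor(Set<String> fileWords, Map<String, Integer> positions) {
        int[] vector = new int[positions.size()];
        Arrays.fill(vector, -1);
        for (String word : fileWords) {
            Integer pos = positions.get(word); // expected O(1) per word
            if (pos != null) {                 // words outside the dictionary are skipped
                vector[pos] = 1;
            }
        }
        return vector;
    }
}
```

With this layout, the cost of building one file vector depends on the dictionary size only through the initial fill with -1; each word lookup itself is a single hash access.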

Thanks again, and I wish you all a Merry Christmas!

Answer 5

I had the same problem. So I took the source code of java.util.TreeMap and wrote IndexedTreeMap. It implements my own IndexedNavigableMap:

public interface IndexedNavigableMap<K, V> extends NavigableMap<K, V> {
   K exactKey(int index);
   Entry<K, V> exactEntry(int index);
   int keyIndex(K k);
}

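The implementation is based on updating node weights in the red-black tree whenever it changes. A node's weight is the number of child nodes beneath it, plus one for the node itself. For example, when the tree is rotated to the left: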

private void rotateLeft(Entry<K, V> p) {
    if (p != null) {
        Entry<K, V> r = p.right;

        int delta = getWeight(r.left) - getWeight(p.right);
        p.right = r.left;
        p.updateWeight(delta);

        if (r.left != null) {
            r.left.parent = p;
        }

        r.parent = p.parent;

        if (p.parent == null) {
            root = r;
        } else if (p.parent.left == p) {
            delta = getWeight(r) - getWeight(p.parent.left);
            p.parent.left = r;
            p.parent.updateWeight(delta);
        } else {
            delta = getWeight(r) - getWeight(p.parent.right);
            p.parent.right = r;
            p.parent.updateWeight(delta);
        }

        delta = getWeight(p) - getWeight(r.left);
        r.left = p;
        r.updateWeight(delta);

        p.parent = r;
    }
}

updateWeight simply updates the weights up to the root:

void updateWeight(int delta) {
    weight += delta;
    Entry<K, V> p = parent;
    while (p != null) {
        p.weight += delta;
        p = p.parent;
    }
}

And when we need to find an element by index, here is the implementation that uses the weights:

public K exactKey(int index) {
    if (index < 0 || index > size() - 1) {
        throw new ArrayIndexOutOfBoundsException();
    }
    return getExactKey(root, index);
}

private K getExactKey(Entry<K, V> e, int index) {
    if (e.left == null && index == 0) {
        return e.key;
    }
    if (e.left == null && e.right == null) {
        return e.key;
    }
    if (e.left != null && e.left.weight > index) {
        return getExactKey(e.left, index);
    }
    if (e.left != null && e.left.weight == index) {
        return e.key;
    }
    return getExactKey(e.right, index - (e.left == null ? 0 : e.left.weight) - 1);
}

It also comes in very handy for finding the index of a key:

public int keyIndex(K key) {
    if (key == null) {
        throw new NullPointerException();
    }
    Entry<K, V> e = getEntry(key);
    if (e == null) {
        throw new NullPointerException();
    }
    if (e == root) {
        return getWeight(e) - getWeight(e.right) - 1;//index to return
    }
    int index = 0;
    int cmp;
    if (e.left != null) {
        index += getWeight(e.left);
    }
    Entry<K, V> p = e.parent;
    // split comparator and comparable paths
    Comparator<? super K> cpr = comparator;
    if (cpr != null) {
        while (p != null) {
            cmp = cpr.compare(key, p.key);
            if (cmp > 0) {
                index += getWeight(p.left) + 1;
            }
            p = p.parent;
        }
    } else {
        Comparable<? super K> k = (Comparable<? super K>) key;
        while (p != null) {
            if (k.compareTo(p.key) > 0) {
                index += getWeight(p.left) + 1;
            }
            p = p.parent;
        }
    }
    return index;
}
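
To make the role of the weights easier to see in isolation, here is a much smaller sketch of the same idea. It is not the IndexedTreeMap code: the class WeightedBst is hypothetical, it is a plain unbalanced binary search tree with no red-black rotations, and it stores String keys only. What it shares with the excerpts above is that each node keeps the size of its subtree, which is enough to answer both "index of key" and "key at index" by walking one path from the root.

```java
// Hypothetical sketch: an unbalanced BST whose nodes track their subtree size.
class WeightedBst {

    private static final class Node {
        final String key;
        Node left, right;
        int weight = 1; // number of nodes in this subtree, including this node

        Node(String key) { this.key = key; }
    }

    private Node root;

    private static int weight(Node n) {
        return n == null ? 0 : n.weight;
    }

    // Inserts the key (duplicates are ignored) and refreshes weights on the way up.
    void add(String key) {
        root = add(root, key);
    }

    private static Node add(Node n, String key) {
        if (n == null) {
            return new Node(key);
        }
        int cmp = key.compareTo(n.key);
        if (cmp < 0) {
            n.left = add(n.left, key);
        } else if (cmp > 0) {
            n.right = add(n.right, key);
        }
        n.weight = 1 + weight(n.left) + weight(n.right);
        return n;
    }

    // Returns the sorted position of the key, or -1 if it is absent.
    int keyIndex(String key) {
        int index = 0;
        Node n = root;
        while (n != null) {
            int cmp = key.compareTo(n.key);
            if (cmp == 0) {
                return index + weight(n.left);
            } else if (cmp < 0) {
                n = n.left;
            } else {
                index += weight(n.left) + 1; // skip the left subtree and this node
                n = n.right;
            }
        }
        return -1;
    }

    // Returns the key stored at the given sorted position.
    String exactKey(int index) {
        if (index < 0 || index >= weight(root)) {
            throw new IndexOutOfBoundsException();
        }
        Node n = root;
        while (true) {
            int leftWeight = weight(n.left);
            if (index < leftWeight) {
                n = n.left;
            } else if (index == leftWeight) {
                return n.key;
            } else {
                index -= leftWeight + 1;
                n = n.right;
            }
        }
    }
}
```

Without balancing, the path length (and therefore the cost of both operations) can degrade to O(N) for sorted input; keeping the tree balanced, as a red-black tree does, is what bounds it at O(log N).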

I will implement IndexedTreeSet soon; in the meantime you can use the key set from IndexedTreeMap.

Update: IndexedTreeSet is now implemented.

You can find the result of this work at https://github.com/geniot/indexed-tree-map

Answer 6

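I agree with Isolvieira. Perhaps the best approach would be to use a different structure than TreeMap.

However, if you still want to compute the index of keys, a solution is to count how many keys are lower than the key you are looking for.

Here is a code snippet: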

    java.util.SortedMap<String, String> treeMap = new java.util.TreeMap<String, String>();
    treeMap.put("d", "content 4");
    treeMap.put("b", "content 2");
    treeMap.put("c", "content 3");
    treeMap.put("a", "content 1");

    String key = "d"; // key to get the index for
    System.out.println( treeMap.keySet() );

    final String firstKey = treeMap.firstKey(); // assuming treeMap structure doesn't change in the mean time
    System.out.format( "Index of %s is %d %n", key, treeMap.subMap(firstKey, key).size() );

Answer 7

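Have you considered making the values in your TreeMap contain the position in your dictionary? I am using a BitSet here for my file details.

This doesn't work nearly as well as my other idea below.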

Map<String,Integer> dictionary = new TreeMap<String,Integer> ();

private void test () {
  // Construct my dictionary.
  buildDictionary();
  // Make my file data.
  String [] file1 = new String[] {
    "1", "3", "5"
  };
  BitSet fileDetails = getFileDetails(file1, dictionary);
  printFileDetails("File1", fileDetails);
}

private void printFileDetails(String fileName, BitSet details) {
  System.out.println("File: "+fileName);
  for ( int i = 0; i < details.length(); i++ ) {
    System.out.print ( details.get(i) ? 1: -1 );
    if ( i < details.length() - 1 ) {
      System.out.print ( "," );
    }
  }
}

private BitSet getFileDetails(String [] file, Map<String, Integer> dictionary ) {
  BitSet details = new BitSet();
  for ( String word : file ) {
    // The value in the dictionary is the index of the word in the dictionary.
    details.set(dictionary.get(word));
  }
  return details;
}

String [] dictionaryWords = new String[] {
  "1", "2", "3", "4", "5"
};

private void buildDictionary () {
  for ( String word : dictionaryWords ) {
    // Initially make the value 0. We will change that later.
    dictionary.put(word, 0);
  }
  // Make the indexes.
  int wordNum = 0;
  for ( String word : dictionary.keySet() ) {
    dictionary.put(word, wordNum++);
  }
}

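Here, building the file details consists of a single lookup in the TreeMap for each word in the file.

If you were planning to use the value in the dictionary TreeMap for something else, you could always compose it with an Integer.

Added

Thinking about it further, if the value field of the Map is earmarked for something else, you could always use special keys that calculate their own position in the Map and act just like Strings for comparison.

Note: this assumes that once getPosition() has been called, the dictionary is not changed.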
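
The code for such a special key did not survive in this copy of the answer, so the following is only a hypothetical sketch of what it might look like. The class name DictionaryKey and the signature of getPosition are assumptions; the position is computed with headMap(...).size() on first use and cached, which is why the dictionary must not change afterwards.

```java
import java.util.TreeMap;

// Hypothetical sketch: a key that compares like its String and caches its position.
class DictionaryKey implements Comparable<DictionaryKey> {
    private final String word;
    private int position = -1; // -1 means "not computed yet"

    DictionaryKey(String word) {
        this.word = word;
    }

    @Override
    public int compareTo(DictionaryKey other) {
        return word.compareTo(other.word); // same ordering as plain Strings
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof DictionaryKey && word.equals(((DictionaryKey) o).word);
    }

    @Override
    public int hashCode() {
        return word.hashCode();
    }

    @Override
    public String toString() {
        return word;
    }

    // Returns this key's sorted position in the dictionary, or -1 if it is not a key.
    // The result is cached, so the dictionary must not change after the first call.
    int getPosition(TreeMap<DictionaryKey, ?> dictionary) {
        if (position < 0 && dictionary.containsKey(this)) {
            position = dictionary.headMap(this).size();
        }
        return position;
    }
}
```

The cache only pays off when getPosition is called on the key instance that is actually stored in the map, for example one obtained from dictionary.keySet().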

Answer 8

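I'd suggest that you write a SkipList to store your dictionary, since it still offers O(log N) lookup, insertion and removal while also being able to provide an index (tree implementations generally cannot return an index, because the nodes don't know it and keeping it updated would have a cost). Unfortunately, Java's ConcurrentSkipListMap implementation does not provide an index, so you would need to implement your own version.

Getting the index of an item would be O(log N). If you want both the index and the value without doing two lookups, you would need to return a wrapper object holding both.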
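
The wrapper-object idea can be shown independently of the skip list itself. The sketch below is not a skip list: as a stand-in it searches a sorted array snapshot of the dictionary, and the names IndexedLookup, Position and find are invented for illustration. It assumes natural String ordering and a dictionary that no longer changes.

```java
import java.util.Arrays;
import java.util.TreeMap;

// Hypothetical sketch: one lookup that returns both the index and the value.
class IndexedLookup {

    // Wrapper holding the sorted position of a key and the value stored for it.
    static final class Position {
        final int index;
        final String value;

        Position(int index, String value) {
            this.index = index;
            this.value = value;
        }
    }

    private final String[] keys;   // sorted keys
    private final String[] values; // values[i] belongs to keys[i]

    // Takes a snapshot; keySet() and values() of a TreeMap iterate in the same key order.
    IndexedLookup(TreeMap<String, String> source) {
        keys = source.keySet().toArray(new String[0]);
        values = source.values().toArray(new String[0]);
    }

    // Single O(log N) search; returns null when the key is absent.
    Position find(String key) {
        int pos = Arrays.binarySearch(keys, key);
        return pos >= 0 ? new Position(pos, values[pos]) : null;
    }
}
```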

Finding the position of an element in a Java TreeMap

8 answers: