Question

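I have a text file in which every line starts with a word, followed by 50 floats that represent the word's vector description (its embedding). I am trying to read the file and store each word and its embedding in a hash table. The problem is that I get a NumberFormatException, or sometimes an ArrayIndexOutOfBoundsException. How can I read each word and its embedding and store them in a hash map?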

The sNode class:

public class sNode { // Node class for hash map
    public String word;
    public float[] embedding;
    public sNode next;

    public sNode(String S, float[] E, sNode N) { // Constructor
        word = S;
        embedding = new float[50];
        for (int i = 0; i < 50; i++)
            embedding[i] = E[i]; // copy the 50 embedding values
        next = N;
    }
}

The hashTableStrings class:

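import java.io.FileReader;
import java.io.IOException;
import java.util.HashMap;
import java.util.Scanner;

public class hashTableStrings{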
private static sNode [] H;
private int TABLE_SIZE;
private int size; 

public hashTableStrings(int n){ // Initialize all lists to null
    size = 0;
    TABLE_SIZE = n;
    H = new sNode[TABLE_SIZE]; 
    for(int i=0;i<TABLE_SIZE;i++) 
        H[i] = null;
}

public int getSize(){ // Function to get number of key-value pairs
    return size;
}


public static void main (String [] args) throws IOException{
    Scanner scanner = new Scanner(new FileReader("glove.6B.50d.txt"));

    HashMap<String, Float> table = new HashMap<String, Float>();

    while (scanner.hasNextLine()) {
        String[] words = scanner.nextLine().split("\t\t"); // split space between word and float number embedding
        for (int i=0; i<50;i++){
            table.put(words[0], Float.parseFloat(words[i]));
        }
    }

    System.out.println(table);

}

} // end of class hashTableStrings

Sample of the text file:

The file can be found at the following link: https://nlp.stanford.edu/projects/glove/

Download the file glove.6B.zip, then open the text file glove.6B.50d.txt.

Answer 1

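The reason you get the "array index out of bounds" exception is that you split the string on "\t\t", a double tab, while the fields are actually separated by a single space. So each line is not split into multiple words; it stays one whole string, and you get an array of length 1.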

 String[] words = scanner.nextLine().split("\t\t");
// words.length is 1, because the array holds a single String (the whole line).

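Replacing split("\t\t") with split(" ") should fix that. By the way, each line has 51 tokens in total (counting the leading word), so the loop bound should be i < 51, not i < 50. The loop should also start at 1 rather than 0: words[0] is the word itself, and calling Float.parseFloat on it is what throws the NumberFormatException.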

  for (int i = 1; i < 51; i++) {
     // Do your work...
  }

  // i starts at index 1 because index 0 holds the leading word; the floats begin at index 1.
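To make the effect of the delimiter concrete, here is a minimal sketch. The class name SplitDemo, the helper tokens, and the shortened sample line are made up for illustration; only String.split is taken from the code above.

```java
import java.util.Arrays;

class SplitDemo {
    // Splits one line of the embedding file on the given delimiter (a regex).
    static String[] tokens(String line, String delimiter) {
        return line.split(delimiter);
    }

    public static void main(String[] args) {
        // Shortened, made-up line: a word followed by three floats, single-space separated.
        String line = "truecar.com -0.23163 0.39098 -0.7428";

        String[] wrong = tokens(line, "\t\t"); // the line has no double tab, so nothing is split
        String[] right = tokens(line, " ");    // one token per space-separated field

        System.out.println(wrong.length); // 1
        System.out.println(right.length); // 4
        System.out.println(Arrays.toString(right));
    }
}
```

With the double-tab delimiter the array has a single element, so any access beyond words[0] goes out of bounds.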

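However, as @Satish Thulva pointed out, your code still has a problem. The way you are using the HashMap, each key (the word) ends up with only the last float as its value, not all the floats on the line. For example, given the line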

truecar.com  -0.23163  0.39098  -0.7428  1.5123  -1.2368  -0.89173  -0.051826  -1.1305  0.96384  -0.12672  -0.8412  -0.76053  0.10582  -0.23173  0.11274  0.26327  0.053071  0.66657  0.9423  -0.78162  1.6225  0.097435  -0.67124  0.46235  0.3226  1.3423  0.87102  0.2217  -0.068228  0.73468  -1.0692  -0.85722  -0.49683  -1.4468  -1.1979  -0.49506  -0.36319  0.53553  -0.046529  1.5829  -0.1326  -0.55717  -0.17242  0.99214  0.73551  -0.51421  0.29743  0.19933  0.87613  0.63135

the result in your case will be

 Key: truecar.com  value: 0.63135
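The overwriting can be reproduced in isolation. This is only a sketch: OverwriteDemo and putEach are hypothetical names, and the line is shortened to three floats.

```java
import java.util.HashMap;
import java.util.Map;

class OverwriteDemo {
    // Mimics the loop from the question: every put uses the same key.
    static Map<String, Float> putEach(String[] words) {
        Map<String, Float> table = new HashMap<>();
        for (int i = 1; i < words.length; i++) {
            // put replaces whatever value was stored under words[0] before
            table.put(words[0], Float.parseFloat(words[i]));
        }
        return table;
    }

    public static void main(String[] args) {
        // Made-up, shortened line for illustration.
        String[] words = "truecar.com -0.23163 0.39098 0.63135".split(" ");
        Map<String, Float> table = putEach(words);
        System.out.println(table.size());             // a single entry survives
        System.out.println(table.get("truecar.com")); // the last float on the line
    }
}
```

A HashMap keeps one value per key, so each put with the same word discards the previous float.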

To store all the floats of a line as the value for its key, use HashMap<String, Float[]>:

String[] words = scanner.nextLine().split(" "); // split on the space between the word and each float

// An array of Float that will hold the values for this word.
// Its length is words.length - 1 because the word itself is not stored as a value.
Float[] values = new Float[words.length - 1];
for (int i = 1; i < words.length; i++) {
    values[i - 1] = Float.parseFloat(words[i]);
}

// Now all the values are stored in the array.
// Store it in the map under the word.
table.put(words[0], values);
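Putting the pieces together, a self-contained version might look like the sketch below. It departs from the snippet above in a few ways that are my own choices, not part of the answer: the class name EmbeddingLoader and the method load are hypothetical, it uses a primitive float[] instead of Float[], it reads with a BufferedReader instead of a Scanner, and it splits on "\\s+" so that repeated spaces or tabs are tolerated.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;

class EmbeddingLoader {
    // Reads lines of the form "word f1 f2 ... fn" and maps each word to its floats.
    static Map<String, float[]> load(Reader source) throws IOException {
        Map<String, float[]> table = new HashMap<>();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                line = line.trim();
                if (line.isEmpty()) continue;        // skip blank lines
                String[] words = line.split("\\s+"); // assumption: whitespace-separated fields
                float[] values = new float[words.length - 1];
                for (int i = 1; i < words.length; i++) {
                    values[i - 1] = Float.parseFloat(words[i]);
                }
                table.put(words[0], values);         // one array per word
            }
        }
        return table;
    }

    public static void main(String[] args) throws IOException {
        // Made-up two-line sample; for the real file, pass
        // new java.io.FileReader("glove.6B.50d.txt") instead of the StringReader.
        String sample = "hello 0.1 0.2 0.3\nworld -1.5 2.0 0.25\n";
        Map<String, float[]> table = load(new StringReader(sample));
        System.out.println(table.size());
        System.out.println(table.get("world").length);
    }
}
```

Note that printing the map directly, as in the question, shows array identities rather than contents; java.util.Arrays.toString on each value prints the floats.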

Reading a text file with string and float data and storing it in a HashMap
