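I have a text file in which every line starts with a word, followed by 50 floating-point numbers that make up the word's vector description (embedding). I am trying to read the file and store each word and its embedding in a hash table. The problem is that I get a NumberFormatException, or sometimes an ArrayIndexOutOfBoundsException. How can I read each word and its embedding and store them in a hash map?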
sNode class:
public class sNode{ // Node class for hash map
public String word;
public float[] embedding;
public sNode next;
public sNode(String S, float[] E, sNode N){ // Constructor
word = S;
embedding = new float[50];
for (int i=0;i<50;i++)
        embedding[i] = E[i];
    next = N;
}
}
hashTableStrings class:
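import java.io.FileReader;
import java.io.IOException;
import java.util.HashMap;
import java.util.Scanner;

public class hashTableStrings{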
private static sNode [] H;
private int TABLE_SIZE;
private int size;
public hashTableStrings(int n){ // Initialize all lists to null
size = 0;
TABLE_SIZE = n;
H = new sNode[TABLE_SIZE];
for(int i=0;i<TABLE_SIZE;i++)
H[i] = null;
}
public int getSize(){ // Function to get number of key-value pairs
return size;
}
public static void main (String [] args) throws IOException{
Scanner scanner = new Scanner(new FileReader("glove.6B.50d.txt"));
HashMap<String, Float> table = new HashMap<String, Float>();
while (scanner.hasNextLine()) {
String[] words = scanner.nextLine().split("\t\t"); // split space between word and float number embedding
for (int i=0; i<50;i++){
table.put(words[0], Float.parseFloat(words[i]));
}
}
System.out.println(table);
}
}
The file is available at https://nlp.stanford.edu/projects/glove/
Download glove.6B.zip and open the glove.6B.50d.txt text file.
Answer 0 (score: 0)
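The reason you get the "array index out of bounds" exception is that you split the string on "\t\t" (two tab characters), while the fields in the file are separated by a single space. As a result each line is not split into multiple words at all; it stays one whole string, and you get an array of length 1. The NumberFormatException has a related cause: your loop starts at i = 0, so Float.parseFloat is called on words[0], which is text rather than a number.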
String[] words = scanner.nextLine().split("\t\t");
// words.length is 1 here, because the array contains a single String (the whole line).
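The difference between the two delimiters can be checked in isolation. This is only a minimal sketch: the class name SplitDemo, the helper tokenCount, and the shortened sample line are made up for illustration and are not taken from the GloVe file.

```java
class SplitDemo {
    // Returns how many tokens a line yields for the given delimiter regex.
    static int tokenCount(String line, String delimiterRegex) {
        return line.split(delimiterRegex).length;
    }

    public static void main(String[] args) {
        // Hypothetical, shortened line: one word followed by three floats.
        String line = "word 0.1 0.2 0.3";
        System.out.println(tokenCount(line, "\t\t")); // prints 1: no double tab, so the line is not split
        System.out.println(tokenCount(line, " "));    // prints 4: the word plus three numbers
    }
}
```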
Replacing split("\t\t") with split(" ") should solve the problem. By the way, each line has 51 tokens in total (if you count the leading word), so the loop bound should be i < 51, not i < 50.
for (int i = 1; i < 51; i++) {
    // Do your work...
}
// i starts at 1 because index 0 holds the word itself; the floating-point values start at index 1.
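However, as @Satish Thulva pointed out, your code still has a problem. The way you use the HashMap, each key (word) ends up with only the last float on the line as its value, not all of the floats, because every put for the same key replaces the previous value. For example, for the line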
truecar.com -0.23163 0.39098 -0.7428 1.5123 -1.2368 -0.89173 -0.051826 -1.1305 0.96384 -0.12672 -0.8412 -0.76053 0.10582 -0.23173 0.11274 0.26327 0.053071 0.66657 0.9423 -0.78162 1.6225 0.097435 -0.67124 0.46235 0.3226 1.3423 0.87102 0.2217 -0.068228 0.73468 -1.0692 -0.85722 -0.49683 -1.4468 -1.1979 -0.49506 -0.36319 0.53553 -0.046529 1.5829 -0.1326 -0.55717 -0.17242 0.99214 0.73551 -0.51421 0.29743 0.19933 0.87613 0.63135
the result in your case would be
Key: truecar.com value: 0.63135
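The overwriting can be reproduced with a few lines. This is a minimal sketch with made-up values; the class name OverwriteDemo and the helper putEach are hypothetical and only mimic the loop in the question.

```java
import java.util.HashMap;
import java.util.Map;

class OverwriteDemo {
    // Mimics the question's loop: one put per float, all under the same key.
    static Map<String, Float> putEach(String key, float[] values) {
        Map<String, Float> table = new HashMap<>();
        for (float v : values) {
            table.put(key, v); // each put replaces the value stored for this key
        }
        return table;
    }

    public static void main(String[] args) {
        Map<String, Float> table = putEach("word", new float[] {0.1f, 0.2f, 0.3f});
        System.out.println(table); // prints {word=0.3}: only the last value survives
    }
}
```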
To store all of the float values as the value for a key, use HashMap<String, Float[]>:
String[] words = scanner.nextLine().split(" "); // split on the single space between the word and each float
// An array of Float that holds the values for this word.
Float[] values = new Float[words.length - 1]; // one slot fewer, because the word itself is not stored as a value
for (int i = 1; i < words.length; i++) {
    values[i - 1] = Float.parseFloat(words[i]);
}
// All the values are now in the array, so store it in the map.
// Note that table must be declared as HashMap<String, Float[]> for this to compile.
table.put(words[0], values);
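Putting the pieces together, a complete loader might look like the sketch below. It is not the only way to do this, and it makes a few assumptions of its own: the class name GloveLoader and the method load are hypothetical, it uses a primitive float[] instead of the Float[] shown above to avoid boxing, it reads with a BufferedReader rather than a Scanner, and it skips blank lines.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;

class GloveLoader {
    // Parses lines of the form "word f1 f2 ... fn" into a word -> vector map.
    static Map<String, float[]> load(Reader source) {
        Map<String, float[]> table = new HashMap<>();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.isEmpty()) {
                    continue; // assumption: blank lines carry no data
                }
                String[] tokens = line.split(" ");
                // tokens[0] is the word; the remaining tokens are the floats.
                float[] embedding = new float[tokens.length - 1];
                for (int i = 1; i < tokens.length; i++) {
                    embedding[i - 1] = Float.parseFloat(tokens[i]);
                }
                table.put(tokens[0], embedding);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return table;
    }

    public static void main(String[] args) throws IOException {
        // File name taken from the question; it must be in the working directory.
        Map<String, float[]> table = load(new FileReader("glove.6B.50d.txt"));
        System.out.println(table.size() + " words loaded");
    }
}
```

Taking a Reader instead of a file name keeps the parsing logic testable with a StringReader, without needing the full GloVe download.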