Question

I initialized a HashMap and a nested HashMap to store each term, the documents it occurs in, and its frequency.

for (i = 1; i < lineTokens.length; i += 2) 
{   
    if (i + 1 >= lineTokens.length) continue;  
    String fileName = lineTokens[i];
    int frequency = Integer.parseInt(lineTokens[i + 1]); 
    postingList2.put(fileName,frequency);
    //System.out.println(postingList2);
}
postingList.put(topic, postingList2);

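It gives me this output: {cancel={WET4793.txt=16, WET5590.txt=53}, unavailable={WET4291.txt=10}, station info={WET2266.txt=32}, advocacy program={WET2776.txt=32}, no ratingslogin={WET5376.txt=76}}. I am trying to represent the whole thing as a matrix, but I cannot set 0 for the files that do not contain a particular term. It should look like this:

row -> term
column -> document
mat[row][column] = frequency of occurrences of the term in the document

I did this easily in Python using a pandas DataFrame.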

Answer 1

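Given your initial HashMap, converting it to a matrix takes three steps:

Create an index for each topic (0, 1, ...)
Create an index for each document (0, 1, ...)
Fill the matrix using the above indexes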

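This solution uses Map lookups (the keys are the topics and documents) for some efficiency. The order of the topics and documents can be controlled; no attempt is made here to produce a specific order, so the indexes shown in the results simply reflect the order in which the HashMap happened to iterate.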
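
If a fixed order is wanted, one option is to sort the keys before assigning indexes, since HashMap makes no guarantee about iteration order. The following is only a minimal sketch, assuming alphabetical order is acceptable; the class SortedIndex and the method buildIndex are hypothetical names and not part of the original answer:

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeSet;

// Hypothetical helper, not part of the original answer.
class SortedIndex {
    // Assigns 0, 1, 2, ... to the keys in natural (alphabetical) String order.
    static Map<String, Integer> buildIndex(Collection<String> keys) {
        Map<String, Integer> index = new LinkedHashMap<>(); // remembers insertion order
        for (String key : new TreeSet<>(keys)) {            // TreeSet sorts and removes duplicates
            index.put(key, index.size());                   // size() is the next free index
        }
        return index;
    }

    public static void main(String[] args) {
        System.out.println(buildIndex(Arrays.asList("unavailable", "cancel", "station info")));
        // prints {cancel=0, station info=1, unavailable=2}
    }
}
```

Calling buildIndex(postingList.keySet()) would then give the same topic indexes on every run, at the cost of one sort.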

Step 1: Create a unique ID for each topic and build a lookup map

Map<String, Integer> topicIndex = new HashMap<>();
List<String> topicList = new ArrayList<>();  // topicList is used to print the matrix
int index = 0;
for (String topic : postingList.keySet()) {
    if (!topicIndex.containsKey(topic)) {
        topicIndex.put(topic, index++);
        topicList.add(topic);
    }
}

The result of this map is (every term now has a unique ID):

Topics: {cancel=0, unavailable=1, station info=2, advocacy program=3, no ratingslogin=4}

Step 2: Create a unique ID for each document and build a lookup map

index = 0;
Map<String, Integer> documentIndex = new HashMap<>();
for (String topic : postingList.keySet()) {
    for (String document : postingList.get(topic).keySet()) {
        if (!documentIndex.containsKey(document))
            documentIndex.put(document, index++);
    }
}

The result of this map is (every document now has a unique ID):

Documents: {WET4793.txt=0, WET4291.txt=2, WET2266.txt=3, WET2776.txt=4, WET5376.txt=5, WET5590.txt=1}

Step 3: Create and fill the matrix

int[][] mat = new int[topicIndex.size()][documentIndex.size()];
for (String topic : postingList.keySet()) {
    for (String document : postingList.get(topic).keySet()) {
        mat[topicIndex.get(topic)][documentIndex.get(document)] = postingList.get(topic).get(document);
    }
}

Result: the matrix now looks like this:

cancel          16 53  0  0  0  0 
unavailable      0  0 10  0  0  0 
station info     0  0  0 32  0  0 
advocacy program 0  0  0  0 32  0 
no ratingslogin  0  0  0  0  0 76
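
The zeros in the matrix above need no extra work: Java initializes every element of a new int array to 0, so only the term/document pairs that actually occur have to be written. As a rough sketch of the same fill step, the variant below iterates over entrySet() to avoid repeated postingList.get(topic) lookups. The class MatrixFill and the method fill are hypothetical names, and the small maps in main are made-up sample data in the shape of the question's output:

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of the original answer.
class MatrixFill {
    static int[][] fill(Map<String, Map<String, Integer>> postingList,
                        Map<String, Integer> topicIndex,
                        Map<String, Integer> documentIndex) {
        // A new int array is zero-filled, so absent pairs stay 0.
        int[][] mat = new int[topicIndex.size()][documentIndex.size()];
        for (Map.Entry<String, Map<String, Integer>> topicEntry : postingList.entrySet()) {
            int row = topicIndex.get(topicEntry.getKey());
            for (Map.Entry<String, Integer> posting : topicEntry.getValue().entrySet()) {
                mat[row][documentIndex.get(posting.getKey())] = posting.getValue();
            }
        }
        return mat;
    }

    public static void main(String[] args) {
        Map<String, Map<String, Integer>> postingList = new HashMap<>();
        postingList.put("cancel", new HashMap<>());
        postingList.get("cancel").put("WET4793.txt", 16);
        postingList.put("unavailable", new HashMap<>());
        postingList.get("unavailable").put("WET4291.txt", 10);

        Map<String, Integer> topicIndex = new HashMap<>();
        topicIndex.put("cancel", 0);
        topicIndex.put("unavailable", 1);
        Map<String, Integer> documentIndex = new HashMap<>();
        documentIndex.put("WET4793.txt", 0);
        documentIndex.put("WET4291.txt", 1);

        System.out.println(Arrays.deepToString(fill(postingList, topicIndex, documentIndex)));
        // prints [[16, 0], [0, 10]]
    }
}
```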

Edit: loop to print the matrix

    for (int row = 0; row < topicIndex.size(); row++) {
        System.out.printf("%-16s", topicList.get(row));
        for (int col = 0; col < documentIndex.size(); col++) {
            System.out.printf("%2d ", mat[row][col]);
        }
        System.out.println();
    }
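
Since the question mentions a pandas DataFrame, which also shows column labels, a header row with the document names may be useful. This is only a sketch under assumptions: the class MatrixPrinter, the method render, the documents list (document names in index order, analogous to topicList) and the fixed column width of 12 are all hypothetical choices, not part of the original answer:

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of the original answer.
class MatrixPrinter {
    // Builds a text table: one header row of document names, then one row per topic.
    static String render(int[][] mat, List<String> topics, List<String> documents) {
        StringBuilder sb = new StringBuilder();
        sb.append(String.format("%-16s", ""));            // empty corner cell above the topic column
        for (String document : documents) {
            sb.append(String.format("%12s", document));   // right-aligned column label
        }
        sb.append(System.lineSeparator());
        for (int row = 0; row < mat.length; row++) {
            sb.append(String.format("%-16s", topics.get(row)));
            for (int col = 0; col < mat[row].length; col++) {
                sb.append(String.format("%12d", mat[row][col]));
            }
            sb.append(System.lineSeparator());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        int[][] mat = { {16, 53}, {0, 0} };               // made-up sample values
        System.out.print(render(mat,
                Arrays.asList("cancel", "unavailable"),
                Arrays.asList("WET4793.txt", "WET5590.txt")));
    }
}
```

Returning a String instead of printing directly keeps the formatting easy to check or to write to a file.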

Java HashMap to matrix conversion
