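I am trying to count word pairs in a text file. My goal is to map each word in a string to the word that follows it, and then count the repeated key/value pairs. I don't care about order. My code currently uses a HashMap to store each word pair, but with a HashMap I lose the duplicate entries. If my text file contains: "FIRST SECOND THIRD FIRST SECOND", I get the output: FIRST [SECOND] SECOND [] THIRD [FIRST]. So when a key is repeated, the later value overwrites the earlier one. Brandon Ling helped me in a previous post, but I was not clear about what he was aiming for. I have now finally realized that a HashMap may not work here.
Any help would be greatly appreciated.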
import java.io.File;
import java.io.FileInputStream;
import java.io.InputStream;
import java.io.FileNotFoundException;
import java.util.Iterator;
import java.util.Scanner;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Map.Entry;
import java.util.ArrayList;
import java.util.Set;
import java.util.TreeMap;
public class Assignment1
{
// returns an InputStream that gets data from the named file
private static InputStream getFileInputStream(String fileName)
{
InputStream inputStream;
try {
inputStream = new FileInputStream(new File(fileName));
}
catch (FileNotFoundException e) { // no file with this name exists
System.err.println(e.getMessage());
inputStream = null;
}
return inputStream;
}
// @SuppressWarnings("unchecked")
public static void main(String[] args)
{
InputStream in = System.in;
in = getFileInputStream(args[0]);
System.out.println("number of words is" + in);
if (in != null)
{
// Using a Scanner object to read one word at a time from the input stream.
@SuppressWarnings("resource")
Scanner sc = new Scanner(in);
String word;
System.out.printf("CS261 - Assignment 1 - AdamDavis%n%n"); // printf, because println does not expand %n
System.out.println("");
System.out.println("");
// Continue getting words until we reach the end of input
List<String> inputWords = new ArrayList<String>();
HashMap<String, List<String>> wordPairs = new HashMap<String, List<String>>();
while (sc.hasNext())
{
word = sc.next();
if (word != null) // compare against null with !=, equals(null) is always false
{
inputWords.add(word);
System.out.println("");
System.out.println("");
}
}
Iterator<String> it = inputWords.iterator();
boolean firstWord = true;
String currentWord = null;
String previousWord = null;
while(it.hasNext())
{
currentWord = it.next();
wordPairs.put(currentWord, new ArrayList<String>());
if(firstWord == true)
{
//System.out.println("this is result inside if first == null:" + wordPairs.containsKey(currentWord));
firstWord = false;
}
else
{
// System.out.println("this is result inside else:" + currentWord);
wordPairs.get(previousWord).add(currentWord);
//System.out.println("this is result inside else:" + wordPairs.containsKey(previousWord));
}
previousWord = currentWord;
}
// Print each word followed by the list of words that came after it
Set<Entry<String, List<String>>> s = wordPairs.entrySet();
Iterator<Entry<String, List<String>>> itr = s.iterator();
while (itr.hasNext())
{
Entry<String, List<String>> pairs = itr.next();
System.out.println(pairs.getKey() + "\t" + pairs.getValue());
}
}
}
}
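Answer 0: (score: 1)
You can use Apache Commons' org.apache.commons.collections.map.MultiKeyMap, which lets you store a value under multiple keys, and then simply use an Integer as the value to maintain a counter.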
MultiKeyMap map = new MultiKeyMap();
Integer counter = Integer.valueOf(1); // new Integer(int) is deprecated since Java 9
map.put("String1","String2",counter);
Integer value = (Integer)map.get("String1", "String2");
Alternatively, you can build a composite key for the map (word1 + word2) and keep the count as an Integer value.
Map<String,Integer> map = new HashMap<>();
String key = "word1" + "|" + "word2";
Integer value = Integer.valueOf(1); // new Integer(int) is deprecated since Java 9
map.put(key,value);
Integer cntr = map.get(key);
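As a rough sketch, the composite-key idea can be applied to the whole word list like this. It assumes the words are already in a List<String>, as in the question. The names PairCounter and countPairs are hypothetical, and the "|" separator only works if no word contains that character. Map.merge (Java 8+) stands in for the explicit get/put pair.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration; not part of the original answer.
class PairCounter {
    // Counts each adjacent (previous, current) pair under the key "previous|current".
    // Assumption: no word contains the '|' separator.
    static Map<String, Integer> countPairs(List<String> words) {
        Map<String, Integer> counts = new HashMap<>();
        for (int i = 1; i < words.size(); i++) {
            String key = words.get(i - 1) + "|" + words.get(i);
            // Insert 1 for a new key, otherwise add 1 to the existing count
            counts.merge(key, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("FIRST", "SECOND", "THIRD", "FIRST", "SECOND");
        // Iteration order of a HashMap is unspecified, so the printed order may vary
        System.out.println(countPairs(words));
    }
}
```

For the sample input from the question, FIRST|SECOND ends up with a count of 2, while SECOND|THIRD and THIRD|FIRST each get 1.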
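Answer 1: (score: 0)
I would do the following: store each word pair under a combined key, with its count as the value, for example:
FIRST#SECOND -> 2, SECOND#THIRD -> 1
Code: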
Map<String, Integer> pairsCount = new HashMap<>();
Iterator<String> it = inputWords.iterator();
String currentWord = null;
String previousWord = null;
while( it.hasNext() ) {
currentWord = it.next();
if( previousWord != null ) {
String key = previousWord.concat( "#" ).concat( currentWord );
if( pairsCount.containsKey( key ) ) {
Integer lastCount = pairsCount.get( key );
pairsCount.put( key, lastCount + 1 );
} else {
pairsCount.put( key, 1 );
}
}
previousWord = currentWord;
}
// output all pairs with count
for( Map.Entry<String, Integer> entry : pairsCount.entrySet() )
System.out.printf( "%s %s -> %d%n", entry.getKey().split( "#" )[0], entry.getKey().split( "#" )[1], entry.getValue() );
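For comparison, the question's own Map<String, List<String>> loses followers because its loop calls wordPairs.put(currentWord, new ArrayList<String>()) for every word, which replaces the list each time a word repeats. A minimal sketch that keeps the existing list, assuming Java 8+ for computeIfAbsent (FollowerLists and followers are hypothetical names):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration; mirrors the loop from the question.
class FollowerLists {
    // Maps every word to the list of words that directly followed it, duplicates included.
    static Map<String, List<String>> followers(List<String> words) {
        Map<String, List<String>> result = new HashMap<>();
        String previous = null;
        for (String current : words) {
            // Create the list only the first time a word is seen, never replace it
            result.computeIfAbsent(current, k -> new ArrayList<>());
            if (previous != null) {
                result.get(previous).add(current);
            }
            previous = current;
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("FIRST", "SECOND", "THIRD", "FIRST", "SECOND");
        System.out.println(followers(words));
    }
}
```

With the sample input, FIRST maps to [SECOND, SECOND], SECOND to [THIRD], and THIRD to [FIRST], so the repeated pair can be counted from the list contents.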
Answer 2: (score: 0)
You can use Java 8 streams to build a HashMap that contains the word-pair counts.
import java.util.Arrays;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.nio.file.Files;
import java.nio.file.FileSystems;
import static java.util.stream.Collectors.groupingBy;
import static java.util.stream.Collectors.counting;
public class Words {
public static void main(String[] args) throws Exception {
String fileContent = new String(Files.readAllBytes(FileSystems.getDefault().getPath(args[0])));
String[] inputWords = fileContent.trim().split("\\s+"); // trim so leading whitespace does not yield an empty first word
System.out.println("number of words is " + inputWords.length);
List<List<String>> wordPairs = new ArrayList<>();
String previousWord = null;
for(String word: inputWords) {
if(previousWord != null) wordPairs.add(Arrays.asList(previousWord, word));
previousWord = word;
}
Map<List<String>, Long> pairCounts = wordPairs.stream().collect(groupingBy(pair -> pair, counting()));
System.out.println(pairCounts);
}
}
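The intermediate list of pairs can probably be skipped by streaming over the indices instead. This is only a sketch under the same assumptions as the code above (words already split into a String[]); StreamPairCounter and countPairs are hypothetical names.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// Hypothetical variant for illustration; not part of the original answer.
class StreamPairCounter {
    // Builds each (previous, current) pair from adjacent indices and counts equal pairs.
    static Map<List<String>, Long> countPairs(String[] words) {
        return IntStream.range(1, words.length) // empty stream when there are fewer than two words
                .mapToObj(i -> Arrays.asList(words[i - 1], words[i]))
                .collect(Collectors.groupingBy(pair -> pair, Collectors.counting()));
    }

    public static void main(String[] args) {
        String[] words = {"FIRST", "SECOND", "THIRD", "FIRST", "SECOND"};
        // Map iteration order is unspecified, so the printed order may vary
        System.out.println(countPairs(words));
    }
}
```

Using a List as the key works because List.equals and List.hashCode compare element by element, so two separately built [FIRST, SECOND] lists land in the same group.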