Question

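I have to count the number of unique words in a text document using Java. First, I had to strip the punctuation from all of the words. I used the Scanner class to scan each word in the document and put it into a String ArrayList.

So, the next step is where I'm having trouble! How do I write a method that counts the number of unique strings in the array?

For example, if the array contains apple, bob, apple, jim, bob; the number of unique values in this array is 3.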

public void countWords() {
    try {
        Scanner scan = new Scanner(in);
        while (scan.hasNext()) {
            String words = scan.next();
            // Strings are immutable: replace() returns a new String, so the
            // result has to be assigned back. No contains() check is needed.
            words = words.replace(".", "");
            words = words.replace("!", "");
            words = words.replace(":", "");
            words = words.replace(",", "");
            words = words.replace("?", "");
            words = words.replace("-", "");
            words = words.replace("‘", "");
            wordStore.add(words.toLowerCase());
        }
    } catch (FileNotFoundException e) {
        System.out.println("File Not Found");
    }
    System.out.println("The total number of words is: " + wordStore.size());
}

Answer 1

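Are you allowed to use a Set? If so, a HashSet may solve your problem. A HashSet does not accept duplicates.

HashSet<String> noDupSet = new HashSet<String>();
noDupSet.add(yourString); // call this for every word; duplicates are ignored
noDupSet.size();

The size() method returns the number of unique words.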

If you really have to use only an ArrayList, one way to do it is:

1) Create a temp ArrayList
2) Iterate original list and retrieve element
3) If tempArrayList doesn't contain element, add element to tempArrayList
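
The three steps above could be sketched roughly as follows. This is only an illustration: the class name TempListUnique and the method name uniqueValues are made up here, and the input is assumed to be a list of already cleaned-up words.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class TempListUnique {
    // Copies each element into a temporary list only if it is not in there yet.
    static List<String> uniqueValues(List<String> original) {
        List<String> temp = new ArrayList<>();
        for (String element : original) {
            if (!temp.contains(element)) {
                temp.add(element);
            }
        }
        return temp;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("apple", "bob", "apple", "jim", "bob");
        System.out.println(uniqueValues(words).size()); // prints 3
    }
}
```

Note that contains() scans the whole temporary list each time, so this takes O(n^2) time in the worst case. That is usually fine for small documents, but a HashSet scales better.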

Answer 2

Since Java 8, you can use a Stream:

After adding the elements to your ArrayList:

long n = wordStore.stream().distinct().count();

This converts your ArrayList into a stream and then counts only the distinct elements.
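
For reference, here is the same one-liner wrapped in a small runnable program. The class name StreamUnique and the method name countUnique are hypothetical; only stream(), distinct() and count() come from the answer.

```java
import java.util.Arrays;
import java.util.List;

class StreamUnique {
    // distinct() compares elements with equals(), so "Apple" and "apple"
    // count as different words unless they are lower-cased first.
    static long countUnique(List<String> words) {
        return words.stream().distinct().count();
    }

    public static void main(String[] args) {
        List<String> wordStore = Arrays.asList("apple", "bob", "apple", "jim", "bob");
        System.out.println(countUnique(wordStore)); // prints 3
    }
}
```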

Answer 3

I would recommend using a HashSet. It automatically filters out duplicates when you call the add method.
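
A minimal sketch of that idea, assuming the words are already collected in a List<String>. The names HashSetUnique and countUnique are made up for this example.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class HashSetUnique {
    static int countUnique(List<String> words) {
        Set<String> seen = new HashSet<>();
        for (String word : words) {
            // add() returns false and leaves the set unchanged
            // when the word is already present.
            seen.add(word);
        }
        return seen.size();
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("apple", "bob", "apple", "jim", "bob");
        System.out.println(countUnique(words)); // prints 3
    }
}
```

If the list already exists, new HashSet<>(words).size() gives the same number without an explicit loop.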

Answer 4

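Although I think a Set is the simplest solution, you can still use your original solution and just add an if statement that checks whether the value already exists in the list before adding it.

if (!wordStore.contains(words.toLowerCase()))
    wordStore.add(words.toLowerCase());

The number of words in the list is then the total number of unique words (i.e. wordStore.size()).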

Answer 5

You can do it quickly in the following way...

    ArrayList<String> duplicateList = new ArrayList<String>();
    duplicateList.add("one");
    duplicateList.add("two");
    duplicateList.add("one");
    duplicateList.add("three");

    System.out.println(duplicateList); // prints [one, two, one, three]

    HashSet<String> uniqueSet = new HashSet<String>();

    uniqueSet.addAll(duplicateList);
    System.out.println(uniqueSet); // prints one, two and three once each (HashSet order is not guaranteed)

    duplicateList.clear();
    System.out.println(duplicateList);// prints []


    duplicateList.addAll(uniqueSet);
    System.out.println(duplicateList); // prints the same three unique values, in the set's iteration order

Answer 6

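You could also build a Hashtable or a HashMap. The keys would be your input strings, and each value would be the number of times that string occurs in the input array. O(N) time and space.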
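
One possible way to write this down, using Map.merge (available since Java 8). The class name WordCounts and the method name countOccurrences are hypothetical.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class WordCounts {
    // Maps each distinct word to the number of times it occurs.
    static Map<String, Integer> countOccurrences(List<String> words) {
        Map<String, Integer> counts = new HashMap<>();
        for (String word : words) {
            // Stores 1 for a new key, otherwise adds 1 to the existing count.
            counts.merge(word, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("apple", "bob", "apple", "jim", "bob");
        Map<String, Integer> counts = countOccurrences(words);
        System.out.println(counts.size());       // prints 3
        System.out.println(counts.get("apple")); // prints 2
    }
}
```

The number of keys in the map is the number of unique words, and the per-word counts come for free.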

Solution 2:

Sort the input list. Equal strings will then be next to each other. Compare list(i) with list(i + 1) and count the duplicates.
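
The sorting approach might look something like this. It is a sketch that assumes the list contains no null elements, and the names SortedUnique and countUnique are invented for illustration. It counts the places where a new run of equal strings starts, which gives the number of unique values directly.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class SortedUnique {
    static int countUnique(List<String> words) {
        if (words.isEmpty()) {
            return 0;
        }
        // Sort a copy so that the caller's list is not reordered.
        List<String> sorted = new ArrayList<>(words);
        Collections.sort(sorted);

        int unique = 1; // the first element always starts a new run
        for (int i = 0; i + 1 < sorted.size(); i++) {
            // A change between neighbours marks the start of a new value.
            if (!sorted.get(i).equals(sorted.get(i + 1))) {
                unique++;
            }
        }
        return unique;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("apple", "bob", "apple", "jim", "bob");
        System.out.println(countUnique(words)); // prints 3
    }
}
```

Sorting costs O(N log N) time, so this is slower than the hash-based version for large inputs, but it needs no extra map.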

Answer 7

import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class UniqueinArrayList {

    public static void main(String[] args) {
        List<String> al = new ArrayList<>();
        al.add("Stack");
        al.add("Stack");
        al.add("over");
        al.add("over");
        al.add("flow");
        al.add("flow");
        System.out.println(al); // [Stack, Stack, over, over, flow, flow]

        // A LinkedHashSet drops duplicates and keeps first-insertion order.
        Set<String> s = new LinkedHashSet<>(al);
        System.out.println(s); // [Stack, over, flow]

        // Join the unique values with single spaces.
        System.out.println(String.join(" ", s)); // Stack over flow
    }

}

Answer 8

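This general solution takes advantage of the fact that the Set abstract data type does not allow duplicates. The Set.add() method is particularly useful because it returns a boolean flag indicating whether the 'add' operation succeeded. A HashMap is used to keep track of the occurrences of each original element. The algorithm can be adapted to variations of this kind of problem. The solution runs in O(n) time.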

public static void main(String args[])
{
  String[] strArray = {"abc", "def", "mno", "xyz", "pqr", "xyz", "def"};
  System.out.printf("RAW: %s ; PROCESSED: %s \n",Arrays.toString(strArray), duplicates(strArray).toString());
}

public static HashMap<String, Integer> duplicates(String arr[])
{

    HashSet<String> distinctKeySet = new HashSet<String>();
    HashMap<String, Integer> keyCountMap = new HashMap<String, Integer>();

    for(int i = 0; i < arr.length; i++)
    {
        if(distinctKeySet.add(arr[i]))
            keyCountMap.put(arr[i], 1); // unique value or first occurrence
        else
            keyCountMap.put(arr[i], (Integer)(keyCountMap.get(arr[i])) + 1);
    }     

    return keyCountMap; 
}

Result:

RAW: [abc, def, mno, xyz, pqr, xyz, def] ; PROCESSED: {pqr=1, abc=1, def=2, xyz=2, mno=1}

Answer 9

3 different possible solutions:

Use a HashSet, as suggested above.
Create a temporary ArrayList and store only the unique elements in it.
Use a Java 8 solution.

How to count unique values in an ArrayList?
