How can I extract trending words from a given dataset (Java)?

Asked: 2017-09-20 08:11:19

Tags: java repeat trend

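I want results similar to Twitter Trends. From a given dataset, I want to get the most frequently seen phrases of two or three words, not just single words.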

This is the kind of result I want (the original post showed it in an image).

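So far I can get the most frequently seen single words from the dataset, listed from most to least frequent. How can I modify this code so that it finds the most frequently seen two- or three-word phrases? For example, with a news dataset I can get "Real" and "Madrid", but I want to see "Real Madrid" or even "Real Madrid Win".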

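// Needs imports from java.io (BufferedReader, FileInputStream, InputStreamReader)
// and java.util (ArrayList, Collections, Comparator, HashMap, List, Set, Map.Entry).
HashMap<String, Integer> wordCountMap = new HashMap<String, Integer>();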

        BufferedReader reader = null;
        FileInputStream text = new FileInputStream("c:/news.txt");
        try
        {
            reader  = new BufferedReader(new InputStreamReader(text, "UTF-8")); 
            String currentLine = reader.readLine();
            while (currentLine != null)
            {  
                String[] words = currentLine.toUpperCase().split(" ");
                for (String word : words)
                {
                    if(wordCountMap.containsKey(word))
                    {    
                        wordCountMap.put(word, wordCountMap.get(word)+1);
                    }
                    else
                    {
                        wordCountMap.put(word, 1);
                    }
                }
                currentLine = reader.readLine();
            }
            Set<Entry<String, Integer>> entrySet = wordCountMap.entrySet();
            List<Entry<String, Integer>> list = new ArrayList<Entry<String,Integer>>(entrySet);
            Collections.sort(list, new Comparator<Entry<String, Integer>>() 
            {
                @Override
                public int compare(Entry<String, Integer> e1, Entry<String, Integer> e2) 
                {
                    return (e2.getValue().compareTo(e1.getValue()));
                }
            });
            System.out.println("Most seen words :");
            for (Entry<String, Integer> entry : list) 
            {
                if (entry.getValue() > 1)
                {
                    System.out.println(entry.getKey() + " : "+ entry.getValue());
                }
            }
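        }
        finally
        {
            // Close the reader, which also closes the underlying file stream.
            if (reader != null)
            {
                reader.close();
            }
        }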
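
One possible direction, sketched here under assumptions rather than as a definitive solution: instead of counting single words, slide a window of n consecutive words over each line and count the joined phrase (an "n-gram"). The class name `NGramCounter` and the method `countNGrams` are hypothetical names chosen for illustration, the sketch assumes Java 8 or later (`String.join`, `Map.merge`), and it splits on punctuation as well as spaces so that "Ball," and "Ball" count as the same word, which differs from the `split(" ")` used above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

public class NGramCounter
{
    // Hypothetical helper: counts every run of n consecutive words within each line.
    public static Map<String, Integer> countNGrams(List<String> lines, int n)
    {
        Map<String, Integer> counts = new HashMap<String, Integer>();
        for (String line : lines)
        {
            // Upper-case as in the code above, then split on anything that is not
            // a letter, a digit or an apostrophe (assumption: punctuation is noise).
            String[] raw = line.toUpperCase(Locale.ROOT).split("[^\\p{L}\\p{N}']+");
            List<String> words = new ArrayList<String>();
            for (String w : raw)
            {
                if (!w.isEmpty())
                {
                    words.add(w);
                }
            }
            // Slide a window of n words across the line and count each phrase.
            for (int i = 0; i + n <= words.size(); i++)
            {
                String phrase = String.join(" ", words.subList(i, i + n));
                counts.merge(phrase, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args)
    {
        List<String> lines = Arrays.asList(
            "Tony Ball says that the landscape has shifted",
            "Tony Ball admitted he was surprised",
            "Tony Ball, the former BSkyB chief executive");
        System.out.println(countNGrams(lines, 2).get("TONY BALL"));      // prints 3
        System.out.println(countNGrams(lines, 3).get("TONY BALL SAYS")); // prints 1
    }
}
```

The resulting map has the same shape as `wordCountMap`, so the existing sort and print loop could be reused on it unchanged. Two limitations of this sketch: phrases never span a line break, and because punctuation is discarded, a phrase may span a sentence boundary within a line. Frequent but uninformative pairs such as "OF THE" would also rank highly unless common words are filtered out first.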

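This is a small part of my sample dataset. At the moment I read every line word by word. For example, I can see that "Tony" was seen 2 times and "Ball" was seen 5 times. But I want to know whether "Tony Ball" as a phrase is seen often; for example, I want to see "Tony Ball" seen 10 times. What I am after is the most frequently seen phrases, like Twitter Trends.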

Tony Ball says that the landscape of British broadcasting has shifted dramatically after BT bought a large slice of televised football rights, boosting the Premier League's next TV deal to a record £3bn over three years, a 71% increase.

This equates to at least £14m more per year for each football club, with the bottom team in the league from 2013-14 onwards likely to receive more than the £60.6m Manchester City earned this year for ending the season as champions. Each individual televised match will now cost the broadcasters £6.6m, up from £4.7m under the previous deal.

BSkyB, which has built its business over 20 years on the back of live top flight football, retained most of the rights, securing 116 matches per season from 2013-14 in exchange for £2.3bn over three years.

But BT sprung a huge surprise by winning the rights to 38 games, including almost half the "first pick" games on offer, in exchange for £738m over three years. Richard Scudamore, Premier League chief executive, said BT's securing 18 of the 38 coveted "first pick" matches would be a "game changer". "[BT chief executive] Ian Livingstone and his colleagues have hugely ambitious plans. They have not invested in all this fibre [optic cable] for nothing, they want to establish a direct relationship with consumers," he said.

BT – the latest challenger to Sky after Setanta and ESPN – is expected to launch a new sports channel, available on a variety of platforms. But BT will use the rights to push its high speed broadband service. Its matches will be shown at Saturday lunchtime and on midweek evenings.

Against a grim economic backdrop elsewhere, Tony Ball admitted he was "surprised" by the huge hike in income, which he said would allow clubs to continue to compete with their European rivals.

The huge increase in income is good news for club owners, players, their agents and luxury car dealerships and, on the evidence of previous deals, is likely to lead to another sharp rise in transfer fees. But despite the unprecedented riches that have flowed into the coffers of top flight clubs during the Premier League era, clubs made losses of £361m last year despite record income of £2.3bn.

Scudamore pleaded with clubs not to simply use the new deal to rack up losses and fuel wage inflation. While he said he wanted clubs to still invest in the best talent, he also made a plea to invest in infrastructure and youth development.

"We are entering a new era with financial fair play [the new Europe-wide regulations of club spending], I'm hoping it will get invested in things other than playing talent. It should also be able to achieve sustainability," he said.

The effect on fans is more uncertain. BT and Sky may have to charge more to cover their huge investment. When asked whether clubs would use the windfall to subsidise ticket prices, Scudamore would say only that it "gives them more choices".

Tony Ball, the former BSkyB chief executive who helped fuel the company's growth in the mid-1990s, is a non-executive director on the BT board and is likely to have advised it on its bidding strategy. ESPN, the US giant that entered the market when Setanta went bust trying to compete with Sky, has now been frozen out.

Any suggestions? Thanks.

0 answers:

No answers