代码高尔夫7月版第4期:计算前十个发生的单词

时间:2009-07-04 16:38:29

标签: text-files code-golf counting

考虑到以下总统名单,尽可能在最小的计划中排名前十位:

INPUT FILE

    Washington
    Washington
    Adams
    Jefferson
    Jefferson
    Madison
    Madison
    Monroe
    Monroe
    John Quincy Adams
    Jackson
    Jackson
    Van Buren
    Harrison 
    DIES
    Tyler
    Polk
    Taylor 
    DIES
    Fillmore
    Pierce
    Buchanan
    Lincoln
    Lincoln 
    DIES
    Johnson
    Grant
    Grant
    Hayes
    Garfield 
    DIES
    Arthur
    Cleveland
    Harrison
    Cleveland
    McKinley
    McKinley
    DIES
    Teddy Roosevelt
    Teddy Roosevelt
    Taft
    Wilson
    Wilson
    Harding
    Coolidge
    Hoover
    FDR
    FDR
    FDR
    FDR
    Dies
    Truman
    Truman
    Eisenhower
    Eisenhower
    Kennedy 
    DIES
    Johnson
    Johnson
    Nixon
    Nixon 
    ABDICATES
    Ford
    Carter
    Reagan
    Reagan
    Bush
    Clinton
    Clinton
    Bush
    Bush
    Obama

bash 97 字符

开始
cat input.txt | tr " " "\n" | tr -d "\t " | sed 's/^$//g' | sort | uniq -c | sort -n | tail -n 10

输出

      2 Nixon
      2 Reagan
      2 Roosevelt
      2 Truman
      2 Washington
      2 Wilson
      3 Bush
      3 Johnson
      4 FDR
      7 DIES

如你所愿,打破关系!快乐的第四个!

对于那些关心总统信息的人,可以找到here

17 个答案:

答案 0 :(得分:12)

C#,153:

p读取文件并将结果打印到控制台:

File.ReadLines(p)
    .SelectMany(s=>s.Split(' '))
    .GroupBy(w=>w)
    .OrderBy(g=>-g.Count())
    .Take(10)
    .ToList()
    .ForEach(g=>Console.WriteLine(g.Count()+"|"+g.Key));

如果仅生成列表但不打印到控制台,则为93个字符。

6|DIES
4|FDR
3|Johnson
3|Bush
2|Washington
2|Adams
2|Jefferson
2|Madison
2|Monroe
2|Jackson

答案 1 :(得分:11)

更短的shell版本:

xargs -n1 < input.txt | sort | uniq -c | sort -nr | head

如果您想要不区分大小写的排名,请将uniq -c更改为uniq -ci

稍微短一点,如果你对排名被逆转感到高兴,可读性因缺乏空间而受损。这个时钟有46个字符:

xargs -n1<input.txt|sort|uniq -c|sort -n|tail

(如果允许您首先将输入文件重命名为“i”,则可以将其删除为38。)

观察到,在这种特殊情况下,没有任何单词出现超过9次,我们可以通过从最终排序中删除'-n'参数来削减3个字符:

xargs -n1<input.txt|sort|uniq -c|sort|tail

将此解决方案降至43个字符而不重命名输入文件。 (或35,如果你这样做。)

使用xargs -n1将文件拆分为每行一个单词,优于tr \ \\n解决方案,因为这会产生大量空白行。这意味着该解决方案不正确,因为它错过了Nixon并显示一个显示256次的空白字符串。但是,空白字符串不是“单词”。

答案 2 :(得分:7)

vim 60

    :1,$!tr " " "\n"|tr -d "\t "|sort|uniq -c|sort -n|tail -n 10

答案 3 :(得分:7)

Vim 36

:%s/\W/\r/g|%!sort|uniq -c|sort|tail

答案 4 :(得分:5)

Haskell,102个字符(哇,非常接近原始字符):

import List
(take 10.map snd.sort.map(\(x:y)->(-length y,x)).group.sort.words)`fmap`readFile"input.txt"

J,只有55个字符:

10{.\:~~.(,.~[:<"0@(+/)=/~);;:&.><;._2[1!:1<'input.txt'

(我还没弄明白如何优雅地在J中执行文本操作...它在数组结构数据方面要好得多。)


   NB. read the file
   <1!:1<'input.txt'
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------...
|    Washington     Washington     Adams     Jefferson     Jefferson     Madison     Madison     Monroe     Monroe     John Quincy Adams     Jackson     Jackson     Van Buren     Harrison DIES     Tyler     Polk     Taylor DIES     Fillmore     Pierce     ...
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------...
   NB. split into lines
   <;._2[1!:1<'input.txt'
+--------------+--------------+---------+-------------+-------------+-----------+-----------+----------+----------+---------------------+-----------+-----------+-------------+-----------------+---------+--------+---------------+------------+----------+----...
|    Washington|    Washington|    Adams|    Jefferson|    Jefferson|    Madison|    Madison|    Monroe|    Monroe|    John Quincy Adams|    Jackson|    Jackson|    Van Buren|    Harrison DIES|    Tyler|    Polk|    Taylor DIES|    Fillmore|    Pierce|    ...
+--------------+--------------+---------+-------------+-------------+-----------+-----------+----------+----------+---------------------+-----------+-----------+-------------+-----------------+---------+--------+---------------+------------+----------+----...
   NB. split into words
   ;;:&.><;._2[1!:1<'input.txt'
+----------+----------+-----+---------+---------+-------+-------+------+------+----+------+-----+-------+-------+---+-----+--------+----+-----+----+------+----+--------+------+--------+-------+-------+----+-------+-----+-----+-----+--------+----+------+---...
|Washington|Washington|Adams|Jefferson|Jefferson|Madison|Madison|Monroe|Monroe|John|Quincy|Adams|Jackson|Jackson|Van|Buren|Harrison|DIES|Tyler|Polk|Taylor|DIES|Fillmore|Pierce|Buchanan|Lincoln|Lincoln|DIES|Johnson|Grant|Grant|Hayes|Garfield|DIES|Arthur|Cle...
+----------+----------+-----+---------+---------+-------+-------+------+------+----+------+-----+-------+-------+---+-----+--------+----+-----+----+------+----+--------+------+--------+-------+-------+----+-------+-----+-----+-----+--------+----+------+---...
   NB. count reptititions
   |:~.(,.~[:<"0@(+/)=/~);;:&.><;._2[1!:1<'input.txt'
+----------+-----+---------+-------+------+----+------+-------+---+-----+--------+----+-----+----+------+--------+------+--------+-------+-------+-----+-----+--------+------+---------+--------+---------+----+------+-------+--------+------+---+------+------...
|2         |2    |2        |2      |2     |1   |1     |2      |1  |1    |2       |6   |1    |1   |1     |1       |1     |1       |2      |3      |2    |1    |1       |1     |2        |2       |2        |1   |2     |1      |1       |1     |4  |2     |2     ...
+----------+-----+---------+-------+------+----+------+-------+---+-----+--------+----+-----+----+------+--------+------+--------+-------+-------+-----+-----+--------+------+---------+--------+---------+----+------+-------+--------+------+---+------+------...
|Washington|Adams|Jefferson|Madison|Monroe|John|Quincy|Jackson|Van|Buren|Harrison|DIES|Tyler|Polk|Taylor|Fillmore|Pierce|Buchanan|Lincoln|Johnson|Grant|Hayes|Garfield|Arthur|Cleveland|McKinley|Roosevelt|Taft|Wilson|Harding|Coolidge|Hoover|FDR|Truman|Eisenh...
+----------+-----+---------+-------+------+----+------+-------+---+-----+--------+----+-----+----+------+--------+------+--------+-------+-------+-----+-----+--------+------+---------+--------+---------+----+------+-------+--------+------+---+------+------...
   NB. sort
   |:\:~~.(,.~[:<"0@(+/)=/~);;:&.><;._2[1!:1<'input.txt'
+----+---+-------+----+------+----------+------+---------+------+-----+------+--------+-------+-------+---------+-------+--------+-----+----------+-------+---------+-----+---+-----+------+----+------+----+------+-----+-------+----+------+-----+-------+----...
|6   |4  |3      |3   |2     |2         |2     |2        |2     |2    |2     |2       |2      |2      |2        |2      |2       |2    |2         |2      |2        |2    |1  |1    |1     |1   |1     |1   |1     |1    |1      |1   |1     |1    |1      |1   ...
+----+---+-------+----+------+----------+------+---------+------+-----+------+--------+-------+-------+---------+-------+--------+-----+----------+-------+---------+-----+---+-----+------+----+------+----+------+-----+-------+----+------+-----+-------+----...
|DIES|FDR|Johnson|Bush|Wilson|Washington|Truman|Roosevelt|Reagan|Nixon|Monroe|McKinley|Madison|Lincoln|Jefferson|Jackson|Harrison|Grant|Eisenhower|Clinton|Cleveland|Adams|Van|Tyler|Taylor|Taft|Quincy|Polk|Pierce|Obama|Kennedy|John|Hoover|Hayes|Harding|Garf...
+----+---+-------+----+------+----------+------+---------+------+-----+------+--------+-------+-------+---------+-------+--------+-----+----------+-------+---------+-----+---+-----+------+----+------+----+------+-----+-------+----+------+-----+-------+----...
   NB. take 10
   10{.\:~~.(,.~[:<"0@(+/)=/~);;:&.><;._2[1!:1<'input.txt'
+-+----------+
|6|DIES      |
+-+----------+
|4|FDR       |
+-+----------+
|3|Johnson   |
+-+----------+
|3|Bush      |
+-+----------+
|2|Wilson    |
+-+----------+
|2|Washington|
+-+----------+
|2|Truman    |
+-+----------+
|2|Roosevelt |
+-+----------+
|2|Reagan    |
+-+----------+
|2|Nixon     |
+-+----------+

答案 5 :(得分:3)

Perl:90

Perl:114 (包括perl,命令行开关,单引号和文件名)

perl -nle'$h{$_}++for split/ /;END{$i++<=10?print"$h{$_} $_":0for reverse sort{$h{$a}cmp$h{$b}}keys%h}' input.txt

答案 6 :(得分:3)

缺乏AWK令人不安。

xargs -n1<input.txt|awk '{c[$1]++}END{for(p in c)print c[p],p|"sort|tail"}'

75个字​​符。

如果你想获得更多的AWKy,你可以忘记xargs:

awk -v RS='[^a-zA-Z]' /./'{c[$1]++}END{for(p in c)print c[p],p|"sort|tail"}' input.txt

答案 7 :(得分:2)

Windows批处理文件

这显然不是最小的解决方案,但无论如何我决定发布它,只是为了好玩。 :)注意:批处理文件使用名为 $ 的临时文件来存储临时结果。

包含评论的原始未压缩版本:

@echo off
setlocal enableextensions enabledelayedexpansion

set infile=%1
set cnt=%2
set tmpfile=$
set knownwords=

rem Calculate word count
for /f "tokens=*" %%i in (%infile%) do (
  for %%w in (%%i) do (

    rem If the word hasn't already been processed, ...
    echo !knownwords! | findstr "\<%%w\>" > nul
    if errorlevel 1 (

      rem Count the number of the word's occurrences and save it to a temp file
      for /f %%n in ('findstr "\<%%w\>" %infile% ^| find /v "" /c') do (
        echo %%n^|%%w >> %tmpfile%
      )

      rem Then add the word to the known words list
      set knownwords=!knownwords! %%w
    )
  )
)

rem Print top 10 word count
for /f %%i in ('sort /r %tmpfile%') do (
  echo %%i
  set /a cnt-=1
  if !cnt!==0 goto end
)

:end
del %tmpfile%

压缩&amp;混淆版, 317 字符:

@echo off&setlocal enableextensions enabledelayedexpansion&set n=%2&set l=
for /f "tokens=*" %%i in (%1)do for %%w in (%%i)do echo !l!|findstr "\<%%w\>">nul||for /f %%n in ('findstr "\<%%w\>" %1^|find /v "" /c')do echo %%n^|%%w>>$&set l=!l! %%w
for /f %%i in ('sort /r $')do echo %%i&set /a n-=1&if !n!==0 del $&exit /b

如果echo已经关闭且命令扩展和延迟变量扩展打开,则可以缩短为 258 个字符:

set n=%2&set l=
for /f "tokens=*" %%i in (%1)do for %%w in (%%i)do echo !l!|findstr "\<%%w\>">nul||for /f %%n in ('findstr "\<%%w\>" %1^|find /v "" /c')do echo %%n^|%%w>>$&set l=!l! %%w
for /f %%i in ('sort /r $')do echo %%i&set /a n-=1&if !n!==0 del $&exit /b

用法:

> filename.bat input.txt 10 & pause

输出:

6|DIES
4|FDR
3|Johnson
3|Bush
2|Wilson
2|Washington
2|Truman
2|Roosevelt
2|Reagan
2|Nixon

答案 8 :(得分:2)

<强>红宝石

115个字符

w = File.read($*[0]).split
w.uniq.map{|x| [w.select{|y|x==y}.size,x]}.sort.last(10).each{|z| puts "#{z[1]} #{z[0]}"}

答案 9 :(得分:2)

Ruby 66B

puts (a=$<.read.split).uniq.map{|x|"#{a.count x} "+x}.sort.last 10

答案 10 :(得分:2)

Perl 86个字符

94,如果计算输入文件名。

perl -anE'$_{$_}++for@F;END{say"$_{$_} $_"for@{[sort{$_{$b}<=>$_{$a}}keys%_]}[0..10]}' test.in

如果你不关心你得到多少结果,那么它只有75,不包括文件名。

perl -anE'$_{$_}++for@F;END{say"$_{$_} $_"for sort{$_{$b}<=>$_{$a}}keys%_}' test.in

答案 11 :(得分:2)

previous entry上的修订版应保存10个字符:

h = {}
File.open('f.1').each {|l|l.split(/ /).each{|e|h[e]==nil ?h[e]=1:h[e]+=1}}
h.sort{|a,b|a[1]<=>b[1]}.last(10).each{|e|puts"#{e[1]} #{e[0]}"}

答案 12 :(得分:2)

vim 38 ,适用于所有输入

:%!xargs -n1|sort|uniq -c|sort -n|tail

答案 13 :(得分:2)

Python 2.6, 104 字符:

l=open("input.txt").read().split()
for c,n in sorted(set((l.count(w),w) for w in l if w))[-10:]:print c,n

答案 14 :(得分:2)

python 3.1(88个字符)

import collections
collections.Counter(open('input.txt').read().split()).most_common(10)

答案 15 :(得分:2)

到目前为止,我最好尝试使用红宝石,166个字符:

h = Hash.new
File.open('f.l').each_line{|l|l.split(/ /).each{|e|h[e]==nil ?h[e]=1:h[e]+=1}}
h.sort{|a,b|a[1]<=>b[1]}.last(10).each{|e|puts"#{e[1]} #{e[0]}"}

我很惊讶没有人发布疯狂的J解决方案。

答案 16 :(得分:2)

这是shell脚本的压缩版本,观察到对输入数据(无前导或尾随空白)的合理解释,原始中的第二个'tr'和'sed'命令不会更改数据(通过在适当的位置插入'tee out.N'并检查输出文件大小来验证 - 相同)。 shell需要的空间比人类少 - 并且使用cat而不是输入I / O重定向会浪费空间。

tr \  \\n<input.txt|sort|uniq -c|sort -n|tail -10

这个重量为50个字符,包括脚本末尾的换行符。

还有两个观察结果(取自其他人的答案):

  1. tail本身相当于“tail -10”和
  2. 在这种情况下,数字和字母排序是等价的,
  3. 这可以通过另外7个字符缩小(到43包括尾随换行符):

    tr \  \\n<input.txt|sort|uniq -c|sort|tail
    

    使用'xargs -n1'(没有给出命令前缀)代替'tr'是非常聪明的;它处理前导,尾随和多个嵌入空间(这个解决方案没有)。