Looking for a .txt word-frequency list to test a program

Time: 2009-05-14 02:41:44

Tags: text-files frequency word-frequency

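I'd like a file containing roughly the 200-1000 most commonly used English words. I've been able to find absurdly long lists of 200,000 words or so, but nothing shorter that is limited to the most common words.

Ideally the words would be one per line, but if they aren't, I can reformat the file.

Thanks!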

4 answers:

Answer 0: (score: 1)

I searched Google for "English words by frequency" and found plenty of good sources. Here is one on wiktionary.org.

Answer 1: (score: 1)

Here is the top 500. You can scrape the list out of the HTML.
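Scraping the list out of the HTML could look roughly like the sketch below. The markup of the linked page is not shown in this thread, so the code assumes only that the words appear as plain text between tags. The function name `wordsFromHtml` and the sample markup are hypothetical, made up for illustration.

```php
<?php
// Hypothetical helper: pull word-like tokens out of an HTML fragment.
// Assumption: the words appear as plain text between tags.
function wordsFromHtml(string $html): array
{
    // Put a space where each tag was so adjacent items do not merge.
    $text = preg_replace('/<[^>]*>/', ' ', $html);
    $text = html_entity_decode($text, ENT_QUOTES | ENT_HTML5, 'UTF-8');

    $words = array();
    foreach (preg_split('/\s+/', $text, -1, PREG_SPLIT_NO_EMPTY) as $token) {
        // Keep only tokens made of letters and apostrophes (drops ranks and counts).
        if (preg_match("/^[A-Za-z']+$/", $token)) {
            $words[] = $token;
        }
    }
    // Remove duplicates while keeping the original (frequency) order.
    return array_values(array_unique($words));
}

// Made-up sample markup, not the real page.
$sample = "<ol><li>the</li><li>of</li><li>don&#39;t</li><li>the</li></ol>";
echo implode("\n", wordsFromHtml($sample)), "\n";
```

On a real page, column headers and navigation text would also pass the letter-only filter, so the result would likely need a quick manual trim before use.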

Answer 2: (score: 0)

You could write a simple solution yourself. This is untested but should be 99% right.

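<?php
// Read up to the first 100 lines (one word per line) from the list.
// The URL is a placeholder; opening remote files requires allow_url_fopen.
$fh = fopen('http://domain.tld/path/tofile.txt', 'r');
if ($fh === false) {
    die("Could not open the word list\n");
}
$wordList = array();
// fgets() returns one line at a time; fread($fh, 1024) would return 1 KB chunks.
while (count($wordList) < 100 && ($line = fgets($fh)) !== false) {
    $wordList[] = trim($line);
}
fclose($fh);
print_r($wordList);
?>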

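Answer 3: (score: 0)

Here are the top 250 from McWafflestix's link (since you stressed that fewer is better), straight up, with no extra whitespace and so on, thanks to kill-rectangle in emacs. I have to say, this is a rather trivial question that isn't really about programming.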

the
of
to
and
a
in
is
it
you
that
he
was
for
on
are
with
as
I
his
they
be
at
one
have
this
from
or
had
by
hot
but
some
what
there
we
can
out
other
were
all
your
when
up
use
word
how
said
an
each
she
which
do
their
time
if
will
way
about
many
then
them
would
write
like
so
these
her
long
make
thing
see
him
two
has
look
more
day
could
go
come
did
my
sound
no
most
number
who
over
know
water
than
call
first
people
may
down
side
been
now
find
any
new
work
part
take
get
place
made
live
where
after
back
little
only
round
man
year
came
show
every
good
me
give
our
under
name
very
through
just
form
much
great
think
say
help
low
line
before
turn
cause
same
mean
differ
move
right
boy
old
too
does
tell
sentence
set
three
want
air
well
also
play
small
end
put
home
read
hand
port
large
spell
add
even
land
here
must
big
high
such
follow
act
why
ask
men
change
went
light
kind
off
need
house
picture
try
us
again
animal
point
mother
world
near
build
self
earth
father
head
stand
own
page
should
country
found
answer
school
grow
study
still
learn
plant
cover
food
sun
four
thought
let
keep
eye
never
last
door
between
city
tree
cross
since
hard
start
might
story
saw
far
sea
draw
left
late
run
don't
while
press
close
night
real
life
few
stop
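
Once a list like the one above is saved to a file, a quick check before handing it to the program under test can confirm that it really is one word per line and contains no repeats. The sketch below is only an illustration: the function names `loadWordList` and `duplicateWords` and the file name `words.txt` are assumptions, not something taken from the answers.

```php
<?php
// Hypothetical helper: read a one-word-per-line file and return at most $limit words.
function loadWordList(string $path, int $limit): array
{
    $lines = file($path, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
    if ($lines === false) {
        throw new RuntimeException("Could not read $path");
    }
    // Trim stray whitespace, then drop lines that contained only whitespace.
    $words = array_values(array_filter(array_map('trim', $lines), 'strlen'));
    return array_slice($words, 0, $limit);
}

// Hypothetical helper: list words that occur more than once, ignoring case.
function duplicateWords(array $words): array
{
    $counts = array_count_values(array_map('strtolower', $words));
    return array_keys(array_filter($counts, function ($n) { return $n > 1; }));
}

// Usage with an assumed file name.
if (is_readable('words.txt')) {
    $words = loadWordList('words.txt', 200);
    echo count($words), " words loaded\n";
    print_r(duplicateWords($words));
}
```

The `$limit` argument covers the "200-1000" range from the question: the same file can be cut to whatever size a given test needs.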