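I want to count the number of occurrences of a given word (for example, input) in a text file. I have this code, which gives me the counts of all the words in the file: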
word_count = {}
my_word = id
File.open("texte.txt", "r") do |f|
  f.each_line do |line|
    words = line.split(' ').each do |word|
      word_count[word] += 1 if word_count.has_key? my_word
      word_count[word] = 1 if not word_count.has_key? my_word
    end
  end
end
puts "\n"+ word_count.to_s
Thanks.
Answer 0: (score: 3)
Create a test file
Let's first create a file to work with.
text =<<-BITTER_END
It was the best of times, it was the worst of times, it was the age of wisdom,
it was the age of foolishness, it was the epoch of belief, it was the epoch of
incredulity, it was the season of Light, it was the season of Darkness, it was
the spring of hope, it was the winter of despair, we had everything before us,
we had nothing before us...
BITTER_END
FName = 'texte.txt'
File.write(FName, text)
#=> 344
Specify the word to count
target = 'the'
Create a regular expression
r = /\b#{target}\b/i
#=> /\bthe\b/i
The word breaks \b are needed to ensure that 'anthem' is not counted as 'the'.
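To see the effect of the word breaks on a few strings, here is a minimal sketch. The helper name count_word and the sample strings are made up for illustration, and Regexp.escape is an addition that is not in the answer; it assumes the target might contain regex metacharacters.

```ruby
# Hypothetical helper: counts whole-word, case-insensitive matches of target in text.
def count_word(text, target)
  # Regexp.escape is an assumption beyond the answer: it keeps characters
  # such as '.' or '+' in target from being read as regex syntax.
  r = /\b#{Regexp.escape(target)}\b/i
  text.scan(r).count
end

puts count_word("the anthem for The people", "the")
#=> 2  ('anthem' is skipped, 'The' matches because of the i flag)
puts count_word("anthem, bathe, other", "the")
#=> 0
```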
Gulp a small file
If, as here, the file is not very large, you can gulp it:
File.read("texte.txt").scan(r).count
#=> 10
Read a large file line by line
If the file is too large to gulp and we want to read it line by line, do the following.
File.foreach(FName).reduce(0) { |cnt, line| cnt + line.scan(r).count }
#=> 10
or
File.foreach(FName).sum { |line| line.scan(r).count }
#=> 10
Note that Enumerable#sum made its debut in Ruby v2.4.
See IO::read and IO::foreach. (IO.methodx... is commonly written File.methodx.... This is permissible because File is a subclass of IO; i.e., File < IO #=> true.)
Use gsub to avoid creating a temporary array
The first method (gulping the file) creates a temporary array:
["the", "the", "the", "the", "the", "the", "the", "the", "the", "the"]
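to which count (aka size) is then applied. One way to avoid creating this array is to use String#gsub rather than String#scan, since the former returns an enumerator when it is called with a pattern but no replacement and no block: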
File.read("texte.txt").gsub(r).count
#=> 10
This can also be done for each line of the file.
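Applied line by line, the gsub variant might look like the following sketch. The helper name count_in_file and the sample text are hypothetical, Regexp.escape is an addition not found in the answer, and Enumerable#sum assumes Ruby 2.4 or later.

```ruby
require 'tempfile'

# Hypothetical helper: counts whole-word matches of target, reading path one line at a time.
def count_in_file(path, target)
  r = /\b#{Regexp.escape(target)}\b/i
  # gsub with a pattern but no replacement or block returns an enumerator
  # over the matches, so no per-line array of matches is built.
  File.foreach(path).sum { |line| line.gsub(r).count }
end

# Made-up sample data written to a temporary file, which is removed after the block.
Tempfile.create('sample') do |f|
  f.write("It was the best of times,\nit was the worst of times,\nit was the age of wisdom\n")
  f.flush
  puts count_in_file(f.path, 'the')
  #=> 3
end
```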
Answer 1: (score: 0)
If you only want the count of one specific word, you don't need to use a Hash. For example:
word_count = 0
my_word = "input"
File.open("texte.txt", "r") do |f|
  f.each_line do |line|
    line.split(' ').each do |word|
      word_count += 1 if word == my_word
    end
  end
end
puts "\n" + word_count.to_s
word_count will then contain the total number of occurrences of my_word.
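On the other hand, if you want to keep counts for all words and then print only the count for one specific word, you can use a Hash, but try something like this:
word_count = Hash.new(0)
my_word = "input"
File.open("texte.txt", "r") do |f|
  f.each_line do |line|
    line.split(' ').each do |word|
      word_count[word] += 1
    end
  end
end
puts "\n" + word_count[my_word].to_s
The word_count Hash will contain every word together with its total number of occurrences (the word is the key and the number of occurrences is its value); to print the occurrences of my_word, you just fetch the hash value using my_word as the key.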
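One thing to keep in mind with split(' ') is that "input," and "Input" end up under different keys from "input". If that matters, a sketch along these lines may help. The helper name word_counts and the normalization rule (lower-casing and keeping only letters, digits and apostrophes) are assumptions for illustration, not part of the approach above.

```ruby
# Hypothetical helper: returns a Hash of word => count for the given text.
def word_counts(text)
  counts = Hash.new(0) # missing keys default to 0, as in the answer
  # Lower-case the text and pull out runs of letters, digits and apostrophes,
  # so that "Input," and "input" are counted under the same key.
  text.downcase.scan(/[[:alnum:]']+/) { |word| counts[word] += 1 }
  counts
end

counts = word_counts("Input, more input; other INPUT.")
puts counts["input"]
#=> 3
puts counts["missing"]
#=> 0
```

Reading a key that was never counted returns the default 0 without adding that key to the hash.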