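I have been given a basic text file, and I need to use a regular expression in Python to pull out all the words on each line and print the number of words per line.
Example text file: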
I have a dog.
She is small and cute,
and likes to play with other dogs.
Example output:
Line 1: 4
Line 2: 5
Line 3: 7
Any help would be greatly appreciated!
Answer 0 (score: 1)
You can try splitting each line:
with open('input_file_name.txt') as input_file:
    # enumerate numbers the lines starting from 1
    for line_number, line in enumerate(input_file, start=1):
        # split() with no argument splits on any run of whitespace
        print('Line {}: {}'.format(line_number, len(line.split())))
Answer 1 (score: 0)
with open(path_to_text_file, "r") as f:
    # Read the file line by line, numbering lines from 1.
    for counter, line in enumerate(f, start=1):
        # split() with no argument splits on any run of whitespace,
        # so repeated spaces and the trailing newline are not counted.
        print("Line %d: %d" % (counter, len(line.split())))
Answer 2 (score: 0)
You can try awk, which splits on whitespace by default:
cat <<EOT | awk '{print NF}'
> I have a dog.
> She is small and cute,
> and likes to play with other dogs.
> EOT
4
5
7
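NF is an awk variable that is always set to the number of fields in the current record.
Answer 3 (score: 0)
This fairly intuitive regular expression might help: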
\b\w+\b
It matches each run of word characters between word boundaries. You only need to count how many matches there are on each line.
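As a minimal sketch of how this pattern could be applied in Python, the snippet below counts the matches on each line with `re.findall`. The helper name `count_words_per_line` and the in-memory `sample` list are assumptions made for illustration; with a real file you would iterate over the open file object instead.

```python
import re

# The pattern from this answer: a run of word characters between word boundaries.
WORD_RE = re.compile(r"\b\w+\b")


def count_words_per_line(lines):
    """Return the number of regex word matches for each line."""
    return [len(WORD_RE.findall(line)) for line in lines]


# The sample text from the question, held in memory for the example.
sample = [
    "I have a dog.",
    "She is small and cute,",
    "and likes to play with other dogs.",
]

for number, count in enumerate(count_words_per_line(sample), start=1):
    print("Line {}: {}".format(number, count))
```

Because punctuation such as `.` and `,` is not a word character, it is ignored rather than counted as part of a word.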
If you want a word containing a hyphen (or any other character) to count as one word, add - to the character class:
\b[\w\-]+\b
or
\b[\w\-'.]+\b
and so on.
You get the idea.
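To illustrate the difference, here is a small sketch comparing the plain pattern with a hyphen-aware one. It assumes the character class is followed by `+` so that it can match more than one character; the names `PLAIN`, `HYPHEN`, and the sample `text` are made up for the example.

```python
import re

# Plain pattern: hyphens act as separators between words.
PLAIN = re.compile(r"\b\w+\b")
# Hyphen-aware pattern: the class also accepts "-", and "+" allows a run of characters.
HYPHEN = re.compile(r"\b[\w\-]+\b")

text = "a well-known state-of-the-art trick"

plain_words = PLAIN.findall(text)
hyphen_words = HYPHEN.findall(text)

print(len(plain_words), plain_words)
print(len(hyphen_words), hyphen_words)
```

With the plain pattern each hyphenated part is counted separately, while the hyphen-aware pattern keeps "well-known" and "state-of-the-art" as single words.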