How can I use a regular expression to print the length of each line in a multiline text file?

Asked: 2017-11-21 17:41:27

Tags: python regex multiline

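I have been given a basic text file, and I need to use a regular expression in Python to pull out all the words on each line and print the number of words per line.

Sample text file: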

I have a dog.
She is small and cute,
and likes to play with other dogs.

Sample output:

Line 1: 4
Line 2: 5
Line 3: 7

Any help would be greatly appreciated!

4 answers:

Answer 0 (score: 1)

You can try splitting each line:

with open('input_file_name.txt') as input_file:
    # enumerate supplies the 1-based line number
    for line_number, line in enumerate(input_file, start=1):
        # split() with no argument splits on any run of whitespace
        print('Line {}: {}'.format(line_number, len(line.split())))

Answer 1 (score: 0)

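path_to_text_file = "input_file_name.txt"  # placeholder; use your own path
with open(path_to_text_file, "r") as f:
    for counter, line in enumerate(f, start=1):  # read the file line by line
        # split() with no argument splits on any run of whitespace,
        # so repeated spaces and the trailing newline do not inflate the count
        print("Line %d: %d" % (counter, len(line.split())))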

Answer 2 (score: 0)

You can try awk, which splits on whitespace by default:

cat <<EOT | awk '{print NF}'
> I have a dog.
> She is small and cute,
> and likes to play with other dogs.
> EOT
4
5
7

NF is an awk variable that is always set to the number of fields in the current record.

Answer 3 (score: 0)

This fairly intuitive regular expression may help:

\b\w+\b

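It matches every run of word characters between word boundaries. You only need to count how many matches there are on each line.

If you want words containing a hyphen (or any other character) to count as one word, add - (or that character) to the character class: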
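As a rough sketch of how that counting could look in Python, the snippet below applies the pattern to each line with `re.findall` and prints the number of matches. The helper name `count_words_per_line` and the inline `sample` string are made up for illustration; in practice the text would come from the file.

```python
import re

# Pattern from this answer: a run of word characters between word boundaries.
WORD_RE = re.compile(r"\b\w+\b")


def count_words_per_line(text):
    """Return a list holding the number of pattern matches on each line."""
    return [len(WORD_RE.findall(line)) for line in text.splitlines()]


# Hypothetical stand-in for the contents of the question's sample file.
sample = (
    "I have a dog.\n"
    "She is small and cute,\n"
    "and likes to play with other dogs.\n"
)

for number, count in enumerate(count_words_per_line(sample), start=1):
    print("Line {}: {}".format(number, count))
```

To run it against a real file, `open(path).read()` could be passed in place of `sample`.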

\b[\w\-]+\b

\b[\w\-'.]+\b

You get the idea.
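To see what difference the extra characters make, here is a small comparison sketch. It assumes each character class is followed by a `+` quantifier so that it matches whole words rather than single characters. The function name `count_matches` and the test sentence are illustrative only.

```python
import re

# The three patterns discussed in this answer, each with a + quantifier.
PLAIN = re.compile(r"\b\w+\b")
WITH_HYPHEN = re.compile(r"\b[\w\-]+\b")
WITH_HYPHEN_APOSTROPHE_DOT = re.compile(r"\b[\w\-'.]+\b")


def count_matches(pattern, line):
    """Count the non-overlapping matches of a compiled pattern in one line."""
    return len(pattern.findall(line))


# Hypothetical test sentence containing an apostrophe and a hyphen.
line = "The dog's well-known trick."

for name, pattern in [
    ("plain", PLAIN),
    ("hyphen", WITH_HYPHEN),
    ("hyphen, apostrophe, dot", WITH_HYPHEN_APOSTROPHE_DOT),
]:
    print("{}: {}".format(name, count_matches(pattern, line)))
```

The plain pattern splits "dog's" and "well-known" into two pieces each, while the wider character classes keep them together, so the counts shrink as characters are added.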