Find the number of occurrences of selected letters

Time: 2015-11-10 13:29:05

Tags: python file-io

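I have a script that reads from a file and takes the first two characters of each word in a line. What I want to do is find the two-letter pair that occurs most often. Do I have to convert my output to a list and do it that way?

Here is what I have: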

#!/usr/bin/python

import string
import re
import random
import sys


file = raw_input("Enter path to filename :")

text_file= open(file,'r')
data=text_file.readlines()
firsttwo =[]
lines = []

def first2():
    # Print the first two characters of every line read from the file.
    for line in data:
        firsttwo = line[:2]
        print firsttwo

first2()

3 answers:

Answer 0: (score: 0)

You can use Counter to count how many times each item appears in a list.

from collections import Counter

text_file= open("C:/test.txt",'r')
firsttwo = [line[:2] for line in text_file.readlines()]

print Counter(firsttwo)

If the contents of test.txt are:

first line
second line
second line
third line

the code above outputs:

Counter({'se': 2, 'fi': 1, 'th': 1})
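
If only the single most frequent pair is needed, Counter.most_common returns the entries ordered by count. A minimal sketch, assuming the sample lines are held in a list (the `lines` list is a hypothetical stand-in for the file contents, not part of the original answer):

```python
from collections import Counter

# Hypothetical stand-in for text_file.readlines() on the sample test.txt.
lines = ["first line\n", "second line\n", "second line\n", "third line\n"]

firsttwo = [line[:2] for line in lines]
counts = Counter(firsttwo)

# most_common(1) returns a list holding one (item, count) tuple.
pair, count = counts.most_common(1)[0]
print("%s %d" % (pair, count))
```

If several pairs tie for the top count, most_common(1) still returns only one of them, so check the counts directly when ties matter.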

If you want to convert this output to a list, you can do the following:

list(Counter(firsttwo).items())

Output:

[('fi', 1), ('th', 1), ('se', 2)]

Edit (without collections):

text_file= open("C:/test.txt",'r')
firsttwo = [line[:2] for line in text_file.readlines()]
l_items = set(firsttwo) 
l_counts = [(firsttwo.count(x), x) for x in set(firsttwo)]
l_counts.sort(reverse=True)
print l_counts[0][1]
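
Calling list.count once per distinct item rescans the whole list each time. A single-pass alternative with a plain dict might look like the sketch below; it is not from the original answer, and the `lines` list is a hypothetical stand-in for the file contents:

```python
# Hypothetical stand-in for the lines read from the file.
lines = ["first line\n", "second line\n", "second line\n", "third line\n"]

counts = {}
for line in lines:
    key = line[:2]
    # dict.get supplies 0 the first time a pair is seen.
    counts[key] = counts.get(key, 0) + 1

# max with key=counts.get picks a pair with the highest count.
most_frequent = max(counts, key=counts.get)
print(most_frequent)
```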

Answer 1: (score: 0)

To build the initial string, use a generator expression and join():

In [49]: mystring="".join(line[:2] for line in data)

This can be solved using the count() method of str objects:

In [50]: mystring="helloworld"

In [51]: mystring.count("o")
Out[51]: 2

If you want the most common items, use sorted and string.ascii_letters:

In [52]: from string import ascii_letters as letters
In [71]: mystring
Out[71]: "Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book. "

In [72]: sorted((mystring.count(l),l) for l in letters)[:-5:-1]
Out[72]: [(23, 'e'), (20, 't'), (17, 'n'), (14, 's')]
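
The sorted/count approach calls count() once for every letter in ascii_letters. The same kind of ranking can be sketched in one pass with collections.Counter; this swaps in a different technique from the one the answer uses, and the short sample string is chosen only for illustration:

```python
from collections import Counter
from string import ascii_letters

mystring = "helloworld"

# Count only alphabetic characters, skipping spaces and punctuation.
letter_counts = Counter(ch for ch in mystring if ch in ascii_letters)

# The two most frequent letters as (letter, count) tuples.
top_two = letter_counts.most_common(2)
print(top_two)
```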

Answer 2: (score: 0)

This is how I would do it:

import re
from collections import Counter

# Read the file as a single string. Calling str() on the list returned by
# readlines() would leave literal "\n" escapes that \w+ matches as a stray "n".
my_file = open("text.txt", 'r')
text_from_file = my_file.read()
my_file.close()

# First two letters of every word, upper-cased.
processed_letters = [item[:2].upper() for item in re.findall(r"\w+", text_from_file)]

resulting_count = Counter(processed_letters)

print resulting_count

This may not be the best way, but it:

  • reads the file
  • stores the first two letters of each word
  • uses the collections Counter to count each pair of letters
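
The steps above can be tried without a file on disk. A minimal sketch, assuming a small in-memory sample (the `text` string is hypothetical and stands in for the contents of text.txt), that also pulls out the single most frequent pair:

```python
import re
from collections import Counter

# Hypothetical sample text standing in for the contents of text.txt.
text = "first line\nsecond line\nsecond line\nthird line\n"

# First two characters of every word, upper-cased as in the answer.
pairs = [word[:2].upper() for word in re.findall(r"\w+", text)]
resulting_count = Counter(pairs)

# most_common(1) returns a list holding one (pair, count) tuple.
top_pair, top_count = resulting_count.most_common(1)[0]
print("%s %d" % (top_pair, top_count))
```

Because this counts per word rather than per line, "line" contributes a pair on every line of the sample, so the result differs from counting only the first two characters of each line.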