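I have a script that reads from a file and takes the first two characters of each word in a line. What I want to do is find the two-letter pair that occurs most often. Do I have to convert my output to a list and work from that?
Here is my code: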
#!/usr/bin/python
import string
import re
import random
import sys
file = raw_input("Enter path to filename :")
text_file= open(file,'r')
data=text_file.readlines()
firsttwo =[]
lines = []
def first2():
    for line in data:
        firsttwo = line[:2]
        print firsttwo
print first2()
Answer 0 (score: 0)
You can use Counter to count how many times each item appears in a list.
from collections import Counter
text_file= open("C:/test.txt",'r')
firsttwo = [line[:2] for line in text_file.readlines()]
print Counter(firsttwo)
If the contents of test.txt are:
first line
second line
second line
third line
the code above outputs:
Counter({'se': 2, 'fi': 1, 'th': 1})
If you want to convert this output to a list, you can do the following:
list(Counter(firsttwo).items())
Output:
[('fi', 1), ('th', 1), ('se', 2)]
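Since the question asks only for the most frequent pair, Counter.most_common may be the shortest route. The following is a minimal sketch written for Python 3 (unlike the Python 2 snippets in this thread); the `lines` list is hypothetical sample data standing in for the contents of test.txt.

```python
from collections import Counter

# Hypothetical sample data standing in for the lines read from test.txt
lines = ["first line\n", "second line\n", "second line\n", "third line\n"]

firsttwo = [line[:2] for line in lines]
counts = Counter(firsttwo)

# most_common(1) returns a list holding the single (item, count) pair
# with the highest count
top_pair, top_count = counts.most_common(1)[0]
print(top_pair, top_count)  # se 2
```

Note that most_common(1) raises an IndexError on `[0]` if the input is empty, so a real script would probably check for that first.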
Edit (without collections):
text_file= open("C:/test.txt",'r')
firsttwo = [line[:2] for line in text_file.readlines()]
# Pair each distinct prefix with its count, then sort so the highest count comes first
l_counts = [(firsttwo.count(x), x) for x in set(firsttwo)]
l_counts.sort(reverse=True)
print l_counts[0][1]
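The list.count approach rescans the whole list once per distinct prefix. If that becomes slow on a large file, a plain dictionary can tally everything in a single pass. This is only a sketch for Python 3; the function name `most_frequent_prefix` and the sample data are made up for illustration.

```python
def most_frequent_prefix(lines):
    """Return the most frequent two-character line prefix, or None if there are no lines."""
    counts = {}
    for line in lines:
        prefix = line[:2]
        # dict.get supplies 0 the first time a prefix is seen
        counts[prefix] = counts.get(prefix, 0) + 1
    if not counts:
        return None
    # max over the keys, ranked by their tallied count
    return max(counts, key=counts.get)


# Hypothetical sample data standing in for the lines of test.txt
sample = ["first line\n", "second line\n", "second line\n", "third line\n"]
print(most_frequent_prefix(sample))  # se
```

When several prefixes share the highest count, max returns whichever of them it encounters first while iterating over the dictionary.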
Answer 1 (score: 0)
To build the initial string, use a generator expression and join():
In [49]: mystring="".join(line[:2] for line in data)
This can be solved using the count() method of the str object:
In [50]: mystring="helloworld"
In [51]: mystring.count("o")
Out[51]: 2
If you want the most common items, use sorted and string.ascii_letters:
In [52]: from string import ascii_letters as letters
In [71]: mystring
Out[71]: "Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book. "
In [72]: sorted((mystring.count(l),l) for l in letters)[:-5:-1]
Out[72]: [(23, 'e'), (20, 't'), (17, 'n'), (14, 's')]
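The session above ranks single letters. The same sorted-and-count idea could be applied to two-letter word prefixes instead, which is closer to what the question asks. A rough Python 3 sketch, with a made-up sample sentence in place of the file contents:

```python
# Hypothetical sample text; in the question this would come from the file
text = "the quick thing that thrived quietly"

# First two characters of every whitespace-separated word
prefixes = [word[:2] for word in text.split()]

# Build (count, prefix) tuples for each distinct prefix and sort descending
ranked = sorted(((prefixes.count(p), p) for p in set(prefixes)), reverse=True)
print(ranked[:2])  # [(4, 'th'), (2, 'qu')]
```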
Answer 2 (score: 0)
This is how I would do it:
import re
from collections import Counter
my_file = open("text.txt", 'r')
# Read the file as one string; calling str() on the list from readlines()
# would leave literal "\n" escapes in the text, which \w+ matches as the word "n"
text_from_file = my_file.read()
first_two_letters = " ".join(item[:2].upper() for item in re.findall(r"\w+", text_from_file))
processed_letters = first_two_letters.split()
resulting_count = Counter(processed_letters)
print resulting_count
This may not be the best way, but collections.Counter counts each group of letters.
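For reference, the per-word counting can be wrapped in a small function so it can be tried on a string without touching a file. This is a sketch for Python 3; `count_word_prefixes` and `sample_text` are illustrative names, not part of the original answer.

```python
import re
from collections import Counter


def count_word_prefixes(text):
    """Count the upper-cased first two characters of every word in text."""
    words = re.findall(r"\w+", text)
    return Counter(word[:2].upper() for word in words)


# Hypothetical sample mirroring the test.txt contents used earlier in the thread
sample_text = "first line\nsecond line\nsecond line\nthird line\n"
counts = count_word_prefixes(sample_text)
print(counts.most_common(1))  # [('LI', 4)]
```

Counting per word rather than per line gives a different winner here, because "line" appears on every row of the sample.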