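I need to write a Python program that analyzes a file and counts: the total number of words, the average word length, how many times each word occurs, and how many words begin with each letter of the alphabet.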
I have code that does the first two:
with open(input('Please enter the full name of the file: '), 'r') as f:
    # split() with no argument splits on any whitespace and drops empty strings
    w = [len(word) for line in f for word in line.split()]
total_w = len(w)
# Guard against an empty file to avoid dividing by zero
avg_w = sum(w) / total_w if total_w else 0
print('The total number of words in this file is:', total_w)
print('The average length of the words in this file is:', avg_w)
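But I'm not sure how to do the others. Any help is appreciated.
By the way, when I say "how many words begin with each letter of the alphabet", I mean how many words start with "A", how many start with "B", how many start with "C", and so on, all the way to "Z".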
Answer 0 (score: 0)
Interesting challenge. Here is a suggestion for item 3, how many times a word occurs in the text. This code is far from optimal, but it does work.
I also used the file text.txt:
with open('text.txt', 'r') as doc:
    print('opened txt')
    wordlist = []
    for words in doc:
        # Collect the words from every line, not just the last one
        wordlist.extend(words.split())
# Compare every word with every other word and report the matches
for numbers in range(len(wordlist)):
    for inner_numbers in range(len(wordlist)):
        if inner_numbers != numbers:
            if wordlist[numbers] == wordlist[inner_numbers]:
                print('word: %s == %s' % (wordlist[numbers], wordlist[inner_numbers]))
Edit: I noticed that I had forgotten to create the word list, since it was already held in memory.
Answer to item four: once you have a list containing all the words, this is not very hard, because a string can be indexed like a list. Just take string[0] to get the first letter; the same indexing works on a list containing strings.
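As a rough sketch of the first-letter idea for item four, the function below tallies how many words start with each letter. The name `count_initials` and the sample sentence are made up for illustration, and it assumes that case should be ignored and that words beginning with a non-letter (digits, punctuation) are skipped.

```python
from collections import Counter
import string


def count_initials(words):
    """Count how many words start with each letter a-z (case-insensitive)."""
    counts = Counter()
    for word in words:
        if not word:
            continue  # ignore empty strings, which have no first character
        first = word[0].lower()  # a string can be indexed like a list
        if first in string.ascii_lowercase:
            counts[first] += 1
    return counts


# Hypothetical sample text standing in for the contents of a file.
wordlist = "Alice and Bob ate a big cake".split()
initials = count_initials(wordlist)

# Report every letter, including those that no word starts with.
for letter in string.ascii_lowercase:
    print(letter.upper(), initials[letter])
```

A `Counter` returns 0 for missing keys, so letters with no words can be printed without any extra checks.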
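Answer 1 (score: 0)
There are many ways to do this. A more advanced approach is to first simply collect the text and the words, then process the data with ML/DS tools, from which you could derive further statistics (for example "a new paragraph mostly starts with word X" or "word X mostly comes before/after word Y").
If you only need very basic statistics, you can collect them while iterating over each word and compute the rest at the end, for example: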
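A statistic of the "word X mostly comes before/after word Y" kind does not strictly need ML tooling. A minimal standard-library sketch is to count adjacent word pairs. The helper name `count_pairs` and the sample sentence are hypothetical, and the sketch assumes that lowercasing and whitespace splitting give an acceptable notion of a "word".

```python
from collections import Counter


def count_pairs(text):
    """Count each pair of adjacent words (bigram) in the text."""
    words = text.lower().split()
    # zip(words, words[1:]) pairs every word with its successor.
    return Counter(zip(words, words[1:]))


sample = "the cat saw the dog and the cat ran"
pairs = count_pairs(sample)

# Which words most often follow "the"?
followers = Counter({b: n for (a, b), n in pairs.items() if a == "the"})
print(followers.most_common(1))
```

For larger texts or richer questions, a dedicated data-analysis library may be more convenient, but the counting principle stays the same.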
stats = {
    'amount': 0,
    'length': 0,
    'word_count': {},
    'initial_count': {}
}
with open('lorem.txt', 'r') as f:
    for line in f:
        line = line.strip()
        if not line:
            continue
        for word in line.split():
            word = word.lower()
            initial = word[0]
            # Add word and length count
            stats['amount'] += 1
            stats['length'] += len(word)
            # Add initial count
            if initial not in stats['initial_count']:
                stats['initial_count'][initial] = 0
            stats['initial_count'][initial] += 1
            # Add word count
            if word not in stats['word_count']:
                stats['word_count'][word] = 0
            stats['word_count'][word] += 1
# Calculate average word length
stats['average_length'] = stats['length'] / stats['amount']
Online demo here
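For comparison, here is a compact variant of the same single-pass idea using `collections.Counter`, which makes the "if key not in dict" checks unnecessary. It is only a sketch: the function name `collect_stats` is hypothetical, and it accepts any iterable of lines so that it can be tried without a file on disk.

```python
from collections import Counter


def collect_stats(lines):
    """Gather word statistics from an iterable of text lines in one pass."""
    word_count = Counter()
    initial_count = Counter()
    total_length = 0
    for line in lines:
        for word in line.split():  # split() skips blank lines and extra spaces
            word = word.lower()
            word_count[word] += 1
            initial_count[word[0]] += 1
            total_length += len(word)
    amount = sum(word_count.values())
    return {
        'amount': amount,
        # Avoid dividing by zero when the input contains no words.
        'average_length': total_length / amount if amount else 0.0,
        'word_count': word_count,
        'initial_count': initial_count,
    }


# Hypothetical sample lines; an open file object can be passed the same way.
stats = collect_stats(["Lorem ipsum dolor", "", "lorem Ipsum sit amet"])
print(stats['amount'], stats['word_count'].most_common(2))
```

Because an open text file iterates line by line, `collect_stats(f)` inside a `with open(...) as f:` block should behave the same as with the list above.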