How do I capitalize certain words in a text file?

Asked: 2012-07-26 17:25:15

Tags: python formatting python-2.7 text-manipulation

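I have a text file of ordinary sentences. I was in a hurry when I typed it, so I only capitalized the first letter of the first word of each sentence (as English grammar requires).

Now I think it would look better if the first letter of every word were capitalized. Something like: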

  

Each Word of This Sentence is Capitalized

Note that "of" and "is" are not capitalized in the sentence above: I want to skip words that have 3 letters or fewer.

How should I do this?

5 Answers:

Answer 0: (score: 5)

# text_file is assumed to be an already opened file object
for line in text_file:
    print(' '.join(word.title() if len(word) > 3 else word for word in line.split()))

Edit: to leave punctuation out of the count, replace len with the following function:

def letterlen(s):
    return sum(c.isalpha() for c in s)
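
Putting the two pieces of this answer together might look like the sketch below. The name `capitalize_long_words` and the `min_len` parameter are made up for illustration and do not appear in the answer; the logic is the same `title()`-if-long-enough rule, with `letterlen` swapped in for `len`.

```python
def letterlen(s):
    # Count only alphabetic characters, so "word," counts as 4, not 5.
    return sum(c.isalpha() for c in s)


def capitalize_long_words(line, min_len=3):
    # Hypothetical helper: title-case every word that has more than
    # min_len letters and leave shorter words untouched.
    return ' '.join(
        word.title() if letterlen(word) > min_len else word
        for word in line.split()
    )


print(capitalize_long_words("each word of this sentence is capitalized"))
# prints: Each Word of This Sentence is Capitalized
```

One caveat: `str.title()` also capitalizes the letter after an apostrophe, so "don't" would become "Don'T". If the text contains contractions, uppercasing only the first character is probably the safer choice.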

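Answer 1: (score: 4)

Take a look at NLTK.

Tokenize the text and capitalize each word. Words such as 'if' and 'of' are called 'stop words'. If your only criterion is length, Steven's answer is a good approach. If you want to look up stop words instead, there is a similar question on SO: How to remove stop words using nltk or python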

Answer 2: (score: 3)

You should split each line into words and capitalize only the words that are longer than three letters.

words.txt

each word of this sentence is capitalized
some more words
an other line

-

import string


with open('words.txt') as file:
    # List to store the capitalised lines.
    lines = []
    for line in file:
        # Split words by spaces.
        words = line.split(' ')
        for i, word in enumerate(words):
            if len(word.strip(string.punctuation + string.whitespace)) > 3:
                # Capitalise and replace words longer than 3 (without punctuation).
                words[i] = word.capitalize()
        # Join the capitalised words with spaces.
        lines.append(' '.join(words))
    # Join the capitalised lines.
    capitalised = ''.join(lines)

# Optionally, write the capitalised words back to the file.
with open('words.txt', 'w') as file:
    file.write(capitalised)

Answer 3: (score: 1)

What you really want is a list of so-called stop words. If you don't have such a list, you can build one yourself and do this:

skipWords = set("of is".split())
punctuation = '.,<>{}][()\'"/\\?!@#$%^&*'  # and any other punctuation that you want to strip out

# Wrapped in a function so that the final `return` is valid.
def capitalize_file(filepath):
    answer = ""
    with open(filepath) as f:
        for line in f:
            for word in line.split():
                for p in punctuation:
                    # You end up losing the punctuation in the output, but this is easy to fix if you really care about it.
                    word = word.replace(p, '')
                if word not in skipWords:
                    answer += word.title() + " "
                else:
                    answer += word + " "
    return answer  # or you can write it to the file continuously
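
The punctuation loss mentioned in the code comment can be avoided by rewriting only the runs of letters and leaving everything else in place. Below is a minimal sketch of that idea using `re.sub`. The names `SKIP_WORDS` and `capitalize_text` are illustrative, and the stop-word set is a tiny hand-picked sample, not a real stop-word list.

```python
import re

# Hand-picked sample for illustration; a real stop-word list is much longer.
SKIP_WORDS = {"of", "is", "an", "the", "and"}


def capitalize_text(text, skip_words=SKIP_WORDS):
    def fix(match):
        word = match.group(0)
        # Leave stop words untouched; uppercase the first letter of the rest.
        if word.lower() in skip_words:
            return word
        return word[0].upper() + word[1:]

    # Only runs that start with a letter are rewritten, so punctuation,
    # spacing, and line breaks pass through unchanged.
    return re.sub(r"[A-Za-z][A-Za-z']*", fix, text)


print(capitalize_text("each word of this sentence is capitalized, isn't it?"))
```

Because the substitution works on the whole text, it can be applied to the result of `f.read()` and written back out without rebuilding the lines by hand.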

Answer 4: (score: 0)

You can add all the words in the text file to a list:

words = []
with open('textdocument.txt') as f:
    for line in f:
        # Split each line so the list holds individual words rather than whole lines.
        words.extend(line.split())

Once the list contains all the elements, run a for loop that checks the length of each element and, if it is greater than 3, capitalizes its first letter:

new_list = []
for item in words:
    if len(item) > 3:
        # str.title() returns a new string, so append its result.
        new_list.append(item.title())
    else:
        # Words of three letters or fewer are added to the new list unchanged.
        new_list.append(item)

See if that works?