Leading space and capitalization

Time: 2017-12-31 02:30:47

Tags: python regex

How do I remove a single leading space from the start of a paragraph and capitalize the paragraph's first letter using Python?

Input:

 this is a sample sentence. This is a sample second sentence.

Output:

This is a sample sentence. This is a sample second sentence.

My attempt so far:

import spacy, re
nlp = spacy.load('en_core_web_sm')
doc = nlp(unicode(open('2.txt').read().decode('utf8')) )
tagged_sent = [(w.text, w.tag_) for w in doc]
normalized_sent = [w.capitalize() if t in ["NN","NNS"] else w for (w,t) in tagged_sent]
normalized_sent1 = normalized_sent[0].capitalize()
string = re.sub(" (?=[\.,'!?:;])", "", ' '.join(normalized_sent1))
rtn = re.split('([.!?] *)', string)
final = ''.join([i.capitalize() for i in rtn])
print final

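With this code, the first word of every sentence in the paragraph gets capitalized, except the one at the very start of the paragraph: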

Output:
 on the insert tab,  the galleries include items that are designed to coordinate with the overall look of your document. You can use these galleries to insert tables, headers,  footers,  lists,  cover pages,  and other document building blocks. When you create pictures,  charts,  or diagrams,  they also coordinate with your current document look. 

Expected output:
On the insert tab,  the galleries include items that are designed to coordinate with the overall look of your document. You can use these galleries to insert tables, headers,  footers,  lists,  cover pages,  and other document building blocks. When you create pictures,  charts,  or diagrams,  they also coordinate with your current document look. 

3 answers:

Answer 0 (score: 2)

You can use a regular expression together with str.capitalize():

import re
s = " this is a sample sentence. This is a sample second sentence."
# Strip leading whitespace, split on ". ", capitalize each piece, then rejoin.
new_s = '. '.join(i.capitalize() for i in re.split(r'\.\s', re.sub(r'^\s+', '', s)))

Output:

'This is a sample sentence. This is a sample second sentence.'
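
One caveat: str.capitalize() also lowercases everything after the first character, so a proper noun or acronym in the middle of a sentence would be lowered too. Below is a minimal sketch that only touches the first letter of each sentence. It assumes sentences end in ., ! or ? followed by whitespace, and the helper name capitalize_sentences is made up for illustration.

```python
import re

def capitalize_sentences(text):
    """Strip leading whitespace and uppercase the first letter of each sentence."""
    text = text.lstrip()
    # Match a lowercase letter at the start of the text, or right after
    # sentence-ending punctuation followed by whitespace.
    return re.sub(
        r'(^|[.!?]\s+)([a-z])',
        lambda m: m.group(1) + m.group(2).upper(),
        text,
    )

print(capitalize_sentences(" this is a sample sentence. this one mentions NASA."))
```

Because only the matched letter is changed, the rest of each sentence (including spacing and existing capitals) is left as it was.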

Answer 1 (score: 1)

A simple solution would be the following (though I recommend @Ajax's answer):

x = 'on the insert tab,  the galleries include items that are designed to coordinate with the overall look of your document. You can use these galleries to insert tables, headers,  footers,  lists,  cover pages,  and other document building blocks. When you create pictures,  charts,  or diagrams,  they also coordinate with your current document look. '
print( '. '.join(map(lambda s: s.strip().capitalize(), x.split('.'))))

Output:

On the insert tab,  the galleries include items that are designed to coordinate with the overall look of your document. You can use these galleries to insert tables, headers,  footers,  lists,  cover pages,  and other document building blocks. When you create pictures,  charts,  or diagrams,  they also coordinate with your current document look.
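
Note that splitting on every '.' also splits inside things like decimal numbers, and it ignores sentences ending in ! or ?. If that matters for your text, here is a rough sketch that splits only where punctuation is followed by whitespace. The function name capitalize_after_split is hypothetical, and abbreviations such as "e.g. " would still be treated as sentence ends.

```python
import re

def capitalize_after_split(text):
    """Split on whitespace that follows ., ! or ?, then uppercase each first letter."""
    # The lookbehind keeps the punctuation attached to the preceding sentence.
    parts = [p for p in re.split(r'(?<=[.!?])\s+', text.strip()) if p]
    # Uppercase only the first character so the rest of each sentence is untouched.
    return ' '.join(p[0].upper() + p[1:] for p in parts)

print(capitalize_after_split(' version 3.5 is out. try it! '))
```

Unlike the one-liner above, this joins the sentences with a single space, so any extra whitespace between sentences is normalized.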

Answer 2 (score: 1)

If all you need is to remove the leading space and then capitalize the first letter, you could try something like this:

your_data='  on the insert tab,  the galleries include items that are designed to coordinate with the overall look of your document. you can use these galleries to insert tables, headers,  footers,  lists,  cover pages,  and other document building blocks. when you create pictures,  charts,  or diagrams,  they also coordinate with your current document look. '
conversion=list(your_data)
if conversion[0]==' ':
    del conversion[0]

capitalize="".join(conversion).split()
for j,i in enumerate(capitalize):
    try:
        if j==0:
            capitalize[j]=capitalize[j].capitalize()

        if '.' in i:
            capitalize[j + 1] = capitalize[j + 1].capitalize()
    except IndexError:
        pass

print(" ".join(capitalize))

Output:

On the insert tab, the galleries include items that are designed to coordinate with the overall look of your document. You can use these galleries to insert tables, headers, footers, lists, cover pages, and other document building blocks. When you create pictures, charts, or diagrams, they also coordinate with your current document look.
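
Keep in mind that split() followed by " ".join() collapses the double spaces inside the paragraph, which is why the output above differs from the input in spacing. If only the start of the paragraph needs fixing and the rest should stay byte-for-byte the same, a shorter sketch could look like this (fix_paragraph_start is a hypothetical name, not from any library):

```python
def fix_paragraph_start(text):
    """Remove leading whitespace and uppercase only the first character."""
    stripped = text.lstrip()
    # Slicing instead of indexing keeps this safe for an empty string.
    return stripped[:1].upper() + stripped[1:]

print(fix_paragraph_start('  on the insert tab,  the galleries include items.'))
```

This does not capitalize later sentences; it only addresses the paragraph start described in the question.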