Using regular expressions to arrange unformatted paragraphs in a string

Time: 2018-03-09 16:34:25

Tags: python regex sublimetext3

I have an unformatted text, for example

randstr='''
Hello Abou al Reeem
HAbibi abou lbeess
dkhlak 3amak l khil wen
Li aslan 7adan be3rif shi 3an l bazz ? 
Sara7a wala 7adan by3rif 3an l ken3an

Grave is 22 and Tony is 15
Rami is 44 and Aya is 40
'''

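As you can see, the first paragraph consists of sentences that are split across separate lines by \n. What I want is to write Python code so that I end up with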

list='''
 Hello Abou al Reeem HAbibi abou lbeess dkhlak 3amak l khil wen Li aslan 7adan be3rif shi 3an l bazz ?  Sara7a wala 7adan by3rif 3an l ken3an

 Grave is 22 and Tony is 15 Rami is 44 and Aya is 40
'''

I managed to get this result with the following Python code

import re
randstr='''
Hello Abou al Reeem
HAbibi abou lbeess
dkhlak 3amak l khil wen
Li aslan 7adan be3rif shi 3an l bazz ? 
Sara7a wala 7adan by3rif 3an l ken3an

Grave is 22 and Tony is 15
Rami is 44 and Aya is 40
'''

# Split the string based on empty lines (Note: I tried ^\s*$ but it did not work)
A=randstr.split('\n\n')

# Split each of the elements in A to one sentence without the \n using regular expression substitution
regex=re.compile('\n')

for i in range(len(A)):
    A[i]=regex.sub(' ',A[i])

A

But I wonder whether it can be done more simply, and whether it can be done in Sublime Text 3 instead of Python?

Note: I am new to regular expressions

2 Answers:

Answer 0: (score: 2)

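You don't need a regular expression here at all. Split on two consecutive newlines, then split each paragraph into lines and join them with spaces: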

print(
    '\n'.join(
        [' '.join(para.splitlines()) for para in randstr.split('\n\n')]
    )
)
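A slightly more defensive variant of the same split-and-join idea, as a minimal sketch: it assumes that "blank" lines may also contain stray spaces, so it uses re.split only to find the paragraph breaks, and it puts an empty line back between paragraphs as in the desired output. The name join_paragraphs and the sample text are made up for illustration. As a side note, str.split treats its argument as a literal separator, so a pattern such as ^\s*$ passed to it is not interpreted as a regular expression.

```python
import re


def join_paragraphs(text):
    """Collapse each paragraph to one line; keep a blank line between paragraphs."""
    # A paragraph break is a newline, optional whitespace, and another newline.
    paragraphs = re.split(r'\n\s*\n', text.strip())
    # Inside a paragraph, strip each line and join the lines with single spaces.
    joined = [' '.join(line.strip() for line in para.splitlines())
              for para in paragraphs]
    return '\n\n'.join(joined)


# Hypothetical sample: the "blank" line contains two spaces.
sample = "\nfirst line\nsecond line \nthird line\n  \nother paragraph\ncontinues here\n"
print(join_paragraphs(sample))
```

Stripping each line also removes trailing spaces such as the one after "bazz ?" in the question's text, which would otherwise leave a double space in the joined paragraph.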

Answer 1: (score: 1)

You can also use regex substitution:

import re

randstr='''
Hello Abou al Reeem
HAbibi abou lbeess
dkhlak 3amak l khil wen
Li aslan 7adan be3rif shi 3an l bazz ? 
Sara7a wala 7adan by3rif 3an l ken3an

Grave is 22 and Tony is 15
Rami is 44 and Aya is 40
'''

newstr = re.sub("\n","\n\n", re.sub(r'\n(?! *\n)','',randstr)) 

print (newstr)

Output:

Hello Abou al ReeemHAbibi abou lbeessdkhlak 3amak l khil wenLi aslan 7adan be3rif shi 3an l bazz ? Sara7a wala 7adan by3rif 3an l ken3an

Grave is 22 and Tony is 15Rami is 44 and Aya is 40

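It does basically the same thing as COLDSPEED's solution, by first replacing

  • every '\n' that is not followed by optional spaces and another '\n' with nothing (so the joined lines are not separated by a space, as the output above shows)
  • and then replacing all remaining '\n' with '\n\n'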

    r'\n(?! *\n)'  # (?!...) is a negative lookahead: the text that follows must not match it
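Because the substitution above replaces the newline with an empty string, the words at line boundaries run together ("ReeemHAbibi"). A possible single-pass variant, sketched here under the assumption that paragraphs are separated by exactly empty lines, replaces only "lonely" newlines with a space and leaves the paragraph breaks untouched. The helper name unwrap_paragraphs and the sample text are hypothetical.

```python
import re


def unwrap_paragraphs(text):
    """Replace single newlines with a space; leave '\\n\\n' paragraph breaks alone."""
    # (?<!\n) : the newline must not be preceded by another newline
    # (?!\n)  : and must not be followed by another newline
    return re.sub(r'(?<!\n)\n(?!\n)', ' ', text.strip())


sample = "one\ntwo\nthree\n\nfour\nfive\n"
print(unwrap_paragraphs(sample))
```

This sketch does not handle separator lines that contain only spaces; in that case both surrounding newlines count as "lonely" and the two paragraphs are merged.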