I have a piece of unformatted text, for example:
randstr='''
Hello Abou al Reeem
HAbibi abou lbeess
dkhlak 3amak l khil wen
Li aslan 7adan be3rif shi 3an l bazz ?
Sara7a wala 7adan by3rif 3an l ken3an

Grave is 22 and Tony is 15
Rami is 44 and Aya is 40
'''
As you can see, each paragraph consists of sentences on separate lines, separated by \n. What I want is to write Python code so that I end up with:
list='''
Hello Abou al Reeem HAbibi abou lbeess dkhlak 3amak l khil wen Li aslan 7adan be3rif shi 3an l bazz ? Sara7a wala 7adan by3rif 3an l ken3an
Grave is 22 and Tony is 15 Rami is 44 and Aya is 40
'''
I managed to get this result with the following Python code:
import re
randstr='''
Hello Abou al Reeem
HAbibi abou lbeess
dkhlak 3amak l khil wen
Li aslan 7adan be3rif shi 3an l bazz ?
Sara7a wala 7adan by3rif 3an l ken3an

Grave is 22 and Tony is 15
Rami is 44 and Aya is 40
'''
# Split the string based on empty lines (Note: I tried ^\s*$ but it did not work)
A=randstr.split('\n\n')
# Split each of the elements in A to one sentence without the \n using regular expression substitution
regex=re.compile('\n')
for i in range(len(A)):
    A[i]=regex.sub(' ',A[i])
A
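But I wonder whether this could be done more simply, and whether it could be done in Sublime Text 3 instead of Python?
Note: I am new to regular expressions.
Answer 0 (score: 2)
No regular expression is needed here at all. Split on two newlines, then split each paragraph into lines and join them with spaces: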
print(
    '\n'.join(
        [' '.join(para.splitlines()) for para in randstr.split('\n\n')]
    )
)
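The same idea can be wrapped in a small function. This is only a minimal sketch, assuming paragraphs are separated by exactly one blank line; the name `join_paragraphs` and the sample text are hypothetical, and the `strip()` call is an addition that avoids the stray leading space caused by the newline right after the opening quotes.

```python
def join_paragraphs(text):
    """Collapse each blank-line-separated paragraph onto a single line."""
    # strip() drops the leading/trailing newlines of a triple-quoted string
    paragraphs = text.strip().split('\n\n')
    # Within each paragraph, join the individual lines with a single space
    return '\n'.join(' '.join(para.splitlines()) for para in paragraphs)


# Hypothetical sample input: two paragraphs separated by one blank line
sample = '''
first line
second line

third line
fourth line
'''

print(join_paragraphs(sample))
```

With this sample, the two paragraphs come out as two lines: `first line second line` and `third line fourth line`.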
Answer 1 (score: 1)
You can also use regular expression substitution:
import re
randstr='''
Hello Abou al Reeem
HAbibi abou lbeess
dkhlak 3amak l khil wen
Li aslan 7adan be3rif shi 3an l bazz ?
Sara7a wala 7adan by3rif 3an l ken3an

Grave is 22 and Tony is 15
Rami is 44 and Aya is 40
'''
newstr = re.sub("\n","\n\n", re.sub(r'\n(?! *\n)','',randstr))
print(newstr)
Output:
Hello Abou al ReeemHAbibi abou lbeessdkhlak 3amak l khil wenLi aslan 7adan be3rif shi 3an l bazz ? Sara7a wala 7adan by3rif 3an l ken3an
Grave is 22 and Tony is 15Rami is 44 and Aya is 40
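This is basically the same as COLDSPEED's solution: first replace every '\n' that is not followed by optional spaces and another '\n' with nothing, then replace each remaining '\n' with '\n\n'. Note that the lines are joined with no space between them.
r'\n(?! *\n)'  # (?!...) is a negative lookahead: it asserts that what follows does not match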
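If the joined lines should be separated by a space, as in the desired output of the question, a variant with both a negative lookbehind and a negative lookahead is one option. This is a sketch rather than part of the answer above: the function name `join_lines_regex` and the sample string are hypothetical, and it assumes paragraphs are separated by an empty line with no spaces on it.

```python
import re


def join_lines_regex(text):
    """Replace single newlines with a space, leaving blank-line breaks intact."""
    # (?<!\n) : the newline must not be preceded by another newline
    # (?!\n)  : the newline must not be followed by another newline
    # Together they match only "lone" newlines inside a paragraph.
    return re.sub(r'(?<!\n)\n(?!\n)', ' ', text.strip())


# Hypothetical sample input: two paragraphs separated by one blank line
sample = 'a\nb\n\nc\nd\n'
print(join_lines_regex(sample))
```

For this sample the result is `'a b\n\nc d'`: the lines inside each paragraph are joined with a space and the blank line between the paragraphs is preserved.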