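I have cleaned up a document so that I can split it properly by verse. Being weak at regular expressions, I can't seem to find the right expression to extract the verses.
This is the expression I am using:
(\t?\t?{\d+}.*){
I'm doing this in Python, though I hope that doesn't matter.
How should I change this so that it matches just the {x} some verse {x} next verse
verses, but stops short of the next brace?
As you can see, I am trying to keep it tab-aware, because parts of this document are laid out as poetry.
Here is a sample document: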
{1} The words of the blessing of Enoch, wherewith he blessed the elect [[[[and]]]] righteous, who will be living in the day of tribulation, when all the wicked [[[[and godless]]]] are to be removed. {2} And he took up his parable and said--Enoch a righteous man, whose eyes were opened by God, saw the vision of the Holy One in the heavens, [[which]] the angels showed me, and from them I heard everything, and from them I understood as I saw, but not for this generation, but for a remote one which is for to come. {3} Concerning the elect I said, and took up my parable concerning them:
The Holy Great One will come forth from His dwelling,
{4} And the eternal God will tread upon the earth, (even) on Mount Sinai,
[[And appear from His camp]]
And appear in the strength of His might from the heaven of heavens.
{5} And all shall be smitten with fear
And the Watchers shall quake,
And great fear and trembling shall seize them unto the ends of the earth.
{6} And the high mountains shall be shaken,
And the high hills shall be made low,
And shall melt like wax before the flame
{7} And the earth shall be [[wholly]] rent in sunder,
And all that is upon the earth shall perish,
And there shall be a judgement upon all (men).
{8} But with the righteous He will make peace.
And will protect the elect,
And mercy shall be upon them.
And they shall all belong to God,
And they shall be prospered,
And they shall [[all]] be blessed.
[[And He will help them all]],
And light shall appear unto them,
[[And He will make peace with them]].
{9} And behold! He cometh with ten thousands of [[His]] holy ones
To execute judgement upon all,
And to destroy [[all]] the ungodly:
And to convict all flesh
Of all the works [[of their ungodliness]] which they have ungodly committed,
And of all the hard things which ungodly sinners [[have spoken]] against Him.
[BREAK]
[CHAPTER 2]
Answer 0 (score: 1)
Just use re.split:
import re
text = '''{1} The words of the blessing of Enoch, wherewith he blessed the elect [[[[and]]]] righteous, who will be living in the day of tribulation, when all the wicked [[[[and godless]]]] are to be removed. {2} And he took up his parable and said--Enoch a righteous man, whose eyes were opened by God, saw the vision of the Holy One in the heavens, [[which]] the angels showed me, and from them I heard everything, and from them I understood as I saw, but not for this generation, but for a remote one which is for to come. {3} Concerning the elect I said, and took up my parable concerning them:
The Holy Great One will come forth from His dwelling,
{4} And the eternal God will tread upon the earth, (even) on Mount Sinai,
[[And appear from His camp]]
And appear in the strength of His might from the heaven of heavens.'''
result = [i for i in re.split(r'\{\d+\}', text) if i]
result
The result has four elements, corresponding to {1} through {4} above.
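If the verse numbers themselves need to be kept, one option is to put a capturing group in the split pattern, since re.split then returns the captured text alongside the pieces. The following is a minimal sketch on a made-up three-verse string (the sample text and the variable names are illustrative assumptions, not part of the original document):

```python
import re

# Made-up sample; note that the numbering skips from 2 to 4.
text = "{1} First verse. {2} Second verse,\n\tcontinued line.\n{4} Fourth verse."

# With a capturing group, re.split keeps the captured numbers in the result.
parts = re.split(r'\{(\d+)\}', text)

# parts[0] is whatever precedes the first marker (empty here);
# after that, numbers and verse bodies alternate.
numbers = parts[1::2]
bodies = [body.strip() for body in parts[2::2]]
verses = list(zip(numbers, bodies))
```

Each entry in verses pairs a number string with its verse text, so gaps in the numbering stay visible.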
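Answer 1 (score: 1)
(\t?\t?{\d+}.*?)(?={)
See the demo.
https://regex101.com/r/OCpDb7/1
Edit:
If you also want to capture the last verse, use
(\t?\t?{\d+}.*?)(?={|\[BREAK\])
See the demo.
https://regex101.com/r/OCpDb7/2
Your original regex had 2 problems.
(\t?\t?{\d+}.*){
             ^ ^
1) You used the greedy operator .*. Use the non-greedy .*? instead.
2) You are matching the trailing {, and because it has already been consumed, the next verse cannot match. Use a lookahead, (?={), to assert it rather than capture it.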
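As a rough illustration of how a lazy match with a lookahead behaves in Python, here is a small sketch on a made-up sample. The sample string, the variable names, the escaped braces, and the use of re.DOTALL (so that . can cross line breaks inside a multi-line verse) are assumptions made for this example:

```python
import re

# Made-up sample mixing prose verses and tab-indented poetic lines.
sample = (
    "{1} Prose verse one. {2} Prose verse two:\n"
    "\tA poetic line,\n"
    "{3} Another poetic line,\n"
    "\tAnd its continuation.\n"
    "[BREAK]\n"
    "[CHAPTER 2]"
)

# Lazy .*? stops at the first position where the lookahead succeeds,
# i.e. just before the next '{' or before '[BREAK]'. The lookahead
# does not consume anything, so the next verse can still match.
pattern = re.compile(r'(\t?\t?\{\d+\}.*?)(?=\{|\[BREAK\])', re.DOTALL)
verses = [v.strip() for v in pattern.findall(sample)]
```

Without the [BREAK] alternative in the lookahead, the last verse of the chapter would be dropped, because no further { follows it.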
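Answer 2 (score: 0)
The answers above are good, but the verses in this text do not always increment properly (that is, because of manuscript particulars, the numbering can jump from verse 5 to verse 7), so I had to keep the verse markers in order to "pluck the numbers" out of them later. Basically, the whole verse has to be extracted along with its number.
The recipe seems to be this: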
verse = re.compile(r'([\t+]?{\d+}[^{]*)', re.DOTALL)
In context:
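import re
# Read the whole book; the with-block closes the file afterwards.
with open('thebook.txt', 'r') as fh:
    book = fh.read()
chapters = book.split('[BREAK]')
# [\t+]? is a character class: at most one tab (or a literal '+') before the {n} marker.
# [^{]* then takes everything, newlines included, up to the next '{'.
verse = re.compile(r'([\t+]?{\d+}[^{]*)', re.DOTALL)
verses = verse.findall(chapters[1])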
Note that it seems to work, but I still have to check the results to make sure it handles every case.
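To "pluck the numbers" afterwards, one possible approach is to capture the number and the verse body in separate groups and then compare consecutive numbers. This is only a sketch under assumptions: the chapter string is made up, and the names numbered and gaps are hypothetical helpers introduced for illustration:

```python
import re

# Made-up chapter fragment whose numbering jumps from 5 to 7.
chapter = (
    "{5} And all shall be smitten with fear\n"
    "And the Watchers shall quake,\n"
    "{7} And the earth shall be rent in sunder,\n"
)

# Group 1 is the verse number, group 2 is everything up to the next '{'.
verse = re.compile(r'\{(\d+)\}([^{]*)')
numbered = [(int(m.group(1)), m.group(2).strip()) for m in verse.finditer(chapter)]

# Report each pair of consecutive verse numbers that does not step by one.
gaps = [(a, b) for (a, _), (b, _) in zip(numbered, numbered[1:]) if b != a + 1]
```

Here gaps lists the places where the manuscript numbering skips, which can be checked by hand against the source text.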