Is there a way to detect verse-style text in a text/markdown file?

Time: 2019-07-20 03:29:12

Tags: javascript regex markdown

I have a markdown/text file with verses sitting between normal paragraphs, for example:

Mr. Lewes reaches this conclusion: "If, therefore, we reflect what a poem _Faust_ is, and that it contains almost every variety of style and metre, it will be tolerably evident that no one unacquainted with the original can form an adequate idea of it from translation,"  which is certainly correct of any translation wherein something of the rhythmical variety and beauty of the original is not retained. That very much of the rhythmical character may be retained in English, was long ago shown by Mr. Carlyle,  in the passages which he translated, both literally and rhythmically, from the _Helena_ (Part Second). In fact, we have so many instances of the possibility of reciprocally transferring the finest qualities of English and German poetry, that there is no sufficient excuse for an unmetrical translation of _Faust_. I refer especially to such subtile and melodious lyrics as "The Castle by the Sea," of Uhland, and the "Silent Land" of Salis, translated by Mr. Longfellow; Goethe's "Minstrel" and "Coptic Song," by Dr. Hedge; Heine's "Two Grenadiers," by Dr. Furness and many of Heine's songs by Mr Leland; and also to the German translations of English lyrics, by Freiligrath and Strodtmann. 

> Life of Goethe (Book VI.).

> Mr. Lewes gives the following advice: "The English reader would perhaps best succeed who should first read Dr. Anster's brilliant paraphrase, and then carefully go through Hayward's prose translation." This is singularly at variance with the view he has just expressed. Dr. Anster's version is an almost incredible dilution of the original, written in _other_ metres; while Hayward's entirely omits the element of poetry.

> Foreign Review, 1828.

> When Freiligrath can thus give us Walter Scott:—


"Kommt, wie der Wind kommt,  
Wenn Wälder erzittern  
Kommt, wie die Brandung  
Wenn Flotten zersplittern!  
Schnell heran, schnell herab,  
Schneller kommt Al'e!—  
Häuptling und Bub' und Knapp,  
Herr und Vasalle!"  

I want to put three backticks above and below every such verse block in the markdown, so that the HTML output wraps the verse in a `<pre>` tag. Like this:

```
"Kommt, wie der Wind kommt,  
Wenn Wälder erzittern  
Kommt, wie die Brandung  
Wenn Flotten zersplittern!  
Schnell heran, schnell herab,  
Schneller kommt Al'e!—  
Häuptling und Bub' und Knapp,  
Herr und Vasalle!"  
```

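Is there a regex that can reliably match this pattern in a text file?


Another case:

I sometimes use the following format, which I would also like to identify as verse: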

abra ca dabra op
  lorem ipsum holy this 
line doesn't have 
  an indent but it's in
continuity of the 
  structure that sits together.

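2 answers:

Answer 0: (score: 1)

For the most part, I agree with the comments that AI could probably solve this problem best. That said, you already have an AI in your head (minus the A), and if you can look at a well-formatted document and define the pattern that makes up "poetry," you can write a regex to select it.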


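`^((?:.+\n)+.+)$`: this regex selects runs of content lines that are separated by only a single newline. As long as your markdown/text file puts at least two newlines between normal paragraphs, and the poems are longer than one line (as in your example), it will capture them.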
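As a minimal sketch of how this first pattern might be applied in JavaScript: the helper name `fenceVerse` and the sample text are made up for illustration, and the `g` and `m` flags are an assumption (without `m`, `^` and `$` would only match at the start and end of the whole string). It also assumes LF line endings, and it will fence any run of two or more consecutive non-empty lines, including hard-wrapped prose paragraphs.

```javascript
// Runs of two or more consecutive non-empty lines.
// `m` lets ^ and $ match at line boundaries; `g` finds every run.
const verseBlock = /^((?:.+\n)+.+)$/gm;

// Three backticks, built at runtime so this snippet stays easy to embed.
const FENCE = "`".repeat(3);

// Hypothetical helper: wrap each matched run in a fenced code block.
function fenceVerse(markdown) {
  return markdown.replace(verseBlock, (block) => FENCE + "\n" + block + "\n" + FENCE);
}

const sample = [
  "A normal paragraph kept on a single line.",
  "",
  "Kommt, wie der Wind kommt,  ",
  "Wenn Wälder erzittern  ",
  "",
  "Another single-line paragraph.",
].join("\n");

console.log(fenceVerse(sample));
```

Using a replacer function rather than a replacement string avoids having to think about `$`-patterns in the replacement text.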



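`^((?:.+\n .+\n)+)$`: if you want a stricter definition of "poetry," this regex looks for a line of content followed by an indented line of content directly beneath it (also like your example). It will not match "malformed" poems that do not end with an indented line.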
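A similar hedged sketch for the stricter pattern follows. The helper name `fenceIndentedVerse` is hypothetical, and the `g` and `m` flags and LF line endings are again assumptions. Because this pattern consumes the newline after the last indented line, the block must be followed by a blank line or the end of the file, and the replacement adds a newline back after the closing fence.

```javascript
// Pairs of (any non-empty line, line starting with a space), repeated.
// `m` lets ^ and $ match at line boundaries; `g` finds every block.
const indentedVerse = /^((?:.+\n .+\n)+)$/gm;

// Three backticks, built at runtime so this snippet stays easy to embed.
const FENCE_MARK = "`".repeat(3);

// Hypothetical helper: fence each block of alternating plain/indented lines.
// The match already ends with "\n", so the closing fence follows it directly.
function fenceIndentedVerse(text) {
  return text.replace(
    indentedVerse,
    (block) => FENCE_MARK + "\n" + block + FENCE_MARK + "\n"
  );
}

const sample2 = [
  "intro",
  "",
  "abra ca dabra op",
  "  lorem ipsum holy this",
  "line doesn't have",
  "  an indent but it's in",
  "",
  "outro",
  "",
].join("\n");

console.log(fenceIndentedVerse(sample2));
```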


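Answer 1: (score: 0)

I think there is an underlying problem here that AI cannot solve: because of how the markup itself is defined, line breaks are hard to get right in markdown.

In MultiMarkdown, a double space at the end of a line is allowed to indicate a line break rather than the end of a paragraph.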
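The verse in the question happens to use this trailing-double-space convention, which suggests one more heuristic. The sketch below is only an illustration under that assumption: `findHardBreakRuns` is a hypothetical helper that reports zero-based line ranges where consecutive lines end in two or more spaces, and it does nothing for verse written without trailing spaces.

```javascript
// Hypothetical helper: find runs of consecutive lines that end in two or
// more spaces (the hard-line-break convention). Returns zero-based,
// inclusive { start, end } line ranges.
function findHardBreakRuns(text) {
  const lines = text.split("\n");
  const runs = [];
  let start = -1; // -1 means "not currently inside a run"

  lines.forEach((line, i) => {
    const hardBreak = / {2,}$/.test(line);
    if (hardBreak && start === -1) start = i;
    if (!hardBreak && start !== -1) {
      runs.push({ start, end: i - 1 });
      start = -1;
    }
  });

  // Close a run that reaches the last line of the text.
  if (start !== -1) runs.push({ start, end: lines.length - 1 });
  return runs;
}

console.log(findHardBreakRuns("para\n\nfoo  \nbar  \n\nbaz"));
```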

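The CommonMark spec explicitly says that a line feed, a carriage return, or both are all equivalent and signify the end of a paragraph. (This is, of course, to accommodate text traditionally produced on (old) Mac, Unix, or Windows systems, so that it all imports correctly. However, it has the side effect of making it impossible to tell a line break from the end of a paragraph.)

The only route I can see is preformatted text, but that usually switches the font to monospace, which is not what you want for verse.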

Poetry is just as complicated in the LaTeX world, where there are at least three solutions depending on which package you use (for example, verse).

So I think the spec needs to be extended: either an option for using preformatted blocks, or some other mechanism.