Question

I'm trying to get the number of couplets from a set of lyrics. Let's say the lyrics are:

I saw a little hermit crab
His coloring was oh so drab

It’s hard to see the butterfly
Because he flies across the sky

etc etc...

Once upon a time
She made a little rhyme
Of course, of course

Before we say again
The pain the pain
A horse, a horse

Lightening, thunder, all around
Soon the rain falls on the ground

I tire of writing poems and rhyme

They are stored in the database as a string, separated by u'\r\n', and with string.splitlines(True) the object stores them as:

>>> lyrics[6].track_lyrics['lyrics']
[u'I saw a little hermit crab\r\n', u'His coloring was oh so drab\r\n', u'\r\n', u'It\u2019s hard to see the butterfly\r\n', u'Because he flies across the sky\r\n', u'\r\n',  u'\r\n', u'Before we say again\r\n', u'The pain the pain\r\n', u'A horse, a horse\r\n', u'\r\n', u'Lightening, thunder, all around\r\n', u'Soon the rain falls on the ground\r\n', u'\r\n', u'I tire of writing poems and rhyme\r\n']

I can get close with this:

len([i for i in lyrics if i != "\r\n"]) / 2

But it would also count stanzas of one, three or more lines as couplets.
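A small sketch of that over-count, using a made-up list (the name `sample` and its contents are illustrative assumptions, not the real database rows): one couplet, one three-line stanza and one single line.

```python
# Hypothetical sample: a 2-line stanza, a 3-line stanza and a 1-line stanza.
sample = [
    u'I saw a little hermit crab\r\n',
    u'His coloring was oh so drab\r\n',
    u'\r\n',
    u'Once upon a time\r\n',
    u'She made a little rhyme\r\n',
    u'Of course, of course\r\n',
    u'\r\n',
    u'I tire of writing poems and rhyme\r\n',
]

# Count every non-blank line and halve it (floor division, so the
# result is an int on both Python 2 and Python 3).
naive_count = len([line for line in sample if line != u"\r\n"]) // 2

print(naive_count)  # 3, although only the first stanza is a couplet
```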

I could get at it this way, basically saying that if there is a "\r\n" line before and another one two lines after, we are in a couplet:

>>> for k,v in enumerate(lyric_list):
...     if lyric_list[k+2] == "\r\n" and lyric_list[k-1] == "\r\n":
...             print(v)
... 
It’s hard to see the butterfly

Hear the honking of the goose


Lightening, thunder, all around

But, of course:

Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
IndexError: list index out of range

I could use try and except IndexError: with something like this:

>>> if len(lyric_string) > 1:
...     for k, v in enumerate(lyric_string):
...             if k == 0 and lyric_string[k+2] == "\r\n":
...                     print(v)
...             elif lyric_string[k-1] == "\r\n" and lyric_string[k+2] == "\r\n":
...                     print(v)
... 
I saw a little hermit crab

It’s hard to see the butterfly

Hear the honking of the goose

His red sports car is just a dream

The children like the ocean shore

I made the cookies one by one

My cat, she likes to chase a mouse,

Lightening, thunder, all around

Traceback (most recent call last):
  File "<stdin>", line 5, in <module>
IndexError: list index out of range

I've considered doing something like this, which is even uglier and doesn't work! (It only gets the first and last lines):

>>> if len(lyric_string) > 1:
...     for k, v in enumerate(lyric_string):
...             if k == 0 and lyric_string[k+2] == "\r\n":
...                     print(v)
...             elif lyric_string[k-1] == "\r\n" and (k+2 > len(lyric_string) \
...                                                     or lyric_string[k+2] == "\r\b"):
...                     print(v)

But I bet there's a more eloquent, even Pythonic, approach.

Answer 1

A slightly simpler approach: join the whole array with "" and count the occurrences of double newlines.

>>> s = """Once upon a time
... She made a little rhyme
... Of course, of course
...
... Before we say again
... The pain the pain
... A horse, a horse
...
... Lightening, thunder, all around
... Soon the rain falls on the ground
...
... I tire of writing poems and rhyme"""

Then just do:

>>> s.strip().count("\n\n") + 1
4

To get s in the code above, you need to do an additional join. For example:

s = "".join(lyrics[6].track_lyrics['lyrics'])

I'm using \n on my system; you may have to use \r\n.
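Putting those pieces together for a list that keeps its \r\n endings might look like the sketch below. The function name `count_blocks` and the sample list are assumptions made for illustration. Note that this counts every block separated by a blank line, whatever its length, and it assumes exactly one blank line between blocks.

```python
def count_blocks(lines, newline=u"\r\n"):
    """Count blank-line-separated blocks in a list of lines that keep their endings."""
    text = u"".join(lines).strip()
    if not text:
        return 0  # no lyrics at all
    # One blank line is two consecutive newlines; n separators mean n + 1 blocks.
    # Assumes exactly one blank line between blocks.
    return text.count(newline * 2) + 1


lines = [
    u'Once upon a time\r\n',
    u'She made a little rhyme\r\n',
    u'Of course, of course\r\n',
    u'\r\n',
    u'Lightening, thunder, all around\r\n',
    u'Soon the rain falls on the ground\r\n',
    u'\r\n',
    u'I tire of writing poems and rhyme\r\n',
]

print(count_blocks(lines))  # 3
```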

Answer 2

I'm assuming a couplet is a block of exactly 2 lines.

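You can do this by splitting the text into blocks and then counting the number of lines in each block. In this example, I count the number of newlines in a block (there should be 1 in a couplet).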

>>> text = """I saw a little hermit crab
... His coloring was oh so drab
... 
... It’s hard to see the butterfly
... Because he flies across the sky
... 
... etc etc...
... 
... Once upon a time
... She made a little rhyme
... Of course, of course
... 
... Before we say again
... The pain the pain
... A horse, a horse
... 
... Lightening, thunder, all around
... Soon the rain falls on the ground
... 
... I tire of writing poems and rhyme
... """.replace('\n', '\r\n')
>>> len([block for block in text.strip().split('\r\n\r\n') if block.count('\r\n') == 1])
3

This also assumes there are exactly two newlines between each block. To handle more than 2 newlines, you can use:

import re
...
.. block for block in re.split(r'(?:\r\n){2,}', text) ..
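For completeness, the regex split can be dropped into a full function, as in the sketch below. The name `count_couplets` and the sample text are assumptions made for illustration; the pattern is the one shown above.

```python
import re


def count_couplets(text):
    """Count blocks of exactly two lines; blocks are separated by one or more blank lines."""
    # Two or more consecutive \r\n mark a block boundary.
    blocks = re.split(r'(?:\r\n){2,}', text.strip())
    # A two-line block contains exactly one internal \r\n.
    return len([block for block in blocks if block.count('\r\n') == 1])


text = (
    u'I saw a little hermit crab\r\n'
    u'His coloring was oh so drab\r\n'
    u'\r\n'
    u'\r\n'
    u'Before we say again\r\n'
    u'The pain the pain\r\n'
    u'A horse, a horse\r\n'
    u'\r\n'
    u'Lightening, thunder, all around\r\n'
    u'Soon the rain falls on the ground\r\n'
    u'\r\n'
    u'I tire of writing poems and rhyme\r\n'
)

print(count_couplets(text))  # 2
```

Calling `text.strip()` first keeps a trailing line ending from making a lone last line look like a two-line block.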

Counting couplets from a list with \r\n line breaks
