How to find the boundaries of a paragraph with Python `string.find()`?

Date: 2016-01-13 00:54:00

Tags: python regex string

I have a text corpus that is split into paragraphs by \n\n:

\n\n"Well done, Mrs. Martin!" thought Emma.  "You know what you are about."\n\n"And when she had come away, Mrs. Martin was so very kind as to send\nMrs. Goddard a beautiful goose--the finest goose Mrs. Goddard had\never seen.  Mrs. Goddard had dressed it on a Sunday, and asked all\nthe three teachers, Miss Nash, and Miss Prince, and Miss Richardson,\nto sup with her."\n\n"Mr. Martin, I suppose, is not a man of information beyond the line\nof his own business? He does not read?"\n\n"Oh yes!--that is, no--I do not know--but I believe he has\nread a good deal--but not what you would think any thing of.\nHe reads the Agricultural Reports, and some other books that lay\nin one of the window seats--but he reads all _them_ to himself.\nBut sometimes of an evening, before we went to cards, he would read\nsomething aloud out of the Elegant Extracts, very entertaining.\nAnd I know he has read the Vicar of Wakefield.  He never read the\nRomance of the Forest, nor The Children of the Abbey.  He had never\nheard of such books before I mentioned them, but he is determined\nto get them now as soon as ever he can."\n\nThe next question was--\n\n"What sort of looking man is Mr. Martin?"

Or, when printed:

"Well done, Mrs. Martin!" thought Emma.  "You know what you are about."

"And when she had come away, Mrs. Martin was so very kind as to send
Mrs. Goddard a beautiful goose--the finest goose Mrs. Goddard had
ever seen.  Mrs. Goddard had dressed it on a Sunday, and asked all
the three teachers, Miss Nash, and Miss Prince, and Miss Richardson,
to sup with her."

"Mr. Martin, I suppose, is not a man of information beyond the line
of his own business? He does not read?"

"Oh yes!--that is, no--I do not know--but I believe he has
read a good deal--but not what you would think any thing of.
He reads the Agricultural Reports, and some other books that lay
in one of the window seats--but he reads all _them_ to himself.
But sometimes of an evening, before we went to cards, he would read
something aloud out of the Elegant Extracts, very entertaining.
And I know he has read the Vicar of Wakefield.  He never read the
Romance of the Forest, nor The Children of the Abbey.  He had never
heard of such books before I mentioned them, but he is determined
to get them now as soon as ever he can."

The next question was--

"What sort of looking man is Mr. Martin?"

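Given a particular paragraph, I want to know where its boundaries are. That is, I want to locate the paragraph by the \n\n separators that surround it.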

My goal is to click with the cursor on a paragraph and then know that paragraph's limits from the positions of the \n\n separators.

# `string` here is the variable holding the corpus text, not the `string` module
string.find("\n\n")  # index of the first "\n\n"

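This gives the position of the first \n\n in the string. But what about a particular paragraph? If I "click" on the fourth paragraph (the one mentioning the Vicar of Wakefield), how do I search for the first \n\n above it and the first \n\n below it?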

1 answer:

Answer 0 (score: 1)

Assuming you know the position `pos` in the long text string where you clicked, you can use str.find() and str.rfind() to solve your problem.

To look "forward" you would do:

string.find("\n\n", pos)  # searches for "\n\n" starting from position `pos`, returning the first match

and to look "backward" you would do:

string.rfind("\n\n", 0, pos)  # searches for "\n\n" between the start of the string and `pos`, returning the last match
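Putting the two calls together, a minimal sketch of a helper that returns the slice of the paragraph containing `pos`. The name `paragraph_bounds` is hypothetical (not a library function), and it assumes `pos` points at a character inside a paragraph rather than at a separator. It also handles the -1 that find/rfind return when there is no separator before the first paragraph or after the last one:

```python
def paragraph_bounds(text, pos, sep="\n\n"):
    """Hypothetical helper: return (start, end) so that text[start:end]
    is the paragraph containing index `pos`."""
    # Last separator lying entirely before pos; -1 means pos is in the first paragraph
    before = text.rfind(sep, 0, pos)
    start = 0 if before == -1 else before + len(sep)
    # First separator starting at or after pos; -1 means pos is in the last paragraph
    after = text.find(sep, pos)
    end = len(text) if after == -1 else after
    return start, end


text = "one\n\ntwo three\n\nfour"
start, end = paragraph_bounds(text, 7)  # index 7 falls inside "two three"
print(text[start:end])  # two three
```

Slicing with the returned pair gives the paragraph text without its surrounding separators.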

For documentation on these two methods, see https://docs.python.org/2/library/string.html
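The same `start` argument of str.find can also be used in a loop to list every "\n\n" position at once, which may help if you prefer to precompute all paragraph boundaries instead of searching on each click. A minimal sketch; the function name `separator_positions` is hypothetical:

```python
def separator_positions(text, sep="\n\n"):
    """Hypothetical helper: return the start index of every
    non-overlapping occurrence of `sep` in `text`."""
    positions = []
    i = text.find(sep)
    while i != -1:
        positions.append(i)
        # Resume the search just past this match so it is not found again
        i = text.find(sep, i + len(sep))
    return positions


print(separator_positions("a\n\nb\n\nc"))  # [1, 4]
```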