Parsing to replace quotes

Time: 2016-10-30 11:19:51

Tags: python regex parsing nlp quotes

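I am trying to parse a text file so that I can compute some statistics on it in Python. To do that, I want to replace certain punctuation marks with tokens. One example of such a token covers all punctuation that ends a sentence (., ! and ? become <EndS>). I managed to do this with regular expressions. Now I am trying to parse quotation marks. For that, I think I need a way to tell opening quotes from closing quotes. I read the input file line by line, and I have no guarantee that the quotes will be balanced.

For example: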

 "Death to the traitors!" cried the exasperated burghers.
 "Go along with you," growled the officer, "you always cry the same thing over again. It is very tiresome."

should become:

 [Open] Death to the traitors! [Close] cried the exasperated burghers.
 [Open] Go along with you, [Close] growled the officer, [Open] you always cry the same thing over again. It is very tiresome. [Close]

Is it possible to do this with a regular expression? Is there an easier or better way to do it?

1 answer:

Answer 0: (score: 5)

You can use the sub function from the re module:


https://docs.python.org/3.5/library/re.html#re.sub
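
As a minimal sketch of how re.sub could be applied to the examples from the question (the function names tag_quotes and tag_quotes_toggle are hypothetical, and this is not necessarily the exact code the answer intended): the first function rewrites each balanced pair of double quotes on a line, and the second alternates between the two tokens so that an unmatched quote on a line is still tagged.

```python
import re


def tag_quotes(line):
    # Replace every balanced "..." pair on the line.
    # A stray, unmatched quote is left untouched.
    return re.sub(r'"([^"]*)"', r'[Open] \1 [Close]', line)


def tag_quotes_toggle(line):
    # Alternative for possibly unbalanced lines: alternate between the
    # two tokens. State resets per line, so the first quote on each
    # line is always treated as an opening quote (an assumption).
    state = {"open": False}

    def repl(match):
        state["open"] = not state["open"]
        return "[Open] " if state["open"] else " [Close]"

    return re.sub(r'"', repl, line)


if __name__ == "__main__":
    sample = '"Death to the traitors!" cried the exasperated burghers.'
    print(tag_quotes(sample))
    print(tag_quotes_toggle(sample))
```

Whether the first quote on a line should count as an opening quote depends on the input. If a quotation can span several lines, the toggle state would need to be kept across lines instead of being reset for each one.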