str.split on page numbers

Time: 2017-09-07 07:14:37

Tags: python regex string

I'm doing some simple text extraction on bibliographic data, and I have a string like this:

texts = '36 L. Ronse De Craene / Flora 221 (2016) 22–37Chen, L., Ren, Y., Endress, P.K., Tian, X.H., Zhang, X.H., 2007. Floral organogenesis inTetracentron sinense (Trochodendraceae) and its systematic significance. PlantSyst. Evol. 264, 183–193.Choob, V.V., Yurtseva, O.V., 2007. Mathematical model of flower formation in thePolygonaceae members. Bot. Zh. 92, 114–134.Clark, S.E., Running, M.P., Meyerowitz, E.M., 1993. CLAVATA1, a regulator ofmeristem and flower development in Arabidopsis. Development 119, 397–418.Clark, S.E., Running, M.P., Meyerowitz, E.M., 1995. CLAVATA3 is a specific regulatorof shoot and floral meristem development affecting the same processes asCLAVATA1. Development 121, 2057–2067.Costello, A., Motley, T.J., 2004. The development of the superior ovary inTetraplasandra (Araliaceae). Am. J. Bot. 91, 644–655.Davidson, C., 1973. An anatomical and morphological study of Datiscaceae. Aliso 8,49–110.Dickison, W.C., 1990a. A study of the floral morphology and anatomy of theCaryocaraceae. Bull. Torrey Bot. Club 117, 123–137'

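I want to split this string on the page numbers. Each entry ends with a page range of the form xxx-xxx, where x is a digit, so I thought something like this should work: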

re.split(r'\d+\-\d+', texts)

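I've tried a few variations of this without success. I don't use regular expressions often, and I think I'm missing something small.

The output I'm aiming for: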

['36 L. Ronse De Craene / Flora 221 (2016)',

'Chen, L., Ren, Y., Endress, P.K., Tian, X.H., Zhang, X.H., 2007. Floral organogenesis inTetracentron sinense (Trochodendraceae) and its systematic significance. PlantSyst. Evol. 264,',

'.Choob, V.V., Yurtseva, O.V., 2007. Mathematical model of flower formation in thePolygonaceae members. Bot. Zh. 92,',  

...] 

1 answer:

Answer 0 (score: 0)

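The dash in your text string is not the same character as the hyphen in your regular expression:

You can see it when you write one above the other: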

– (from the text: en dash, U+2013)

- (from the regex: hyphen-minus, U+002D)
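As a minimal sketch of one possible fix, the pattern can accept either character by putting both in a character class. The `sample` string below is a shortened, made-up stand-in in the same shape as the question's `texts`, and stripping whitespace and dropping empty pieces is an assumption about the desired output rather than something the question specifies.

```python
import re
import unicodedata

# Shortened stand-in for the question's string; the page ranges use an en dash.
sample = ('36 L. Ronse De Craene / Flora 221 (2016) 22–37'
          'Chen, L., 2007. Floral organogenesis. PlantSyst. Evol. 264, 183–193'
          '.Choob, V.V., 2007. Mathematical model. Bot. Zh. 92, 114–134')

# Confirm that the two dashes really are different characters.
text_dash = unicodedata.name('–')   # character used in the text
regex_dash = unicodedata.name('-')  # character used in the original pattern

# The original pattern only matches a hyphen-minus, so nothing is split.
hyphen_only = re.split(r'\d+-\d+', sample)

# A character class accepts either dash between the two numbers.
pieces = re.split(r'\d+[-–]\d+', sample)

# Assumption: trim whitespace and drop the empty piece left after the last match.
parts = [p.strip() for p in pieces if p.strip()]

for part in parts:
    print(part)
```

If only the en dash ever occurs in the data, `r'\d+–\d+'` alone would be enough; the character class simply tolerates both forms.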