Paragraph segmentation using spaCy

Date: 2020-03-23 20:48:31

Tags: python-3.x nlp spacy

I have a text file, "scrap_txt", that looks like this:

I take the US News College rankings with a big grain of salt. I find some aspects of the ranking useful. For example, the Business and Engineering rankings are fairly accurate. The Peer Assessment rating is also telling.\n\nBut other elements of the ranking either make little to no sense, or are easily “gamed” by universities. How is it possible to use one formula to determine the financial viability of a large public university and a small private university? How do alumni donations equal student satisfaction? How are universities measuring class sizes?\n\nFor example, how is UCLA’s Financial Resources rank #20 while Michigan’s is #40 when Michigan’s endowment is 250% larger than UCLA’s ($12 billion vs $5 billion), revenues generated from tuition at Michigan significantly exceed those generated at UCLA, institutional budget at Michigan exceeds UCLA’s, and the cost of operations at Michigan significantly lower than at UCLA?\n\n

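A paragraph can be defined as the text that is wrapped in \n\n at both its beginning and its end. Could someone help me with the code for this? The two resulting paragraphs should look like this: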

paragraph1 = But other elements of the ranking either make little to no sense, or are easily “gamed” by universities. How is it possible to use one formula to determine the financial viability of a large public university and a small private university? How do alumni donations equal student satisfaction? How are universities measuring class sizes?

paragraph2 = For example, how is UCLA’s Financial Resources rank #20 while Michigan’s is #40 when Michigan’s endowment is 250% larger than UCLA’s ($12 billion vs $5 billion), revenues generated from tuition at Michigan significantly exceed those generated at UCLA, institutional budget at Michigan exceeds UCLA’s, and the cost of operations at Michigan significantly lower than at UCLA?
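A minimal sketch in plain Python (standard library only, since spaCy is not needed just to locate the \n\n boundaries). It assumes paragraphs are separated by two or more newlines and, following the definition above, keeps only the chunks that have a separator on both sides, so the opening text before the first \n\n is dropped. It also assumes the \n sequences shown in the sample might be stored in the file as a literal backslash followed by "n", and converts those to real newlines first. The helper name split_paragraphs is hypothetical, and the sample text is abbreviated from the question.

```python
import re
from typing import List


def split_paragraphs(text: str) -> List[str]:
    """Return the paragraphs wrapped in a blank-line separator on both sides.

    Assumptions (not from the question): a separator is two or more newlines,
    and chunks without a separator at both ends are not paragraphs.
    """
    # Assumption: the file may contain literal backslash-n pairs (as in the
    # scraped sample); turn them into real newlines so both forms work.
    text = text.replace("\\n", "\n")
    # Split on runs of two or more newlines, tolerating spaces/tabs between them.
    parts = re.split(r"\n[ \t]*\n\s*", text)
    # parts[0] comes before the first separator and parts[-1] after the last,
    # so only the inner chunks are wrapped in \n\n on both sides.
    return [p.strip() for p in parts[1:-1] if p.strip()]


# Abbreviated version of the sample text from the question.
sample = (
    "I take the US News College rankings with a big grain of salt. "
    "The Peer Assessment rating is also telling.\n\n"
    "But other elements of the ranking either make little to no sense, "
    "or are easily “gamed” by universities. "
    "How are universities measuring class sizes?\n\n"
    "For example, how is UCLA’s Financial Resources rank #20 "
    "while Michigan’s is #40?\n\n"
)

paragraph1, paragraph2 = split_paragraphs(sample)
print(paragraph1)
print(paragraph2)
```

To run it on the actual file, read it first, for example text = open("scrap_txt", encoding="utf-8").read(), then call split_paragraphs(text). If spaCy is still wanted afterwards (for instance, to split each paragraph into sentences), each returned paragraph can be passed to the nlp pipeline separately.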
