Split text while keeping the delimiter

Time: 2018-01-10 07:47:39

Tags: python regex python-3.6

Suppose we have the following text:

content = '\n\nPART I, WHERE I’M COMING FROM\n\n1\xa0My Call to Adventure: 1949–1967\n2\xa0Crossing the Threshold: 1967–1979\n3\xa0My Abyss: 1979–1982\n4\xa0My Road of Trials: 1983–1994\n5\xa0The Ultimate Boon: 1995–2010\n6\xa0Returning the Boon: 2011–2015\n7\xa0My Last Year and My Greatest Challenge: 2016–2017\n8\xa0Looking Back from a Higher Level\n\nPART II, LIFE PRINCIPLES\n\n1\xa0Embrace Reality and Deal with It\n2\xa0Use the 5-Step Process to Get What You Want Out of Life\n3\xa0Be Radically Open-Minded\n4\xa0Understand That People Are Wired Very Differently\n5\xa0Learn How to Make Decisions Effectively\nLife Principles: Putting It All Together\nSummary and Table of Life Principles\n\nPART III, WORK PRINCIPLES\n\nSummary and Table of Work Principles\nTO GET THE CULTURE RIGHT\xa0.\xa0.\xa0.\n1\xa0Trust in Radical Truth and Radical Transparency\n2\xa0Cultivate Meaningful Work and Meaningful Relationships\n3\xa0Create a Culture in Which It Is Okay to Make Mistakes and Unacceptable Not to Learn from Them\n4\xa0Get and Stay in Sync\n5\xa0Believability Weight Your Decision Making\n6\xa0Recognize How to Get Beyond Disagreements\n\nTO GET THE PEOPLE RIGHT\xa0.\xa0.\xa0.\n7\xa0Remember That the WHO Is More Important than the WHAT\n8\xa0Hire Right, Because the Penalties for Hiring Wrong Are Huge\n9\xa0Constantly Train, Test, Evaluate, and Sort People\n\nTO BUILD AND EVOLVE YOUR MACHINE\xa0.\xa0.\xa0.\n10\xa0Manage as Someone Operating a Machine to Achieve a Goal\n11\xa0Perceive and Don’t Tolerate Problems\n12\xa0Diagnose Problems to Get at Their Root Causes\n13\xa0Design Improvements to Your Machine to Get Around Your Problems\n14\xa0Do What You Set Out to Do\n15\xa0Use Tools and Protocols to Shape How Work Is Done\n16\xa0And for Heaven’s Sake, Don’t Overlook Governance!\nWork Principles: Putting It All Together\n\nACKNOWLEDGMENTS\nABOUT THE AUTHOR\nCONCLUSION\n\nAPPENDIX: TOOLS AND PROTOCOLS FOR BRIDGEWATER’S IDEA MERITOCRACY\nBIBLIOGRAPHY\nINDEX'

I split the text into its PARTs:

In [11]: re.split(r'\n\n(?=PART)', content)
Out[11]:
['',
 'PART I, WHERE I’M COMING FROM\n\n1\xa0My Call to Adventure: 1949–1967\n2\xa0Crossing the Threshold: 1967–1979\n3\xa0My Abyss: 1979–1982\n4\xa0My Road of Trials: 1983–1994\n5\xa0The Ultimate Boon: 1995–2010\n6\xa0Returning the Boon: 2011–2015\n7\xa0My Last Year and My Greatest Challenge: 2016–2017\n8\xa0Looking Back from a Higher Level',
'PART II, LIFE PRINCIPLES\n\n1\xa0Embrace Reality and Deal with It\n2\xa0Use the 5-Step Process to Get What You Want Out of Life\n3\xa0Be Radically Open-Minded\n4\xa0Understand That People Are Wired Very Differently\n5\xa0Learn How to Make Decisions Effectively\nLife Principles: Putting It All Together\nSummary and Table of Life Principles',
'PART III, WORK PRINCIPLES\n\nSummary and Table of Work Principles\nTO GET THE CULTURE RIGHT\xa0.\xa0.\xa0.\n1\xa0Trust in Radical Truth and Radical Transparency\n2\xa0Cultivate Meaningful Work and Meaningful Relationships\n3\xa0Create a Culture in Which It Is Okay to Make Mistakes and Unacceptable Not to Learn from Them\n4\xa0Get and Stay in Sync\n5\xa0Believability Weight Your Decision Making\n6\xa0Recognize How to Get Beyond Disagreements\n\nTO GET THE PEOPLE RIGHT\xa0.\xa0.\xa0.\n7\xa0Remember That the WHO Is More Important than the WHAT\n8\xa0Hire Right, Because the Penalties for Hiring Wrong Are Huge\n9\xa0Constantly Train, Test, Evaluate, and Sort People\n\nTO BUILD AND EVOLVE YOUR MACHINE\xa0.\xa0.\xa0.\n10\xa0Manage as Someone Operating a Machine to Achieve a Goal\n11\xa0Perceive and Don’t Tolerate Problems\n12\xa0Diagnose Problems to Get at Their Root Causes\n13\xa0Design Improvements to Your Machine to Get Around Your Problems\n14\xa0Do What You Set Out to Do\n15\xa0Use Tools and Protocols to Shape How Work Is Done\n16\xa0And for Heaven’s Sake, Don’t Overlook Governance!\nWork Principles: Putting It All Together\n\nACKNOWLEDGMENTS\nABOUT THE AUTHOR\nCONCLUSION\n\nAPPENDIX: TOOLS AND PROTOCOLS FOR BRIDGEWATER’S IDEA MERITOCRACY\nBIBLIOGRAPHY\nINDEX']
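To see what this split consumes and what it keeps, here is a minimal sketch on a short hypothetical sample (`text` is an assumed stand-in for `content`, not data from the question). The lookahead `(?=PART)` only checks that `PART` follows, so just the two newlines are consumed. Because the sample, like `content`, begins with `\n\nPART`, the first piece is an empty string, which a comprehension can drop.

```python
import re

# Hypothetical short sample with the same shape as `content`
text = "\n\nPART I\n\nch1\n\nPART II\n\nch2"

# (?=PART) is zero-width: only "\n\n" is consumed, "PART" stays in each piece.
parts = re.split(r"\n\n(?=PART)", text)
print(parts)  # ['', 'PART I\n\nch1', 'PART II\n\nch2']

# The text starts with the delimiter, so the first piece is empty; drop it.
parts = [p for p in parts if p]
print(parts)  # ['PART I\n\nch1', 'PART II\n\nch2']
```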

Another solution:

# Contrast with re.split(r'\n\n(?=PART)', content):
# move one \n inside the lookahead (?=...), which is not a capturing group
In [14]: re.split(r'\n(?=\nPART)', content)
Out[14]:
['',
 '\nPART I, WHERE I’M COMING FROM\n\n1\xa0My Call to Adventure: 1949–1967\n2\xa0Crossing the Threshold: 1967–1979\n3\xa0My Abyss: 1979–1982\n4\xa0My Road of Trials: 1983–1994\n5\xa0The Ultimate Boon: 1995–2010\n6\xa0Returning the Boon: 2011–2015\n7\xa0My Last Year and My Greatest Challenge: 2016–2017\n8\xa0Looking Back from a Higher Level',
 '\nPART II, LIFE PRINCIPLES\n\n1\xa0Embrace Reality and Deal with It\n2\xa0Use the 5-Step Process to Get What You Want Out of Life\n3\xa0Be Radically Open-Minded\n4\xa0Understand That People Are Wired Very Differently\n5\xa0Learn How to Make Decisions Effectively\nLife Principles: Putting It All Together\nSummary and Table of Life Principles',
 '\nPART III, WORK PRINCIPLES\n\nSummary and Table of Work Principles\nTO GET THE CULTURE RIGHT\xa0.\xa0.\xa0.\n1\xa0Trust in Radical Truth and Radical Transparency\n2\xa0Cultivate Meaningful Work and Meaningful Relationships\n3\xa0Create a Culture in Which It Is Okay to Make Mistakes and Unacceptable Not to Learn from Them\n4\xa0Get and Stay in Sync\n5\xa0Believability Weight Your Decision Making\n6\xa0Recognize How to Get Beyond Disagreements\n\nTO GET THE PEOPLE RIGHT\xa0.\xa0.\xa0.\n7\xa0Remember That the WHO Is More Important than the WHAT\n8\xa0Hire Right, Because the Penalties for Hiring Wrong Are Huge\n9\xa0Constantly Train, Test, Evaluate, and Sort People\n\nTO BUILD AND EVOLVE YOUR MACHINE\xa0.\xa0.\xa0.\n10\xa0Manage as Someone Operating a Machine to Achieve a Goal\n11\xa0Perceive and Don’t Tolerate Problems\n12\xa0Diagnose Problems to Get at Their Root Causes\n13\xa0Design Improvements to Your Machine to Get Around Your Problems\n14\xa0Do What You Set Out to Do\n15\xa0Use Tools and Protocols to Shape How Work Is Done\n16\xa0And for Heaven’s Sake, Don’t Overlook Governance!\nWork Principles: Putting It All Together\n\nACKNOWLEDGMENTS\nABOUT THE AUTHOR\nCONCLUSION\n\nAPPENDIX: TOOLS AND PROTOCOLS FOR BRIDGEWATER’S IDEA MERITOCRACY\nBIBLIOGRAPHY\nINDEX']
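Here the first `\n` is consumed while the second is only inspected by the lookahead, so each piece keeps one `\n`. If the goal is to keep the whole delimiter, a capturing group is another option: `re.split` then also returns each matched delimiter as its own list item, and the pieces can be glued back together. A sketch on the same kind of hypothetical sample (`text`, `pieces` and `joined` are illustrative names, not from the question):

```python
import re

# Hypothetical short sample with the same shape as `content`
text = "\n\nPART I\n\nch1\n\nPART II\n\nch2"

# Only the first "\n" is consumed; the second stays in the next piece.
print(re.split(r"\n(?=\nPART)", text))
# ['', '\nPART I\n\nch1', '\nPART II\n\nch2']

# With a capturing group, re.split also returns the matched delimiters.
pieces = re.split(r"(\n\n)(?=PART)", text)
print(pieces)
# ['', '\n\n', 'PART I\n\nch1', '\n\n', 'PART II\n\nch2']

# pieces[0] is the text before the first delimiter (empty here); after it,
# delimiters and pieces alternate, so prepend each delimiter to its piece.
joined = [d + p for d, p in zip(pieces[1::2], pieces[2::2])]
print(joined)
# ['\n\nPART I\n\nch1', '\n\nPART II\n\nch2']
```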

However, when I tested this pattern:

In [15]: re.split(r'(?=\n\nPART)', content)
# it reports an error:
ValueError: split() requires a non-empty pattern match.

I can't understand what the problem is.
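A likely explanation: `(?=\n\nPART)` is a pure lookahead, so every match it produces is zero-width (empty). Before Python 3.7, `re.split()` could not split on an empty match, and in Python 3.6 (the version in the tags) a pattern that can only match the empty string raises exactly this `ValueError`. The two earlier patterns work because they consume at least one `\n`. The sketch below confirms the match is zero-width and shows a version-independent workaround built on `re.finditer`, which accepts zero-width matches; `split_before` and the short `text` sample are hypothetical, not part of the question.

```python
import re

# Hypothetical short sample with the same shape as `content`
text = "\n\nPART I\n\nch1\n\nPART II\n\nch2"

# A lookahead consumes nothing, so the match is zero-width: span (0, 0).
print(re.search(r"(?=\n\nPART)", text).span())  # (0, 0)


def split_before(pattern, text):
    """Hypothetical helper: split `text` at each position where `pattern`
    matches, keeping the matched text at the start of the next piece."""
    # re.finditer reports zero-width matches, unlike re.split before 3.7.
    starts = [m.start() for m in re.finditer(pattern, text)]
    bounds = [0] + starts + [len(text)]
    # Slice between consecutive boundaries, skipping empty slices such as the
    # one produced when the text begins with a match.
    return [text[a:b] for a, b in zip(bounds, bounds[1:]) if a != b]


parts = split_before(r"(?=\n\nPART)", text)
print(parts)  # ['\n\nPART I\n\nch1', '\n\nPART II\n\nch2']
```

On Python 3.7 and later, `re.split()` accepts patterns that match the empty string, so `re.split(r'(?=\n\nPART)', content)` works directly; it returns a leading `''` followed by the `PART` blocks, each still starting with `\n\n`.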
