Question

I want to split the string below with a regular expression so that I get each statement separately.

Input string:

str1 = "1. Write down what you eat for one week and you will lose weight.  2. Add 10 percent to the amount of daily calories you think you're eating.  3. Get an online weight loss buddy to lose more weight.  4. Get a mantra.  5. Eat three fewer bites of your meal, one less treat a day, or one less glass of orange juice. More items"

My attempt:

re.split(r'[A-z]\.',str1)

Output:

['1. Write down what you eat for one week and you will lose weigh',
 "  2. Add 10 percent to the amount of daily calories you think you're eatin",
 '  3. Get an online weight loss buddy to lose more weigh',
 '  4. Get a mantr',
 '  5. Eat three fewer bites of your meal, one less treat a day, or one less glass of orange juic',
 ' More items']

In the output, the last letter of each statement is missing. I want the output to look like this:

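['1. Write down what you eat for one week and you will lose weight',
 "2. Add 10 percent to the amount of daily calories you think you're eating",
 '3. Get an online weight loss buddy to lose more weight',
 '4. Get a mantra',
 '5. Eat three fewer bites of your meal, one less treat a day, or one less glass of orange juice',
 'More items']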

Answer 1

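This happens because your pattern consumes the last two characters of each statement: the letter and the period are both part of the match, and re.split discards everything it matches. If you don't mind losing the ., you can use a lookbehind to keep the last letter: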

re.split(r'(?<=[a-z])\.',str1)

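Also, note that [A-z] does not match letters only: that character range also includes several non-letter characters ([, \, ], ^, _ and `), which sit between Z and a in ASCII.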
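To make the point about [A-z] concrete, here is a small sketch (Python 3, since it uses re.fullmatch and print()). The sample string "snake_. next" and the variable names are made up for illustration and do not come from the question.

```python
import re

# Characters that sit between 'Z' (code 90) and 'a' (code 97) in ASCII.
non_letters = [chr(c) for c in range(ord('Z') + 1, ord('a'))]
print(non_letters)  # ['[', '\\', ']', '^', '_', '`']

# Each of them matches [A-z] but not [A-Za-z].
assert all(re.fullmatch(r'[A-z]', ch) for ch in non_letters)
assert not any(re.fullmatch(r'[A-Za-z]', ch) for ch in non_letters)

# Hypothetical input: an underscore directly before the period.
sample = "snake_. next"
with_wide_range = re.split(r'(?<=[A-z])\.', sample)
with_letters_only = re.split(r'(?<=[A-Za-z])\.', sample)

print(with_wide_range)    # ['snake_', ' next']
print(with_letters_only)  # ['snake_. next']
```

With [A-z] the underscore counts as a "letter", so the string is split; with [A-Za-z] it is left alone.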

Answer 2

Use a positive lookbehind so that the preceding character is not consumed:

re.split(r'(?<=[A-Za-z])\.',str1)

See https://docs.python.org/2/library/re.html:

(?<=...) Matches if the current position in the string is preceded by a match for ... that ends at the current position.
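As a rough end-to-end sketch, the lookbehind split can be applied to the string from the question. The helper name split_statements and the whitespace stripping are additions for illustration; the answers above only show the re.split call itself.

```python
import re

str1 = ("1. Write down what you eat for one week and you will lose weight.  "
        "2. Add 10 percent to the amount of daily calories you think you're eating.  "
        "3. Get an online weight loss buddy to lose more weight.  "
        "4. Get a mantra.  "
        "5. Eat three fewer bites of your meal, one less treat a day, "
        "or one less glass of orange juice. More items")


def split_statements(text):
    """Split on periods that follow a letter (hypothetical helper)."""
    # The lookbehind is zero-width: it checks for a letter but does not
    # consume it, so only the period itself is removed by the split.
    # Periods after digits ("1.", "2.", ...) are not split points.
    parts = re.split(r'(?<=[A-Za-z])\.', text)
    # Extra step, not in the answers: trim surrounding whitespace
    # and drop any empty pieces.
    return [part.strip() for part in parts if part.strip()]


statements = split_statements(str1)
for statement in statements:
    print(statement)
```

Each statement keeps its final letter, and the numbering ("1.", "2.", and so on) stays attached to its statement because those periods follow a digit rather than a letter.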

How to split a string using a regular expression
