Question

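I'm using the following regular expression to split a phrase, passed in as a string, into a list of words.

Because the input may contain letters from other alphabets, I'm using the Unicode flag (re.U). This works well in most cases: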


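import re

phrase = 'hey look out'
word_list = re.split(r'[\W_]+', unicode(phrase, 'utf-8').lower(), flags=re.U)
word_list  # [u'hey', u'look', u'out']

However, if the phrase is a sentence that ends with a period, like this one, the split creates a blank value in the list:

phrase = 'hey, my spacebar_is broken.'
word_list = re.split(r'[\W_]+', unicode(phrase, 'utf-8').lower(), flags=re.U)
word_list  # [u'hey', u'my', u'spacebar', u'is', u'broken', u'']

I have a workaround, but I was wondering whether there is a way to solve this in the regex itself?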
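A minimal sketch that reproduces the behavior, assuming Python 3 (where str is already Unicode and str patterns use Unicode matching by default, so neither unicode() nor flags=re.U is needed). The helper name naive_words is hypothetical. It shows that re.split emits an empty string whenever the text starts or ends with a separator run, not only at a trailing period.

```python
import re

# Python 3: str patterns match Unicode word characters by default
SEPARATORS = re.compile(r'[\W_]+')  # runs of non-word characters or underscores


def naive_words(phrase):
    """Hypothetical helper: split on separator runs, as in the question."""
    return SEPARATORS.split(phrase.lower())


print(naive_words('hey look out'))                 # ['hey', 'look', 'out']
print(naive_words('hey, my spacebar_is broken.'))  # ['hey', 'my', 'spacebar', 'is', 'broken', '']
print(naive_words('...wait, what?'))               # ['', 'wait', 'what', '']
```

The last call shows the leading case as well: the '...' before the first word yields an empty string at index 0.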

Answer 1

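[\W_]+ matches runs of non-word characters and underscores. Because \W matches any non-word character, the string is split wherever such a run occurs. Since nothing follows the final period, the last element of the result is an empty string (a leading delimiter produces one at the start in the same way). If you want to avoid that, strip the delimiter characters from both ends of the string before splitting. Do this on the decoded Unicode string with flags=re.U; on a UTF-8 byte string without the flag, the bytes of non-ASCII letters count as \W and would be stripped too:

text = re.sub(r'^[\W_]+|[\W_]+$', '', unicode(phrase, 'utf-8').lower(), flags=re.U)
word_list = re.split(r'[\W_]+', text, flags=re.U)

Or filter the resulting list to remove empty strings:

word_list = [word for word in word_list if word]

Alternatively, you can get the words by matching them directly instead of splitting:

word_list = re.findall(r'[^\W_]+', unicode(phrase, 'utf-8').lower(), flags=re.U)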
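The three approaches can be compared side by side in a short sketch, assuming Python 3 (Unicode str, no re.U needed). The function and constant names are hypothetical, chosen for illustration. One detail the strip approach needs: re.split on an empty string returns [''], so an input made only of punctuation must be handled separately.

```python
import re

# Python 3 sketch: str patterns are Unicode-aware by default, so no re.U needed.
SEPARATOR = r'[\W_]+'        # runs of non-word characters or underscores
EDGES = r'^[\W_]+|[\W_]+$'   # separator runs at the start or end of the text
WORD = r'[^\W_]+'            # runs of word characters, excluding underscores


def words_strip(phrase):
    """Trim separators from both ends, then split."""
    trimmed = re.sub(EDGES, '', phrase.lower())
    # re.split on an empty string returns [''], so return [] explicitly
    return re.split(SEPARATOR, trimmed) if trimmed else []


def words_filter(phrase):
    """Split as in the question, then drop the empty strings."""
    return [w for w in re.split(SEPARATOR, phrase.lower()) if w]


def words_findall(phrase):
    """Match the words directly; the + quantifier never yields empty matches."""
    return re.findall(WORD, phrase.lower())


for split_words in (words_strip, words_filter, words_findall):
    # each line ends with ['hey', 'my', 'spacebar', 'is', 'broken']
    print(split_words.__name__, split_words('hey, my spacebar_is broken.'))
```

The findall version is usually the simplest, because it describes the words you want rather than the gaps between them, so there is no empty-string edge case to guard against.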

Python re.split adding empty space
