Question

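I'm at my wits' end with this one: basically, I need to remove double spaces between words. My program happens to be in Hebrew, but this is the basic idea: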

TITLE: הלכות ‏ ‏השכמת‏ ‏הבוקר‏

Notice the extra space between the first two words (Hebrew reads right to left).

I've tried many, many different approaches; here are a few:

# tried all these with and without unicode
title = re.sub(u'\s+',u' ',title.decode('utf-8'))
title = title.replace("  "," ")
title = title.replace(u"  הלכות",u" הלכות")
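For reference, a minimal Python 3 check (the snippets above are Python 2) of why whitespace-based approaches leave the "blank" word alone. The sample string is a hypothetical reconstruction of the title that writes U+200E LEFT-TO-RIGHT MARK as an explicit escape. It assumes that mark sits next to each space; the real title may differ. U+200E is a format character (Unicode category Cf), not whitespace, so `\s` never matches it and there is no run of two spaces to collapse.

```python
import re
import unicodedata

# Hypothetical reconstruction of the title: "\u200e" is U+200E LEFT-TO-RIGHT MARK,
# written as an escape so the invisible character is visible in the source.
title = "הלכות \u200e \u200eהשכמת\u200e \u200eהבוקר\u200e"

# Collapse runs of whitespace, as in the first attempt above.
collapsed = re.sub(r"\s+", " ", title)

print("\u200e".isspace())              # False: the mark is not whitespace
print(unicodedata.category("\u200e"))  # Cf: a Unicode format character
print(collapsed == title)              # True: only single spaces, so nothing changed
```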

Until finally I resorted to this completely overkill approach (some of the formatting got mangled when pasting):

def remove_blanks(s):
    word_list = s.split(" ")
    final_word_list = []
    for word in word_list:
        print "word: " +word
        #tried every qualifier I could think of...
        if not_blank(word) and word!=" " and True != re.match("s*",word):
            print "^NOT BLANK^"
            final_word_list.append(word)
    return ' '.join(final_word_list)

def not_blank(s):
    while " " in s:
        s = s.replace(" ","")
    return (len(s.replace("\n","").replace("\r","").replace("\t",""))!=0);

And, to my surprise, this is the output I got:

word: הלכות
^NOT BLANK^
word: ‏           #this should be tagged as Blank!!
^NOT BLANK^
word: ‏השכמת‏
^NOT BLANK^
word: ‏הבוקר‏
^NOT BLANK^

Obviously, my conditions aren't working. What's going on here?

Answer 1

There was a hidden \xe2\x80\x8e, the LEFT-TO-RIGHT MARK (U+200E). Use repr(word) to find it. Thanks @mgilson!
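A minimal Python 3 sketch of the repr() diagnosis and one possible cleanup. In a Python 2 UTF-8 byte string the mark appears as '\xe2\x80\x8e' (its UTF-8 encoding), while Python 3's repr() of a text string shows it as '\u200e'. The helper names find_hidden and clean_title and the sample string are hypothetical. The cleanup drops every Cf character, which also removes zero-width joiners and non-joiners that can matter in some scripts. Replacing only "\u200e" (and "\u200f", the right-to-left mark) is a narrower alternative.

```python
import unicodedata

def find_hidden(word):
    """Hypothetical helper: (index, name) of every format (Cf) character in word."""
    return [(i, unicodedata.name(ch, "UNKNOWN"))
            for i, ch in enumerate(word)
            if unicodedata.category(ch) == "Cf"]

def clean_title(s):
    """Hypothetical helper: drop format characters such as U+200E, then
    collapse any remaining whitespace runs into single spaces."""
    cleaned = "".join(ch for ch in s if unicodedata.category(ch) != "Cf")
    return " ".join(cleaned.split())

# Assumed sample: the title with U+200E marks around the spaces.
title = "הלכות \u200e \u200eהשכמת\u200e \u200eהבוקר\u200e"

# repr() exposes the invisible mark that print() hides.
for word in title.split(" "):
    print(repr(word), find_hidden(word))

print(clean_title(title))
```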

Python not recognizing whitespace character
