Question

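I want to check whether a word contains both digits and characters and, if it does, separate the digit part from the character part. I am checking Tamil words such as ரூ.100 or ரூ100. I want to get ரூ. and 100 separately, and likewise ரூ and 100. How can I do this in Python? I tried this: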

    for word in f.read().strip().split(): 
      for word1, word2, word3 in zip(word,word[1:],word[2:]): 
        if word1 == "ர" and word2 == "ூ " and word3.isdigit(): 
           print word1 
           print word2 
        if word1.decode('utf-8') == unichr(0xbb0) and word2.decode('utf-8') == unichr(0xbc2): 
           print word1
           print word2

Answer 1

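You can use the regular expression (.*?)(\d+)(.*), which captures three groups: everything before the digits, the digits themselves, and everything after the digits: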

>>> import re
>>> pattern = ur'(.*?)(\d+)(.*)'
>>> s = u"ரூ.100"
>>> match = re.match(pattern, s, re.UNICODE)
>>> print match.group(1)
ரூ.
>>> print match.group(2)
100

Alternatively, you can unpack the matched groups into variables, like this:

>>> s = u"100ஆம்"
>>> match = re.match(pattern, s, re.UNICODE)
>>> before, digits, after = match.groups()
>>> print before

>>> print digits
100
>>> print after
ஆம்
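
The session above is Python 2 (ur'' literals, print statements). As a rough sketch only, assuming Python 3, where str patterns are Unicode-aware without re.UNICODE, the same pattern could be wrapped in a small helper. The name split_around_digits is made up for illustration, and the None check covers words that contain no digits at all, where re.match returns None:

```python
import re

# Same three-group pattern as in the answer: before, digits, after.
PATTERN = re.compile(r"(.*?)(\d+)(.*)")


def split_around_digits(word):
    """Return (before, digits, after), or None if the word has no digits.

    Hypothetical helper, not part of the original answer.
    """
    match = PATTERN.match(word)
    if match is None:
        # No digit anywhere in the word, so there is nothing to split.
        return None
    return match.groups()


for sample in ("ரூ.100", "ரூ100", "100ஆம்", "ரூ"):
    print(sample, "->", split_around_digits(sample))
```

Returning None for a non-match keeps the caller from hitting an AttributeError on match.group().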

Hope that helps.

Answer 2

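Use Unicode properties (note that Python's built-in re module does not support \p; the third-party regex module does):

\pL stands for a letter in any language
\pN stands for a number in any language.

In your case, it could be: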

(\pL+\.?)(\pN+)
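
One caveat with the pattern above: in ரூ the second code point (U+0BC2, a Tamil vowel sign) is a combining mark rather than a letter, so the letter class would probably need to include marks as well. Below is a minimal standard-library sketch of the same idea, assuming Python 3 and using unicodedata.category in place of \p classes. The function name split_prefix_and_number is hypothetical, and the rule "letters or marks, an optional full stop, then numbers" is my reading of the pattern, not something the answer spells out:

```python
import unicodedata


def split_prefix_and_number(word):
    """Split e.g. 'ரூ.100' into ('ரூ.', '100'); return None if it does not fit.

    Hypothetical helper illustrating the Unicode-property idea.
    """
    i = 0
    # Consume letters (categories L*) and combining marks (M*),
    # so that Tamil vowel signs stay attached to their consonant.
    while i < len(word) and unicodedata.category(word[i])[0] in "LM":
        i += 1
    if i == 0:
        return None  # the word does not start with a letter or mark
    # Allow one optional full stop after the letters, as in \.?
    if i < len(word) and word[i] == ".":
        i += 1
    rest = word[i:]
    # Everything that remains must be a number character (categories N*).
    if rest and all(unicodedata.category(ch)[0] == "N" for ch in rest):
        return word[:i], rest
    return None


for sample in ("ரூ.100", "ரூ100", "100ஆம்"):
    print(sample, "->", split_prefix_and_number(sample))
```

Unlike the regular expression in Answer 1, this sketch rejects words that start with digits, which matches the stricter shape of the \pL / \pN pattern.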

