Question

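Following up on my earlier question, "How do I convert unicode characters to floats in Python?", I would like to find a more elegant way to compute the value of a string that contains unicode numeric characters.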

For example, take the strings "1⅕" and "1 ⅕". I want both of these to resolve to 1.2.

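I know that I can iterate through the string character by character, check whether unicodedata.category(x) == "No", and convert the unicode characters with unicodedata.numeric(x). I would then have to split the string and sum the values. However, this seems rather hacky and unstable. Is there a more elegant solution for this in Python?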

Answer 1

I think this is what you want...

import unicodedata
def eval_unicode(s):
    #sum all the unicode fractions
    u = sum(map(unicodedata.numeric, filter(lambda x: unicodedata.category(x)=="No",s)))
    #eval the regular digits (with optional dot) as a float, or default to 0
    n = float("".join(filter(lambda x:x.isdigit() or x==".", s)) or 0)
    return n+u

Or the "comprehension" solution, for those who like that style:

import unicodedata
def eval_unicode(s):
    #sum all the unicode fractions
    u = sum(unicodedata.numeric(i) for i in s if unicodedata.category(i)=="No")
    #eval the regular digits (with optional dot) as a float, or default to 0
    n = float("".join(i for i in s if i.isdigit() or i==".") or 0)
    return n+u

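But beware: there are many unicode values that do not seem to have a numeric value assigned in Python (for example, ⅜ and ⅝ don't work for me... or maybe it's just my keyboard xD).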
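One way to check whether a given character has a numeric value is to pass a default to unicodedata.numeric, so that a missing value comes back as None instead of raising. This is only a sketch, assuming Python 3; the helper name numeric_or_none is made up for illustration, and the results depend on the Unicode database bundled with your interpreter.

```python
import unicodedata

def numeric_or_none(ch):
    """Return the numeric value of a single character, or None if it has none."""
    # The second argument is returned instead of raising ValueError
    return unicodedata.numeric(ch, None)

# Print the category and numeric value (if any) of a few sample characters
for ch in "⅕⅜⅝x":
    print(repr(ch), unicodedata.category(ch), numeric_or_none(ch))
```

If a fraction shows None here, the problem lies in the interpreter's Unicode data rather than in the keyboard.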

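Another note on the implementation: it is "too robust". It will accept even malformed numbers like "123½3½" and evaluate them to 1234.0, yet it will fail if the string contains more than one dot.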
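The behaviour described above can be reproduced with a short script. The function is repeated so the snippet runs on its own; this assumes Python 3, where str.isdigit() is false for fraction characters such as ½.

```python
import unicodedata

def eval_unicode(s):
    # Same logic as the answer's function, repeated so this runs standalone
    u = sum(unicodedata.numeric(i) for i in s if unicodedata.category(i) == "No")
    n = float("".join(i for i in s if i.isdigit() or i == ".") or 0)
    return n + u

print(eval_unicode("1⅕"))       # the well-formed case from the question
print(eval_unicode("123½3½"))   # malformed, but still accepted

try:
    eval_unicode("1.2.3")       # two dots: float() refuses "1.2.3"
except ValueError as exc:
    print("rejected:", exc)
```

The digits and the fractions are collected independently of their position, which is why "123½3½" is read as 1233 plus two halves.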

Answer 2

>>> import unicodedata
>>> b = '10 ⅕'
>>> int(b[:-1]) + unicodedata.numeric(b[-1])
10.2

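import unicodedata

def convert_dubious_strings(s):
    try:
        # Plain integers such as "10" convert directly
        return int(s)
    except ValueError:
        # ValueError also covers UnicodeEncodeError, which is its subclass
        return int(s[:-1]) + unicodedata.numeric(s[-1])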

If the string might have no integer part, you will need to add another try-except sub-block.
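A minimal sketch of that extension, assuming Python 3 and assuming the fraction, when present, is always the last character. The name convert_with_optional_integer is hypothetical, and an explicit check for an empty integer part is used here in place of a nested try-except.

```python
import unicodedata

def convert_with_optional_integer(s):
    """Hypothetical helper: optional integer part plus one trailing numeric character."""
    s = s.strip()
    if not s:
        raise ValueError("empty string")
    try:
        return int(s)                      # plain integers such as "10"
    except ValueError:
        pass
    head, last = s[:-1].strip(), s[-1]
    fraction = unicodedata.numeric(last)   # raises ValueError if last is not numeric
    whole = int(head) if head else 0       # no integer part means zero
    return whole + fraction

print(convert_with_optional_integer("10 ⅕"))
print(convert_with_optional_integer("⅕"))
```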

Answer 3

This might be sufficient for you, depending on the strange edge cases you want to deal with:

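import unicodedata

my_unicode_string = u"1 ⅕"  # example input
val = 0
for c in my_unicode_string:
    if unicodedata.category(c) == 'No':
        # Fractions and other "No" characters carry their own numeric value
        cval = unicodedata.numeric(c)
    elif c.isdigit():
        cval = int(c)
    else:
        # Skip spaces and anything else that is not numeric
        continue
    if cval == int(cval):
        # A whole-number value is treated as the next decimal digit
        val *= 10
    val += cval
print(val)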

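Integer digits are assumed to be another digit in the number, while fractional characters are assumed to be fractions to add to the number. Spaces between digits are simply skipped and repeated fractions are added together, which may or may not be what you want.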
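To see how these rules play out, the loop can be wrapped in a function and tried on a few inputs. This is a sketch assuming Python 3; the name digits_and_fractions is made up for illustration.

```python
import unicodedata

def digits_and_fractions(text):
    """Accumulate digits as decimal places and add fractions on top."""
    val = 0
    for c in text:
        if unicodedata.category(c) == "No":
            cval = unicodedata.numeric(c)
        elif c.isdigit():
            cval = int(c)
        else:
            continue            # spaces and other characters are skipped
        if cval == int(cval):
            val *= 10           # whole-number values shift the total by one digit
        val += cval
    return val

print(digits_and_fractions("1 ⅕"))   # 1.2
print(digits_and_fractions("1 2"))   # 12, the space is ignored
print(digits_and_fractions("½½"))    # 1.0, repeated fractions are summed
```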

Answer 4

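I think you will need a regular expression that explicitly lists the characters you want to support. Not all numeric characters are suitable for the kind of composition you have in mind. For example, what should be the numeric value of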

u"4\N{CIRCLED NUMBER FORTY TWO}2\N{SUPERSCRIPT SIX}"

???

Run

import unicodedata

# List every character in the Basic Multilingual Plane with category "No"
for i in range(65536):
    if unicodedata.category(chr(i)) == 'No':
        print(hex(i), unicodedata.name(chr(i)))

and look through the output to decide which characters you want to support.
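The regular-expression idea could look roughly like the following. This is a sketch under stated assumptions rather than a definitive parser: it assumes Python 3, an optional ASCII integer part followed by at most one fraction, and a whitelist (FRACTIONS) that I picked for illustration. The names FRACTIONS, MIXED_NUMBER and parse_mixed_number are hypothetical.

```python
import re
import unicodedata

# Assumed whitelist: only these vulgar fractions are accepted; extend as needed
FRACTIONS = "¼½¾⅓⅔⅕⅖⅗⅘⅙⅚⅛⅜⅝⅞"

# Optional ASCII integer part, optional whitespace, optional single fraction
MIXED_NUMBER = re.compile(r"\s*([0-9]+)?\s*([" + FRACTIONS + r"])?\s*")

def parse_mixed_number(s):
    match = MIXED_NUMBER.fullmatch(s)
    # Reject strings that do not fit the pattern or contain neither part
    if match is None or not (match.group(1) or match.group(2)):
        raise ValueError("not a supported number: %r" % (s,))
    whole = int(match.group(1)) if match.group(1) else 0
    frac = unicodedata.numeric(match.group(2)) if match.group(2) else 0
    return whole + frac

print(parse_mixed_number("1⅕"))
print(parse_mixed_number("1 ⅕"))
```

Because the pattern must match the whole string, inputs such as "123½3½" or the circled-number example raise ValueError instead of producing a questionable result.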

How do I compute the numeric value of a string with unicode components in Python?
