Question

I have a list of floats stored as strings:

test_number = ['8.3','10.0','1.0','8.7','6.9','4.7','8.7']

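I want to replace these floats with two categories: "1" for every number up to 5.9, and "2" for every number from 6.0 to 10.0. Since the values are strings, I tried using regular expressions to find the numbers: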

test5=[]
for r in test_number:
    if re.match("[0-5]?[.][0-9]",r): #for every number up till 5.9
        test5.append(1)
    if re.match("[6-9]?[.][0-9]",r): #for every number from 6.0 till 9.9
        test5.append(2)
    if re.match("[0-1]?[0-1]?[.][0-9]?",r): #for every 10 (now a 3 for more clear output)
        test5.append(3)

This returned the following output:

test5
[2, 3, 1, 3, 2, 2, 1, 2]

As you can see, this does not return what I want.

I also tried pandas pd.cut:

df_test = pd.DataFrame(['8.3','10.0','1.0','8.7','6.9','4.7','8.7'])
df_test.columns=['rating']

bins = [0.1, 5.9, 10.0]
group_names = [1,2]
df_test['number'] = pd.cut(df_test['number'], bins, labels=group_names)

This only gave me one. How can I solve this?

Answer 1

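In your case, the problem is that the last expression, if re.match("[0-1]?[0-1]?[.][0-9]?",r), matches both 10.0 and 1.0, and you are not using elif, so a single value can be appended more than once. (The regex itself is also wrong: it should be re.match("10[.][0-9]?",r).)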
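If you did want to stay with regular expressions, a minimal sketch of the corrected loop might look like the following. It assumes every value has exactly one digit after the decimal point, and it uses re.fullmatch (my substitution for re.match, available from Python 3.4) so that a pattern has to cover the whole string. The name regex_result is just for illustration.

```python
import re

test_number = ['8.3', '10.0', '1.0', '8.7', '6.9', '4.7', '8.7']

regex_result = []
for r in test_number:
    # elif guarantees that at most one category is appended per value
    if re.fullmatch(r"[0-5][.][0-9]", r):    # 0.0 through 5.9
        regex_result.append(1)
    elif re.fullmatch(r"[6-9][.][0-9]", r):  # 6.0 through 9.9
        regex_result.append(2)
    elif re.fullmatch(r"10[.][0-9]", r):     # 10.0 through 10.9
        regex_result.append(3)

print(regex_result)  # [2, 3, 1, 2, 2, 1, 2]
```

Values that match none of the three patterns are silently skipped here, which is one more reason to prefer a numeric comparison.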

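In this particular case you don't need regular expressions at all; they are best suited to string matching. Anything that involves numeric comparison runs into corner cases: what if the numbers are written in scientific notation?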
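To make the scientific-notation corner case concrete, here is a small sketch. The input '1e1' is a hypothetical value that does not appear in the question; it is simply another spelling of 10.0.

```python
import re

value = '1e1'  # hypothetical input: scientific notation for 10.0

# The digit-dot-digit pattern does not recognize this spelling
pattern_hit = re.match(r"10[.][0-9]?", value)
print(pattern_hit)      # None

# float() parses it without any special handling
parsed = float(value)
print(parsed)           # 10.0
print(parsed >= 6.0)    # True
```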

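So just convert to float and compare, using nested ternary expressions in a list comprehension (note that with your specification, values strictly between 5.9 and 6.0 will yield 3, which is probably not what you want):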

test_number = ['8.3','10.0','1.0','8.7','6.9','4.7','8.7']


test5 = [1 if float(x)<=5.9 else 2 if 6.0 <= float(x) < 10.0 else 3 for x in test_number]

print(test5)

Result:

[2, 3, 1, 2, 2, 1, 2]

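There is a small drawback here: float(x) is computed twice. This can be improved with a nested generator expression, or simply by mapping the list to float: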

test5 = [1 if x<=5.9 else 2 if 6.0 <= x < 10.0 else 3 for x in map(float,test_number)]
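For completeness, the nested generator variant mentioned above could be sketched like this. It should behave the same as the map version; test5_gen is an illustrative name only.

```python
test_number = ['8.3', '10.0', '1.0', '8.7', '6.9', '4.7', '8.7']

# The inner generator converts each string once; the outer comprehension classifies it
test5_gen = [1 if x <= 5.9 else 2 if 6.0 <= x < 10.0 else 3
             for x in (float(s) for s in test_number)]

print(test5_gen)  # [2, 3, 1, 2, 2, 1, 2]
```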

Note that a more logical solution would be to fold the gap between 5.9 and 6.0 into the first case:

test5 = [1 if x<6.0 else 2 if x < 10.0 else 3 for x in map(float,test_number)]
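As for the pd.cut attempt in the question: that snippet reads from a 'number' column that has not been created yet, and it passes strings rather than numbers. A possible sketch, assuming the two-category split the question originally asked for (below 6.0 gives 1, everything from 6.0 upward gives 2), is shown below. The open-ended upper bin edge is my own choice, not something from the question.

```python
import pandas as pd

df_test = pd.DataFrame({'rating': ['8.3', '10.0', '1.0', '8.7', '6.9', '4.7', '8.7']})

# Convert the strings to floats first; pd.cut needs numeric input
ratings = df_test['rating'].astype(float)

# right=False gives left-closed bins: [0.0, 6.0) and [6.0, inf)
# (the infinite upper edge is an assumption so that 10.0 lands in the second bin)
bins = [0.0, 6.0, float('inf')]
df_test['number'] = pd.cut(ratings, bins=bins, labels=[1, 2], right=False)

print(df_test['number'].tolist())  # [2, 2, 1, 2, 2, 1, 2]
```

If 10.0 should stay a separate category, as in the list comprehensions above, a third bin edge and a third label would be needed.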

Answer 2

Here is a very simple imperative approach:

def mark_list_on_condition(sequence):
    results = []
    for item in sequence:
        number = float(item)
        if number < 6:
            results.append('1')
        elif 6 <= number < 10:  # 6.0 itself belongs to this category
            results.append('2')
        elif number == 10.0:
            results.append('3')
    return results

Sample output:

>>> print(mark_list_on_condition(['8.3','10.0','1.0','8.7','6.9','4.7','8.7']))
['2', '3', '1', '2', '2', '1', '2']

Replace floats in string with numbers (python)
