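I'm working on a dataset in which I need to round some values to either an upper or a lower bound.
For example, if I want the upper bound to be 9 and the lower bound to be 3, and we have the numbers -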
[ 7.453511737983394,
8.10917072790058,
6.2377799380575,
5.225853201122676,
4.067932296134156 ]
then we want the list rounded to either 3 or 9 -
[ 9,
9,
9,
3,
3 ]
I know we can do this the good old way: iterate over the array, compute the difference to each bound, and take whichever bound is closest.
My approach code:
for i in range(len(the_list)):
    three = abs(3 - the_list[i])  # distance to the lower bound
    nine = abs(9 - the_list[i])   # distance to the upper bound
    if three < nine:
        the_list[i] = 3
    else:
        the_list[i] = 9
I was wondering whether there is a quick and dirty way built into Python, something like:
hey_bound = round_the_num(number, bound_1, bound_2)
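I know we can use my-approach-code, but I'm pretty sure a better way has already been implemented somewhere. I tried to find it but had no luck, so here I am.
Any guesses or direct links to solutions would be amazing.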
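Answer 0: (score: 3)
Edit:
The best approach so far, I think, is to use numpy (to avoid "manual" loops) with a simple calculation of the difference arrays between the_list and the two bounds (so no expensive multiplication here), and then conditionally add one or the other, depending on which has the smaller absolute value: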
import numpy as np
the_list = np.array([ 7.453511737983394,
8.10917072790058,
6.2377799380575,
5.225853201122676,
4.067932296134156 ])
dhi = 9 - the_list
dlo = 3 - the_list
idx = dhi + dlo < 0
the_rounded = the_list + np.where(idx, dhi, dlo)
# array([9., 9., 9., 3., 3.])
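Original answer: I would apply the round function to an offset-free, normalized version of the list, then scale back and add the offset again: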
import numpy as np
the_list = np.array([ 7.453511737983394,
8.10917072790058,
6.2377799380575,
5.225853201122676,
4.067932296134156 ])
hi = 9
lo = 3
dlt = hi - lo
the_rounded = np.round((the_list - lo)/dlt) * dlt + lo
# [9. 9. 9. 3. 3.]
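If all that is needed is to pick one of the two bounds, comparing against the midpoint is another way to write this in numpy. The following is only a minimal sketch and not part of the answer above: the helper name snap_to_bounds is made up for illustration, and it assumes that a value sitting exactly on the midpoint should go to the lower bound.

```python
import numpy as np

def snap_to_bounds(values, lo, hi):
    """Replace each value by lo or hi, whichever is nearer.

    Hypothetical helper for illustration; a value exactly on the
    midpoint is sent to lo (an assumption, pick >= if you prefer hi).
    """
    arr = np.asarray(values, dtype=float)
    mid = (lo + hi) / 2
    return np.where(arr > mid, hi, lo)

the_list = [7.453511737983394, 8.10917072790058, 6.2377799380575,
            5.225853201122676, 4.067932296134156]
result = snap_to_bounds(the_list, 3, 9)
print(result)
```

Unlike the rounding formula, this variant never produces anything other than lo or hi, even for inputs outside the two bounds.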
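Answer 1: (score: 3)
Timing comparison of the available answers
My interpretation would be:
From a performance point of view, you should go for Abhishek Patel's or Carles Mitjans' solution for smaller lists.
For lists containing a few dozen values or more, converting to a numpy array and then conditionally adding the difference with the smaller absolute value seems to be the fastest solution.
Code used for the timing comparison: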
import timeit
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
plt.style.use('ggplot')
rep = 5
timings = dict()
for n in range(7):
    print(f'N = 10^{n}')
    N = 10**n
    setup = f'''import numpy as np\nthe_list = np.random.random({N})*6+3\nhi = 9\nlo = 3\ndlt = hi - lo\nmid = (hi + lo) / 2\ndef return_the_num(l, lst, h):\n    return [l if abs(l-x) < abs(h-x) else h for x in lst]'''
    fct = 'np.round((the_list - lo)/dlt) * dlt + lo'
    t = timeit.Timer(fct, setup=setup)
    timings['SpghttCd_np'] = timings.get('SpghttCd_np', []) + [np.min(t.repeat(repeat=rep, number=1))]
    fct = 'return_the_num(3, the_list, 9)'
    t = timeit.Timer(fct, setup=setup)
    timings['Austin'] = timings.get('Austin', []) + [np.min(t.repeat(repeat=rep, number=1))]
    fct = '[(lo, hi)[mid < v] for v in the_list]'
    t = timeit.Timer(fct, setup=setup)
    timings['SpghttCd_lc'] = timings.get('SpghttCd_lc', []) + [np.min(t.repeat(repeat=rep, number=1))]
    setup += '\nround_the_num = lambda list, upper, lower: [upper if x > (upper + lower) / 2 else lower for x in list]'
    fct = 'round_the_num(the_list, 9, 3)'
    t = timeit.Timer(fct, setup=setup)
    timings['Carles Mitjans'] = timings.get('Carles Mitjans', []) + [np.min(t.repeat(repeat=rep, number=1))]
    setup += '\nupper_lower_bound_list=[3,9]'
    fct = '[min(upper_lower_bound_list, key=lambda x:abs(x-myNumber)) for myNumber in the_list]'
    t = timeit.Timer(fct, setup=setup)
    timings['mad_'] = timings.get('mad_', []) + [np.min(t.repeat(repeat=rep, number=1))]
    setup += '\ndef return_bound(x, l, h):\n    low = abs(x - l)\n    high = abs(x - h)\n    if low < high:\n        return l\n    else:\n        return h'
    fct = '[return_bound(x, 3, 9) for x in the_list]'
    t = timeit.Timer(fct, setup=setup)
    timings["Scratch'N'Purr"] = timings.get("Scratch'N'Purr", []) + [np.min(t.repeat(repeat=rep, number=1))]
    setup += '\ndef round_the_list(list, bound_1, bound_2):\n\tmid = (bound_1+bound_2)/2\n\tfor i in range(len(list)):\n\t\tif list[i] > mid:\n\t\t\tlist[i] = bound_2\n\t\telse:\n\t\t\tlist[i] = bound_1'
    fct = 'round_the_list(the_list, 3, 9)'
    t = timeit.Timer(fct, setup=setup)
    timings["Abhishek Patel"] = timings.get("Abhishek Patel", []) + [np.min(t.repeat(repeat=rep, number=1))]
    fct = 'dhi = 9 - the_list\ndlo = 3 - the_list\nidx = dhi + dlo < 0\nthe_list + np.where(idx, dhi, dlo)'
    t = timeit.Timer(fct, setup=setup)
    timings["SpghttCd_where"] = timings.get("SpghttCd_where", []) + [np.min(t.repeat(repeat=rep, number=1))]
    print('done')
df = pd.DataFrame(timings, 10**np.arange(n+1))
ax = df.plot(logx=True, logy=True)
ax.set_xlabel('length of the list')
ax.set_ylabel('seconds to run')
ax.get_lines()[-1].set_c('g')
plt.legend()
print(df)
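Before reading too much into the timings, it may be worth confirming that the candidates actually return the same values. A minimal sketch, assuming random input in [3, 9) like the setup string above and checking only three of the variants; the seed and the sample size are arbitrary choices of mine:

```python
import numpy as np

rng = np.random.default_rng(0)          # arbitrary seed, for repeatability
the_list = rng.random(1000) * 6 + 3     # values in [3, 9), as in the setup
hi, lo = 9, 3
dlt = hi - lo

# Variant 1: normalize, round, scale back
by_round = np.round((the_list - lo) / dlt) * dlt + lo

# Variant 2: conditionally add the smaller difference
dhi = hi - the_list
dlo = lo - the_list
by_where = the_list + np.where(dhi + dlo < 0, dhi, dlo)

# Variant 3: plain list comprehension on Python floats
by_loop = [lo if abs(lo - x) < abs(hi - x) else hi for x in the_list.tolist()]

same_round_where = np.allclose(by_round, by_where)
same_round_loop = np.allclose(by_round, by_loop)
print(same_round_where, same_round_loop)
```

np.allclose is used instead of an exact comparison because x + (9 - x) is computed in floating point and need not be exactly 9. Also note that for a value exactly on the midpoint the variants do not all agree: the comprehension above picks hi, while the two numpy variants pick lo.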
Answer 2: (score: 1)
You can generalize this by finding the midpoint and checking which side of the midpoint each number in the list falls on:
def round_the_list(list, bound_1, bound_2):
    mid = (bound_1+bound_2)/2
    for i in range(len(list)):
        if list[i] > mid: # or >= depending on your rounding decision
            list[i] = bound_2
        else:
            list[i] = bound_1
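Note that this function changes the list in place and returns None, so the result has to be read from the list that was passed in. A short usage sketch; the function is repeated so the example runs on its own, with the parameter renamed to lst (my change, to avoid shadowing the built-in list), and the variable names values and returned are only for illustration:

```python
def round_the_list(lst, bound_1, bound_2):
    # bound_1 is the lower bound, bound_2 the upper bound
    mid = (bound_1 + bound_2) / 2
    for i in range(len(lst)):
        if lst[i] > mid:
            lst[i] = bound_2
        else:
            lst[i] = bound_1

values = [7.453511737983394, 8.10917072790058, 6.2377799380575,
          5.225853201122676, 4.067932296134156]
returned = round_the_list(values, 3, 9)  # lower bound first, then upper
print(values)    # the original list now holds only 3s and 9s
print(returned)  # None, because the function works in place
```

If the original values are still needed afterwards, pass a copy such as list(values) instead.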
Answer 3: (score: 1)
Maybe you could write a function and use it in a list comprehension.
def return_bound(x, l, h):
    low = abs(x - l)
    high = abs(x - h)
    if low < high:
        return l
    else:
        return h
Test:
>>> mylist = [7.453511737983394, 8.10917072790058, 6.2377799380575, 5.225853201122676, 4.067932296134156]
>>> [return_bound(x, 3, 9) for x in mylist]
[9, 9, 9, 3, 3]
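Answer 4: (score: 1)
A one-line list comprehension using the built-in min function, with the key argument changed so that it looks for the smallest absolute difference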
upper_lower_bound_list=[3,9]
myNumberlist=[ 7.453511737983394,
8.10917072790058,
6.2377799380575,
5.225853201122676,
4.067932296134156 ]
List comprehension
[min(upper_lower_bound_list, key=lambda x:abs(x-myNumber)) for myNumber in myNumberlist]
Output
[9, 9, 9, 3, 3]
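Because min simply picks the candidate with the smallest key, the same comprehension also works when there are more than two target values. A small sketch, assuming a made-up candidate list [3, 6, 9] that is not part of the question; the names candidates, numbers and snapped are only for illustration:

```python
candidates = [3, 6, 9]  # hypothetical targets, not from the question
numbers = [7.453511737983394,
           8.10917072790058,
           6.2377799380575,
           5.225853201122676,
           4.067932296134156]

# For each number, pick the candidate with the smallest absolute difference
snapped = [min(candidates, key=lambda c: abs(c - n)) for n in numbers]
print(snapped)

# On an exact tie, min returns the first candidate in the list
tie = min([3, 9], key=lambda c: abs(c - 6))
print(tie)
```

The flexibility has a price: the key function is called once per candidate for every number, which is consistent with this variant not being among the fastest.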
Answer 5: (score: 1)
Another option, using a list comprehension and a lambda function:
round_the_num = lambda list, upper, lower: [upper if x > (upper + lower) / 2 else lower for x in list]
round_the_num(l, 9, 3)
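Here l stands for the list of numbers, and note that the upper bound is passed before the lower one. A self-contained usage sketch; the names numbers and rounded are only for illustration, and the first parameter is renamed to values (my change) to avoid shadowing the built-in list:

```python
round_the_num = lambda values, upper, lower: [
    upper if x > (upper + lower) / 2 else lower for x in values
]

numbers = [7.453511737983394, 8.10917072790058, 6.2377799380575,
           5.225853201122676, 4.067932296134156]
rounded = round_the_num(numbers, 9, 3)  # upper bound first, then lower
print(rounded)
```

Swapping the two bounds by mistake does not raise an error, it silently inverts the result, so keyword arguments may be the safer way to call it.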
Answer 6: (score: 1)
You could write a custom function that does a list comprehension, like:
lst = [ 7.453511737983394,
8.10917072790058,
6.2377799380575,
5.225853201122676,
4.067932296134156 ]
def return_the_num(l, lst, h):
    return [l if abs(l-x) < abs(h-x) else h for x in lst]
print(return_the_num(3, lst, 9))
# [9, 9, 9, 3, 3]
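Answer 7: (score: 1)
I really like @AbhishekPatel's idea of comparing against the midpoint. But I put it into an LC, using the result of the comparison as an index into a tuple of the bounds: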
the_list = [ 7.453511737983394,
8.10917072790058,
6.2377799380575,
5.225853201122676,
4.067932296134156 ]
hi = 9
lo = 3
mid = (hi + lo) / 2
[(lo, hi)[mid < v] for v in the_list]
# [9, 9, 9, 3, 3]
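...but this is about 15 times slower than the numpy approach.
However, numbers greater than hi or less than lo can be handled here.
...but the factor of 15 only holds for a list with 100000 entries. For the original list posted by the OP, the two variants are very close together...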
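To illustrate the point about out-of-range values: the rounding formula snaps to multiples of hi - lo, so a value far above hi does not end up on hi, whereas the midpoint comparison does. A small sketch with made-up input values; np.clip is added here as one possible fix for the numpy variant and is my suggestion, not something taken from the answers:

```python
import numpy as np

hi, lo = 9, 3
dlt = hi - lo
mid = (hi + lo) / 2
values = np.array([14.0, 7.5, 4.0, -2.0])  # made-up, partly outside [lo, hi]

# Rounding formula: lands on lo + k * dlt, which may lie outside the bounds
rounded = np.round((values - lo) / dlt) * dlt + lo

# Midpoint comparison on plain Python floats: always lo or hi
by_midpoint = [(lo, hi)[mid < v] for v in values.tolist()]

# Possible fix for the numpy variant: clip the result back into [lo, hi]
clipped = np.clip(rounded, lo, hi)

print(rounded)
print(by_midpoint)
print(clipped)
```

With these inputs the unclipped formula returns 15 for 14.0 and -3 for -2.0, while the other two results contain only 3 and 9.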