While reading the Rust documentation, I stumbled upon this code that iterates over the array a using a while loop (with an index):
    fn main() {
        let a = [10, 20, 30, 40, 50];
        let mut index = 0;
        while index < 5 {
            println!("the value is: {}", a[index]);
            index += 1;
        }
    }
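The documentation says:
"...this approach is error prone; we could cause the program to panic if the index length is incorrect. It's also slow, because the compiler adds runtime code to perform the conditional check on every element on every iteration through the loop."
The first reason is self-explanatory. The second reason is where I got confused.
Furthermore, they suggest using a for loop for this: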
    fn main() {
        let a = [10, 20, 30, 40, 50];
        for element in a.iter() {
            println!("the value is: {}", element);
        }
    }
I can't seem to wrap my head around this. Is there some kind of behavior of the Rust compiler at work here?
Answer 0 (score: 2)
The two parts are complementary:
"we could cause the program to panic if the index length is incorrect."
Every time you write some_slice[some_index], the standard library does the equivalent of:
    if some_index < some_slice.len() {
        some_slice.get_the_value_without_checks(some_index)
    } else {
        panic!("Hey, stop that");
    }
"the compiler adds runtime code to perform the conditional check on every element"
In a loop, this works out to be something like:
    while some_index < limit {
        if some_index < some_slice.len() {
            some_slice.get_the_value_without_checks(some_index)
        } else {
            panic!("Hey, stop that");
        }
        some_index += 1;
    }
Those repeated conditionals aren't the most efficient code.
The implementations of Iterator for slices use unsafe code to be more efficient, at the cost of more complicated code. The iterators contain raw pointers to the data but ensure that you can never misuse them to cause memory unsafety. Without needing to perform that conditional at each step, the iterator solution is often faster [1]. It's more or less equivalent to:
    while some_index < limit {
        some_slice.get_the_value_without_checks(some_index);
        some_index += 1;
    }
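The snippets above are pseudocode: get_the_value_without_checks is a made-up name. As a rough runnable sketch, the closest real standard-library call is the unsafe method slice::get_unchecked. The function names checked_get, sum_with_index and sum_with_iter below are hypothetical, chosen only for illustration, and the real standard-library implementation of indexing differs in its details.

```rust
/// Indexing with the bounds check written out by hand.
/// `checked_get` is a hypothetical helper, not a standard-library function.
fn checked_get(slice: &[i32], index: usize) -> i32 {
    if index < slice.len() {
        // SAFETY: `index` was verified to be in bounds on the line above.
        unsafe { *slice.get_unchecked(index) }
    } else {
        panic!("index {} out of bounds for length {}", index, slice.len());
    }
}

/// Index-based loop: performs one bounds check per iteration
/// unless the optimizer manages to remove it.
fn sum_with_index(slice: &[i32], limit: usize) -> i32 {
    let mut total = 0;
    let mut index = 0;
    while index < limit {
        total += checked_get(slice, index);
        index += 1;
    }
    total
}

/// Iterator-based loop: there is no index, so there is no
/// per-element bounds check to perform.
fn sum_with_iter(slice: &[i32]) -> i32 {
    let mut total = 0;
    for element in slice.iter() {
        total += *element;
    }
    total
}

fn main() {
    let a = [10, 20, 30, 40, 50];
    println!("indexed:  {}", sum_with_index(&a, a.len()));
    println!("iterator: {}", sum_with_iter(&a));
}
```

Both loops compute the same sum. The indexed version also needs a correct limit from the caller, which is the "error prone" half of the documentation's warning: passing a limit larger than a.len() panics at the first out-of-range index.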
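[1] As Matthieu M. points out:
"It should be noted that the optimizer may (or may not) be able to remove the bounds check in the while case. If it succeeds, then performance is equivalent; if it fails, suddenly your code is slower. In micro-benchmarks, with simple code, chances are it will succeed... but this may not carry over to your production code, or it may carry over now, and the next change in the loop body will prevent the optimization, etc. In short, a while loop can be a performance ticking bomb."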
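When the loop body needs the index as well as the element, one way to avoid depending on the optimizer is to stay with iterators and use Iterator::enumerate. This is a minimal sketch; the function name describe is hypothetical and only there to make the example easy to check.

```rust
/// Builds one line of text per element, pairing each element with its index.
/// `enumerate` supplies the index, so the body never writes `slice[index]`
/// and there is no bounds check to optimize away.
fn describe(slice: &[i32]) -> Vec<String> {
    let mut lines = Vec::new();
    for (index, element) in slice.iter().enumerate() {
        lines.push(format!("a[{}] = {}", index, element));
    }
    lines
}

fn main() {
    let a = [10, 20, 30, 40, 50];
    for line in describe(&a) {
        println!("{}", line);
    }
}
```

The loop cannot run past the end of the slice, because the iterator itself decides when to stop.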
See also: