Converting a Python function to Spark (Scala), dealing with temp files

Time: 2019-05-07 17:51:52

Tags: python scala apache-spark dataframe apache-spark-sql

I am converting a Python function that performs rainflow counting:

from collections import deque
from rainflow import reversals  # rainflow helper that yields the series' turning points


def extract_cycles(series):
    """
    Returns two lists: the first one containing full cycles and the second
    containing one-half cycles. The cycles are extracted from the iterable
    *series* according to section 5.4.4 in ASTM E1049 (2011).
    """
    points = deque()
    full, half = [], []

    for x in reversals(series):

        points.append(x)

        while len(points) >= 3:
            # Form ranges X and Y from the three most recent points
            X = abs(points[-2] - points[-1])
            Y = abs(points[-3] - points[-2])

            if X < Y:
                # Read the next point
                break

            elif len(points) == 3:
                # Y contains the starting point
                # Count Y as one-half cycle and discard the first point
                half.append(Y)
                points.popleft()

            else:
                # Count Y as one cycle and discard the peak and the valley of Y
                full.append(Y)

                last = points.pop()
                points.pop()
                points.pop()
                points.append(last)

    else:
        # Count the remaining ranges as one-half cycles
        while len(points) > 1:
            half.append(abs(points[-2] - points[-1]))
            points.pop()
    return full, half

However, I have been struggling to do this the Spark way. Initially I thought a window function would work, but there is no way to carry a running total that the next row can reference.

Should I be looking at another approach? It seems like iterating over the rows is my only option, but that defeats the purpose of Spark.
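For reference, the only Spark-shaped version I can see is to collect each series onto a single task and run the same sequential logic there with groupByKey / flatMapGroups. Below is a rough sketch of what I mean; the Sample and Cycle case classes, the id / t / value columns, and the simplified reversals helper are placeholders for my real schema, not something I am committed to.

import org.apache.spark.sql.{Dataset, SparkSession}

import scala.collection.mutable.ArrayBuffer

// Placeholder shapes for the input series and the extracted cycles.
case class Sample(id: String, t: Long, value: Double)
case class Cycle(id: String, range: Double, isFull: Boolean)

object Rainflow {

  // Turning points of the series: first point, every direction change, last point.
  // (Simplified stand-in for rainflow.reversals.)
  def reversals(series: Seq[Double]): Seq[Double] = {
    if (series.length < 2) return series
    val out = ArrayBuffer(series.head)
    for (i <- 1 until series.length - 1) {
      val dPrev = series(i) - out.last
      val dNext = series(i + 1) - series(i)
      if (dPrev * dNext < 0) out += series(i)
    }
    out += series.last
    out.toSeq
  }

  // Sequential port of extract_cycles (ASTM E1049, section 5.4.4).
  def extractCycles(series: Seq[Double]): (Seq[Double], Seq[Double]) = {
    val points = ArrayBuffer.empty[Double]
    val full = ArrayBuffer.empty[Double]
    val half = ArrayBuffer.empty[Double]

    for (x <- reversals(series)) {
      points += x
      var readNext = false
      while (points.length >= 3 && !readNext) {
        val rX = math.abs(points(points.length - 2) - points(points.length - 1))
        val rY = math.abs(points(points.length - 3) - points(points.length - 2))
        if (rX < rY) {
          readNext = true                     // read the next point
        } else if (points.length == 3) {
          half += rY                          // Y contains the starting point
          points.remove(0)
        } else {
          full += rY                          // count Y as one full cycle
          val last = points.remove(points.length - 1)
          points.remove(points.length - 1)
          points.remove(points.length - 1)
          points += last
        }
      }
    }
    // Count the remaining ranges as one-half cycles.
    while (points.length > 1) {
      half += math.abs(points(points.length - 2) - points(points.length - 1))
      points.remove(points.length - 1)
    }
    (full.toSeq, half.toSeq)
  }
}

val spark = SparkSession.builder.getOrCreate()
import spark.implicits._

val samples: Dataset[Sample] = ???  // however the per-point series is loaded

val cycles: Dataset[Cycle] = samples
  .groupByKey(_.id)
  .flatMapGroups { (id, rows) =>
    // The counting is inherently sequential, so materialise one series per key
    // in time order and run the port on it.
    val series = rows.toSeq.sortBy(_.t).map(_.value)
    val (fullRanges, halfRanges) = Rainflow.extractCycles(series)
    fullRanges.map(r => Cycle(id, r, isFull = true)) ++
      halfRanges.map(r => Cycle(id, r, isFull = false))
  }

This runs, but it pulls every point of a series into a single task, which feels like exactly the row-by-row iteration I was hoping Spark would let me avoid.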

0 Answers:

No answers