How can I optimize big data processing in Laravel?

Asked: 2016-04-29 06:16:51

Tags: php laravel laravel-5 laravel-5.2 bigdata

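My task is: "Take the transactions table, group the rows by transaction date, and count the statuses. The result forms the statistics that will be rendered on a page."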

This is my method for generating the statistics:

public static function getStatistics(Website $website = null)
{
    // Nothing to report without a website.
    if ($website === null) return [];

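    // get() loads every matching transaction row into memory at once.
    $query = \DB::table('transactions')->where("website_id", $website->id)->orderBy("dt", "desc")->get();

    // Group the rows by date; convertDate() is a helper of this class that is not shown here.
    $transitions = collect(static::convertDate($query))->groupBy("dt");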
    $statistics = collect();

    foreach ($transitions as $date => $trans) {
        $subscriptions = $trans->where("status", 'subscribe')->count();
        $unsubscriptions = $trans->where("status", 'unsubscribe')->count();
        $prolongations = $trans->where("status", 'rebilling')->count();
        $redirections = $trans->where("status", 'redirect_to_lp')->count();
        $conversion = $redirections == 0 ? 0 : ((float) ($subscriptions / $redirections));
        $earnings = $trans->sum("pay");

        $statistics->push((object)[
            "date" => $date,
            "subscriptions" => $subscriptions,
            'unsubscriptions' => $unsubscriptions,
            'prolongations' => $prolongations,
            'redirections' => $redirections,
            'conversion' => round($conversion, 2),
            'earnings' => $earnings,
        ]);

    }

    return $statistics;
}

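If the number of transaction rows is below 100,000, everything works fine. But once the count exceeds 150-200k, nginx throws a 502 Bad Gateway. What would you advise? I have no experience with big data processing. Perhaps my implementation is fundamentally wrong?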

2 answers:

Answer 0 (score: 3)

Big data is never easy, but I would suggest using Laravel's chunk instead of get.

https://laravel.com/docs/5.1/eloquent (Ctrl+F "::chunk")

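What ::chunk does is select n rows at a time and let you process them piece by piece. This is handy because it lets you stream updates to the browser, but in the ~150k result range I would suggest looking into how to push this work to a background process rather than handling it during the request.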
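
The idea above can be sketched as follows: instead of grouping one huge collection, keep a small array of running per-day totals and fold each chunk into it, so only one chunk of rows is in memory at a time. This is a minimal sketch under assumptions, not the asker's code: accumulateChunk() is a hypothetical helper, dt is assumed to be a string starting with "YYYY-MM-DD", and the id column used for ordering in the commented Laravel call is assumed to exist. The demo feeds it hand-made chunks so that it runs without a database.

```php
<?php
// Statuses counted per day, taken from the question's code.
const STATUSES = ['subscribe', 'unsubscribe', 'rebilling', 'redirect_to_lp'];

/**
 * Fold one chunk of rows into the running per-day totals.
 * accumulateChunk() is a hypothetical helper, not part of Laravel.
 */
function accumulateChunk(array &$stats, $rows)
{
    foreach ($rows as $row) {
        // Assumes dt is a string that starts with "YYYY-MM-DD".
        $day = substr($row->dt, 0, 10);

        if (!isset($stats[$day])) {
            $stats[$day] = ['counts' => array_fill_keys(STATUSES, 0), 'earnings' => 0];
        }
        // Rows with a status outside STATUSES still add to earnings but are not counted.
        if (isset($stats[$day]['counts'][$row->status])) {
            $stats[$day]['counts'][$row->status]++;
        }
        $stats[$day]['earnings'] += $row->pay;
    }
}

// Stand-in data: two small "chunks" shaped like the rows a chunk() callback receives.
$chunks = [
    [
        (object) ['dt' => '2016-04-01 09:00:00', 'status' => 'redirect_to_lp', 'pay' => 0],
        (object) ['dt' => '2016-04-01 09:10:00', 'status' => 'subscribe', 'pay' => 5],
    ],
    [
        (object) ['dt' => '2016-04-01 23:00:00', 'status' => 'redirect_to_lp', 'pay' => 0],
        (object) ['dt' => '2016-04-02 12:00:00', 'status' => 'rebilling', 'pay' => 5],
    ],
];

$stats = [];
foreach ($chunks as $rows) {
    accumulateChunk($stats, $rows);
}

// In a Laravel app the loop above would be replaced by something like:
// \DB::table('transactions')
//     ->where('website_id', $website->id)
//     ->orderBy('id') // assumed primary key; a stable order keeps chunks from overlapping
//     ->chunk(5000, function ($rows) use (&$stats) {
//         accumulateChunk($stats, $rows);
//     });

print_r($stats);
```

Chunking only bounds memory use. Every row still travels from the database to PHP, so the request can still time out on a large table, which is why the background-process advice still applies.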

Answer 1 (score: 1)

So, after several days of researching this problem, I found that there is only one correct answer:

Do not process the raw data in PHP. It is better to use SQL!

In my case, we are using PostgreSQL.

Below is the SQL query that helped me; perhaps it will help someone else.

WITH
        cte_range(dt) AS
        (
            SELECT
                generate_series('2016-04-01 00:00:00'::timestamp with time zone, '{$date} 00:00:00'::timestamp with time zone, INTERVAL '1 day')
        ),

        cte_data AS
        (
            SELECT
                date_trunc('day', dt) AS dt,
                COUNT(*) FILTER (WHERE status = 'subscribe') AS count_subscribes,
                COUNT(*) FILTER (WHERE status = 'unsubscribe') AS count_unsubscribes,
                COUNT(*) FILTER (WHERE status = 'rebilling') AS count_rebillings,
                COUNT(*) FILTER (WHERE status = 'redirect_to_lp') AS count_redirects_to_lp,
                SUM(pay) AS earnings,
                CASE
                    WHEN COUNT(*) FILTER (WHERE status = 'redirect_to_lp') > 0 THEN 100.0 * COUNT(*) FILTER (WHERE status = 'subscribe')::float / COUNT(*) FILTER (WHERE status = 'redirect_to_lp')::float
                    ELSE 0
                END
                AS conversion_percent

            FROM
                transactions

            WHERE
                website_id = {$website->id}

            GROUP BY
                date_trunc('day', dt)
        )

        SELECT
            to_char(cte_range.dt, 'YYYY-MM-DD') AS day,
            COALESCE(cte_data.count_subscribes, 0) AS count_subscribe,
            COALESCE(cte_data.count_unsubscribes, 0) AS count_unsubscribes,
            COALESCE(cte_data.count_rebillings, 0) AS count_rebillings,
            COALESCE(cte_data.count_redirects_to_lp, 0) AS count_redirects_to_lp,
            COALESCE(cte_data.conversion_percent, 0) AS conversion_percent,
            COALESCE(cte_data.earnings, 0) AS earnings

        FROM
            cte_range

        LEFT JOIN
            cte_data
            ON cte_data.dt = cte_range.dt

        ORDER BY
            cte_range.dt DESC
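
The query above places {$date} and {$website->id} directly into the SQL string. A minimal sketch of the same "aggregate in the database" idea with a bound parameter is shown below. To make it runnable without a PostgreSQL server, it rests on several assumptions that are not part of the answer: PDO with an in-memory SQLite database stands in for PostgreSQL, SUM(CASE ...) replaces COUNT(*) FILTER (...) for portability, substr() replaces date_trunc(), the date-range CTE is left out, and the sample rows are invented.

```php
<?php
// In-memory SQLite stands in for PostgreSQL here (an assumption of this sketch).
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

$pdo->exec('CREATE TABLE transactions (website_id INTEGER, dt TEXT, status TEXT, pay REAL)');

// Invented sample rows, only to have something to aggregate.
$insert = $pdo->prepare('INSERT INTO transactions (website_id, dt, status, pay) VALUES (?, ?, ?, ?)');
$sample = [
    [1, '2016-04-01 09:00:00', 'redirect_to_lp', 0],
    [1, '2016-04-01 09:05:00', 'redirect_to_lp', 0],
    [1, '2016-04-01 09:10:00', 'subscribe', 5],
    [1, '2016-04-02 12:00:00', 'rebilling', 5],
    [2, '2016-04-01 08:00:00', 'subscribe', 5], // another website, filtered out below
];
foreach ($sample as $row) {
    $insert->execute($row);
}

// One row per day comes back, however many transactions there are.
$sql = "
    SELECT
        substr(dt, 1, 10) AS day,
        SUM(CASE WHEN status = 'subscribe' THEN 1 ELSE 0 END) AS count_subscribes,
        SUM(CASE WHEN status = 'unsubscribe' THEN 1 ELSE 0 END) AS count_unsubscribes,
        SUM(CASE WHEN status = 'rebilling' THEN 1 ELSE 0 END) AS count_rebillings,
        SUM(CASE WHEN status = 'redirect_to_lp' THEN 1 ELSE 0 END) AS count_redirects_to_lp,
        SUM(pay) AS earnings
    FROM transactions
    WHERE website_id = :website_id
    GROUP BY substr(dt, 1, 10)
    ORDER BY day DESC
";

// The website id is bound as a parameter instead of being interpolated.
$stmt = $pdo->prepare($sql);
$stmt->execute([':website_id' => 1]);
$rows = $stmt->fetchAll(PDO::FETCH_ASSOC);

foreach ($rows as $r) {
    // Conversion is computed here only to keep the SQL short; the answer does it in SQL.
    $redirects = (int) $r['count_redirects_to_lp'];
    $conversion = $redirects > 0 ? 100.0 * (int) $r['count_subscribes'] / $redirects : 0;
    printf("%s subscribes=%d redirects=%d conversion=%.1f%%\n",
        $r['day'], $r['count_subscribes'], $redirects, $conversion);
}
```

In a Laravel application the same binding style should be available through \DB::select($sql, ['website_id' => $website->id]), which returns the result rows, so the PostgreSQL query from the answer could take its date and website id as parameters in the same way.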