How to make a MySQL query (with multiple joins) faster and more efficient

Time: 2017-12-04 10:17:38

Tags: php mysql performance

I have a big problem with a MySQL query: it is very slow... too slow... unusable!

Reading 1000 products together with their prices takes more than 20 seconds!!!

$dati = mysqli_query($mysqli_connect, "
SELECT      *  
FROM        $tb_products
LEFT JOIN   $tb_categories ON $tb_products.product_category = $tb_categories.category_id_master
LEFT JOIN   $tb_subcategories ON $tb_products.product_subcategory = $tb_subcategories.subcategory_id_master
LEFT JOIN   $tb_logos ON $tb_products.product_logo = $tb_logos.logo_id_master

LEFT JOIN   $tb_prices ON ( 
            $tb_products.product_brand = $tb_prices.price_brand
            AND $tb_products.product_code = $tb_prices.price_code
            AND $tb_prices.price_validity = (
                SELECT  MAX($tb_prices.price_validity) 
                FROM    $tb_prices 
                WHERE   $tb_prices.price_validity<=DATE_ADD(CURDATE(), INTERVAL +0 DAY)
                        AND $tb_products.product_code = $tb_prices.price_code
            )
        )

WHERE       $tb_products.product_language='$activeLanguage' AND $tb_products.product_category!=0
GROUP BY    $tb_products.product_code
ORDER BY    $tb_products.product_brand, $tb_categories.category_rank, $tb_subcategories.subcategory_rank, $tb_products.product_subcategory, $tb_products.product_rank
");

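EDIT
As Mr. Alvaro suggested, I replaced SELECT * with the more efficient SELECT [list of values], and the execution time dropped from 20 to 14 seconds. Still too slow... END EDIT

Each product can have several prices, so I use (SELECT MAX...) to get the most recent (but not future) price. Maybe this is what slows everything down? Do you think there is a better solution?

Consider that the same query without the join to the prices table takes only 0.2 seconds. So I am sure the problem lies in that part of the code.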

$dati = mysqli_query($mysqli_connect, "
SELECT      *  
FROM        $tb_products
LEFT JOIN   $tb_categories ON $tb_products.product_category = $tb_categories.category_id_master
LEFT JOIN   $tb_subcategories ON $tb_products.product_subcategory = $tb_subcategories.subcategory_id_master
LEFT JOIN   $tb_logos ON $tb_products.product_logo = $tb_logos.logo_id_master

WHERE       $tb_products.product_language='$activeLanguage' AND $tb_products.product_category!=0
GROUP BY    $tb_products.product_code
ORDER BY    $tb_products.product_brand, $tb_categories.category_rank, $tb_subcategories.subcategory_rank, $tb_products.product_subcategory, $tb_products.product_rank
");

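I also considered that it might depend on the server's performance, but I tend to rule that out, because the second query (without prices) runs at a perfectly acceptable speed.

The prices table is as follows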

+----------------+-------------+
| price_id       | int(3)      |
| price_brand    | varchar(5)  |
| price_code     | varchar(50) |
| price_value    | float(10,2) |
| price_validity | date        |
| price_language | varchar(2)  |
+----------------+-------------+

2 Answers:

Answer 0: (score: 2)

Maybe it is because you use SELECT *, which is considered bad practice. Check this question on Stack Overflow.

Is there a difference between Select * and Select [list each col]

There, Mitch Wheat wrote:

  

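You should specify an explicit column list. SELECT * will bring back more columns than you need, creating more IO and network traffic, but more importantly it might require extra lookups even though a non-clustered covering index exists (on SQL Server).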
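To make the advice concrete, here is a minimal sketch of the first part of the question's query with an explicit column list instead of SELECT *. The column names come from the question; the table names assigned to the $tb_* variables are hypothetical placeholders, and which columns the page really needs is an assumption.

```php
<?php
// Hypothetical table names; the question keeps the real ones in $tb_* variables.
$tb_products      = 'products';
$tb_categories    = 'categories';
$tb_subcategories = 'subcategories';

// Assumption: the page only needs these columns (names taken from the question).
$columns = [
    "$tb_products.product_code",
    "$tb_products.product_brand",
    "$tb_products.product_rank",
    "$tb_categories.category_rank",
    "$tb_subcategories.subcategory_rank",
];

// Same joins as in the question, but with the explicit list instead of "*".
$sql = "SELECT " . implode(", ", $columns) . "
FROM $tb_products
LEFT JOIN $tb_categories ON $tb_products.product_category = $tb_categories.category_id_master
LEFT JOIN $tb_subcategories ON $tb_products.product_subcategory = $tb_subcategories.subcategory_id_master";

echo $sql, PHP_EOL;
```

As the question's edit reports, this alone only brought the time from 20 to 14 seconds, so it helps but is not the main cost here.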

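Answer 1: (score: 2)

SOLVED

The problem was in the last JOIN, the one with the prices table. As suggested, I managed to run the SELECT MAX(...) separately, with an execution time of 0.1 seconds.

So I decided to run the main query without the prices and then, in the WHILE loop where I fetch the array, run a second query to get the price of each product! This works perfectly, and my page went from 20 seconds down to a few tenths of a second.

So, the code looks like this: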

and then..

$dati = mysqli_query($mysqli_connect, "
SELECT      *  
FROM        $tb_products
LEFT JOIN   $tb_categories ON $tb_products.product_category = $tb_categories.category_id_master
LEFT JOIN   $tb_subcategories ON $tb_products.product_subcategory = $tb_subcategories.subcategory_id_master
LEFT JOIN   $tb_logos ON $tb_products.product_logo = $tb_logos.logo_id_master

WHERE       $tb_products.product_language='$activeLanguage' AND $tb_products.product_category!=0
GROUP BY    $tb_products.product_code
ORDER BY    $tb_products.product_brand, $tb_categories.category_rank, $tb_subcategories.subcategory_rank, $tb_products.product_subcategory, $tb_products.product_rank
");

It may not be the best or most elegant solution, but it works well for me!
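For readers who want to see the per-product lookup spelled out, here is a rough sketch of what the second query inside the WHILE loop could look like. It is not the answer's original code: the function names current_price_sql and attach_current_prices are hypothetical, the lookup assumes a price row is identified by brand and code as in the question's JOIN condition, and it uses ORDER BY ... LIMIT 1 in place of the MAX() subquery to pick the latest non-future price.

```php
<?php
/**
 * Builds the per-product price lookup (hypothetical helper).
 * A table name cannot be bound as a parameter, so it is interpolated,
 * as the thread's own queries do with $tb_prices.
 */
function current_price_sql(string $pricesTable): string
{
    return "SELECT price_value
            FROM $pricesTable
            WHERE price_brand = ?
              AND price_code = ?
              AND price_validity <= CURDATE()
            ORDER BY price_validity DESC
            LIMIT 1";
}

/**
 * Reads every row of the price-less main query and attaches the most
 * recent non-future price (or null when no price row matches).
 */
function attach_current_prices(mysqli $db, mysqli_result $products, string $pricesTable): array
{
    // Prepare once, execute once per product.
    $stmt = $db->prepare(current_price_sql($pricesTable));
    if ($stmt === false) {
        throw new RuntimeException('prepare failed: ' . $db->error);
    }

    $brand = '';
    $code  = '';
    $stmt->bind_param('ss', $brand, $code); // bound by reference

    $rows = [];
    while ($row = $products->fetch_assoc()) {
        $brand = $row['product_brand'];
        $code  = $row['product_code'];

        $stmt->execute();
        $stmt->bind_result($price);
        $found = ($stmt->fetch() === true);
        $row['price_value'] = ($found && $price !== null) ? (float) $price : null;
        $stmt->free_result(); // discard anything left before the next execute

        $rows[] = $row;
    }
    $stmt->close();

    return $rows;
}
```

With the variables from the thread, the call would be something like attach_current_prices($mysqli_connect, $dati, $tb_prices), where $dati is the result of the main query without prices. This costs one extra query per product, which matches the approach described above rather than a single-query rewrite.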