How to get intervals of X consecutive dates and their average values from the records and scores tables

Time: 2018-02-03 19:45:50

Tags: php mysql sql intervals

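I am trying to get the average score over X consecutive dates (an interval) from two tables. By this I mean that the dates must be consecutive, based on the value of the column records.status (a row's value, specifically scores.score, is selected only when the status is T or P).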

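For example, if I select intervals of 4 consecutive dates for personid = 133*, I want to return the following (the rows before the average calculation, which I think I should get via an SQL query?)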

2015-07-11  5
2015-10-17  2
2015-11-06  5
2016-01-20  5

2016-01-30  4
2016-05-19  4
2016-09-07  1
2016-09-28  3

2016-12-29  2
2017-01-17  1
2017-01-22  3
2017-04-02  2

and plot them in a chart (after the average calculation, which I think I need to do in PHP):

group 1 (2015-07-11 / 2016-01-20) 4.25
group 2 (2016-01-30 / 2016-09-28) 3.00
group 3 (2016-12-29 / 2017-04-02) 2.00
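As a minimal sketch of that PHP step, assuming the SQL query already returns the qualifying rows ordered by date as `['date' => ..., 'score' => ...]` arrays (the helper name `averageConsecutiveGroups()` is hypothetical), `array_chunk()` splits the rows into groups of `$size`, and each group is averaged over the rows it actually contains, so a shorter final group is handled as well:

```php
<?php
// Hypothetical helper: split date-ordered rows into groups of $size
// and average the scores of each group.
function averageConsecutiveGroups(array $rows, int $size): array
{
    $groups = [];
    foreach (array_chunk($rows, $size) as $chunk) {
        $scores = array_column($chunk, 'score');
        $groups[] = [
            'from' => $chunk[0]['date'],
            'to'   => $chunk[count($chunk) - 1]['date'],
            // Divide by the real group size, not $size, so a short last group is correct
            'avg'  => array_sum($scores) / count($scores),
        ];
    }
    return $groups;
}

// Sample rows for personid = 133, taken from the listing above
$rows = [
    ['date' => '2015-07-11', 'score' => 5], ['date' => '2015-10-17', 'score' => 2],
    ['date' => '2015-11-06', 'score' => 5], ['date' => '2016-01-20', 'score' => 5],
    ['date' => '2016-01-30', 'score' => 4], ['date' => '2016-05-19', 'score' => 4],
    ['date' => '2016-09-07', 'score' => 1], ['date' => '2016-09-28', 'score' => 3],
    ['date' => '2016-12-29', 'score' => 2], ['date' => '2017-01-17', 'score' => 1],
    ['date' => '2017-01-22', 'score' => 3], ['date' => '2017-04-02', 'score' => 2],
];

foreach (averageConsecutiveGroups($rows, 4) as $i => $g) {
    printf("group %d (%s / %s) %.2f\n", $i + 1, $g['from'], $g['to'], $g['avg']);
}
// group 1 (2015-07-11 / 2016-01-20) 4.25
// group 2 (2016-01-30 / 2016-09-28) 3.00
// group 3 (2016-12-29 / 2017-04-02) 2.00
```

The returned `from`/`to`/`avg` entries map directly onto chart labels and values.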

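*This is some sample data I generated randomly for testing; my real data is larger, better structured, has more columns and truly consecutive dates (Mon-Fri, one day after another).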

http://sqlfiddle.com/#!9/4b7a62/1

Any hints and suggestions are very welcome.

MySQL version: 5.6.26

[edit1] Somehow my sqlfiddle snippet is offline, but this should be my sample setup:

-- Schemas of the 2 DB tables
CREATE TABLE IF NOT EXISTS `records` (
  `person` varchar(32) NOT NULL,
  `status` varchar(32) NOT NULL,
  `purdate` date NOT NULL,
  `personid` int(11) DEFAULT NULL,
  -- AUTO_INCREMENT so that the UPDATE below can match records.id to scores.id
  `id` int(11) NOT NULL AUTO_INCREMENT PRIMARY KEY
);

CREATE TABLE IF NOT EXISTS `scores` (
  `personid` int(11) DEFAULT NULL,
  `score` int(11) DEFAULT NULL,
  `date` date DEFAULT NULL,
  `id` int(11) NOT NULL AUTO_INCREMENT PRIMARY KEY
);

-- PHP for sample data --
function getRandomDateTime($startDate, $endDate, $num) {
    $dateArr = [];
    for ($i = 0; $i < $num; $i++) {
        $dateArr[] = date('Y-m-d', mt_rand(strtotime($startDate), strtotime($endDate)));
    }
    // 'Y-m-d' strings sort chronologically with a plain string sort
    sort($dateArr);
    return $dateArr;
}
$test = getRandomDateTime('2015-06-03', '2017-05-12', 100);

$arrCode = ['P', 'L', 'T'];
$arrId = [133, 145, 156];

$rows = [];
foreach ($test as $value) {
    $rand = $arrCode[array_rand($arrCode)];
    $randID = $arrId[array_rand($arrId)];
    $rows[] = "('person_name', '" . $rand . "', '" . $value . "', " . $randID . ")";
}
// implode() avoids a trailing comma after the last row
echo "insert into records (person, status, purdate, personid) values\r\n" . implode(",\r\n", $rows) . ";\r\n";

$rows = [];
for ($i = 0; $i < 100; $i++) {
    $randID = $arrId[array_rand($arrId)];
    $rows[] = "(" . $randID . ", " . rand(1, 5) . ")";
}
echo "insert into scores (personid, score) values\r\n" . implode(",\r\n", $rows) . ";\r\n";

-- SQL query to update the date column
UPDATE scores  
SET scores.date = (  
SELECT records.purdate  
    FROM records  
    WHERE records.id = scores.id  
);

[edit2] This is the simple PHP function I call as getConsecutiveInterval(4):

  function getConsecutiveInterval($interval) {
    global $conn;

    // Same query for every group; only the LIMIT clause changes per iteration
    $q = "SELECT a.purdate, b.score, a.status "
            . "FROM records a "
            . "INNER JOIN scores b "
            . "ON a.purdate = b.date AND a.personid = b.personid "
            . "WHERE a.personid = 133 AND a.status IN('P','T') "
            . "ORDER BY purdate ASC, score DESC ";

    // Total number of matching rows, used to know when to stop
    $result = mysqli_query($conn, $q);
    $num_rows = mysqli_num_rows($result);

    // One iteration per group of $interval rows
    for ($offset = 0; $offset < $num_rows; $offset += $interval) {
        // MySQL syntax is LIMIT offset, row_count
        $sqlquery = $q . "LIMIT " . $offset . ", " . $interval;
        $total = 0;
        $count = 0;
        foreach (mysqli_query($conn, $sqlquery) as $results) {
            $total += $results['score'];
            $count++;
        }
        // Divide by the rows actually returned, so a shorter last group is averaged correctly
        $avg = $total / $count;
        echo $avg . '<br/>';
    }
    echo '<hr/>';
}
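The paging arithmetic in a function like this can be checked on its own. MySQL's two-argument form is `LIMIT offset, row_count` (equivalently `LIMIT row_count OFFSET offset`), so the offset has to come first. A small sketch, using the hypothetical helper name `limitClauses()`, that lists the clause for every group until all `$numRows` matching rows are consumed:

```php
<?php
// Hypothetical helper: build one LIMIT clause per group of $interval rows.
// MySQL reads "LIMIT a, b" as "skip a rows, then return at most b rows".
function limitClauses(int $numRows, int $interval): array
{
    $clauses = [];
    for ($offset = 0; $offset < $numRows; $offset += $interval) {
        $clauses[] = "LIMIT " . $offset . ", " . $interval;
    }
    return $clauses;
}

print_r(limitClauses(10, 4));
// three clauses: "LIMIT 0, 4", "LIMIT 4, 4", "LIMIT 8, 4"
```

With 10 matching rows and an interval of 4, the last query still asks for 4 rows but MySQL returns only the 2 that remain, which is why the average should be divided by the number of rows actually fetched. With an interval of 3 the same 10 rows give four clauses, the last covering a single row.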

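I know the averages will differ with other random data, but the following is based on my random data and the hard-coded personid = 133.

Based on my random sample data, the query returns: (screenshot not included)

The average outputs I get with the PHP function: (screenshot not included)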

I expect the averages to be 2.75, 3.5 and 3.5 (the last based on the remaining 2 dates, not 4).

When I use getConsecutiveInterval(3), I expect the averages to be 3.33, 3.33, 2.66 and 4 (the last based on 1 date).

3 answers:

Answer 0 (score: 1)

Update: The examples you gave me earlier helped me understand your needs and your background (what you prefer to develop in).

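I know a PHP solution would suit you best, but not everything that MySQL can solve needs to be moved to PHP. So I decided to go with the best approach I could think of.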

I took the samples you provided via PHP, and they were good enough to give a better picture of the kind of data you are working with.

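From these samples I see that records.purdate and scores.date are the same: you are basically copying the purdate column into the scores.date column. This may be redundant, but it helps us get the start date and end date of each group of consecutive dates.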

First I should mention that I am working with MySQL v5.7 and using MySQL Workbench 6.3 as the IDE (it has been a long time since I used phpMyAdmin, but it should work there as well).

You need to create a stored procedure; if you don't know how to do that in phpMyAdmin, just google it.

Here is a working (tested) one:

CREATE PROCEDURE `getConsecutiveInterval`(IN `selectRows` INT, IN `skippedRows` INT)
BEGIN
SET @selectRows = selectRows; 
SET @skippedRows = skippedRows; 

IF skippedRows = 0 THEN
SET @skippedRows = "";
ELSE 
SET @skippedRows = CONCAT(" OFFSET ", skippedRows); -- MySQL reads this as LIMIT row_count OFFSET offset
END IF;

SET @q = CONCAT("SELECT concat(date_format(MIN(StartDate), '%Y-%m-%d'), '  /  ', date_format(MAX(EndDate), '%Y-%m-%d')) AS Dates, AVG(Score)
FROM (
SELECT 
    a.purdate AS StartDate, 
    b.date AS EndDate, 
    b.score  AS Score
FROM records a 
LEFT JOIN scores b 
ON a.purdate = b.date AND a.personid = b.personid 
WHERE 
    a.personid = 133
AND a.status IN('P','T') 
AND b.score IS NOT NULL
ORDER BY purdate ASC, score DESC 
LIMIT ", @selectRows, @skippedRows, " ", ") D;");

PREPARE ConsecutiveInterval FROM @q;
EXECUTE ConsecutiveInterval;
DEALLOCATE PREPARE ConsecutiveInterval;
END

This stored procedure works like your getConsecutiveInterval() function, except that it runs in MySQL.

How it works: you call the stored procedure with
CALL getConsecutiveInterval(selectRows,skippedRows)

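I added a condition to the stored procedure: if skippedRows is 0, the offset part becomes an empty string; otherwise skippedRows is always appended to the LIMIT clause.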

For example, with the sample you provided:

CALL getConsecutiveInterval(4,0)

will return:

'2015-07-11  /  2016-01-20', '4.25'

and

CALL getConsecutiveInterval(4,4)

will return:

'2016-01-30  /  2016-09-28', '3.00'

and so on.

The selectRows variable corresponds to $interval in PHP, and skippedRows to $offset.

Then, from the PHP side, you can call it like this:

$query = "CALL getConsecutiveInterval( " . $interval . " , "  . $offset .")";

This way, PHP only controls the $interval and $offset integers and handles the output; the rest is handled by MySQL itself.

The $offset calculation stays the same as before:

$offset += $interval;

You can also extend the stored procedure with more parameters (e.g. personid, status, etc.). Whatever parameters you need, you can always add them.

For example, here it is extended with personid:

CREATE PROCEDURE `getConsecutiveInterval`(IN `selectRows` INT, IN `skippedRows` INT, IN personID INT)
BEGIN
SET @selectRows = selectRows; 
SET @skippedRows = skippedRows; 
SET @personid = personID;

IF skippedRows = 0 THEN
SET @skippedRows = "";
ELSE 
SET @skippedRows = CONCAT(" OFFSET ", skippedRows); -- MySQL reads this as LIMIT row_count OFFSET offset
END IF;

IF personID > 0 THEN 
SET @personid = CONCAT(" AND a.personid = ",  personID); 
ELSE 
SET @personid = ""; 
END IF;

SET @q = CONCAT("SELECT concat(date_format(MIN(StartDate), '%Y-%m-%d'), '  /  ', date_format(MAX(EndDate), '%Y-%m-%d')) AS Dates, AVG(Score)
FROM (
SELECT 
    a.purdate AS StartDate, 
    b.date AS EndDate, 
    b.score  AS Score
FROM records a 
LEFT JOIN scores b 
ON a.purdate = b.date AND a.personid = b.personid 
WHERE 
    a.status IN('P','T') 
AND b.score IS NOT NULL ", @personid, " ORDER BY purdate ASC, score DESC LIMIT ", @selectRows, @skippedRows, " ", ") D;");

PREPARE ConsecutiveInterval FROM @q;
EXECUTE ConsecutiveInterval;
DEALLOCATE PREPARE ConsecutiveInterval;
END

This adds another parameter to the call:

CALL getConsecutiveInterval(4,0, 133);

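133 is the personid. If I change it to 0, the condition AND a.personid = 133 is removed from the query, and I get mixed data from all persons, following the query's sort order.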

I hope this update helps you along the way.

Answer 1 (score: 0)

I made a test example (note: this is T-SQL, i.e. SQL Server syntax, not MySQL):

declare @selectedIntervalCount int=3, @selectedID int=1
declare @startDate date='2018-01-01',@endDate date='2018-01-31'
declare @data table(pID int,pDate date, statsValue int)
insert into @data(pID,pDate, statsValue)
values(1,'2018-01-01',1)
,(1,'2018-01-02',2),(1,'2018-01-03',3),(1,'2018-01-04',4)
,(1,'2018-01-05',5),(1,'2018-01-06',1),(1,'2018-01-07',2)
,(1,'2018-01-08',7),(1,'2018-01-09',4),(1,'2018-01-10',3)
,(1,'2018-01-11',8),(1,'2018-01-12',5),(1,'2018-01-13',3)


select tt1.tempIX/@selectedIntervalCount 'intervalIX', cast(min(tt1.pDate) as varchar)+' - '+cast(max(tt1.pDate) as varchar) 'interval', sum(tt1.statsValue)/cast(count(tt1.statsValue) as float) 'avgStatsValue' from( 
    select (row_number() over (order by pDate) -1) 'tempIX', t1.pDate, t1.statsValue 
    from @data t1
    where t1.pDate between @startDate and @endDate 
    and t1.pID=@selectedID
) tt1
group by tt1.tempIX/@selectedIntervalCount
order by tt1.tempIX/@selectedIntervalCount

The output is:

intervalIX  interval                  avgStatsValue
0           2018-01-01 - 2018-01-03   2
1           2018-01-04 - 2018-01-06   3.33333333333333
2           2018-01-07 - 2018-01-09   4.33333333333333
3           2018-01-10 - 2018-01-12   5.33333333333333
4           2018-01-13 - 2018-01-13   3

Answer 2 (score: 0)

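If your MySQL version provides window functions (MySQL 8.0 and later; the 5.6.26 mentioned in the question does not), the solution is fairly simple.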

select g.personid, min(g.date) dfrom, max(g.date) dto, avg(g.score) avgscore
  from (
    select s.*, floor((s.rn - 1) / 4) gn
      from (
        select scores.personid, scores.date, scores.score ,
           row_number() over (
             partition by scores.personid
             order by scores.date) as rn
         from scores
         join records
           on  scores.personid = records.personid
           and scores.date = records.purdate
         where records.status in ('T','P')
         order by personid, date
      ) s
  ) g
  group by g.personid, g.gn
  order by g.personid, g.gn;

Using the data from the SQL fiddle, this gives:

+----------+------------+------------+----------+
| personid | dfrom      | dto        | avgscore |
+----------+------------+------------+----------+
|      133 | 2015-07-11 | 2016-01-20 |   4.2500 |
|      133 | 2016-01-30 | 2016-09-28 |   3.0000 |
|      133 | 2016-10-02 | 2017-04-02 |   2.0000 |
|      145 | 2015-06-29 | 2016-06-30 |   3.0000 |
|      145 | 2016-10-24 | 2017-01-16 |   3.3333 |
|      156 | 2015-10-20 | 2015-12-17 |   2.0000 |
|      156 | 2015-12-19 | 2016-05-21 |   3.0000 |
|      156 | 2016-05-25 | 2016-10-16 |   4.7500 |
|      156 | 2017-01-30 | 2017-01-30 |   4.0000 |
+----------+------------+------------+----------+
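Since the question mentions MySQL 5.6.26, where `ROW_NUMBER()` is not available, the same result could be produced in PHP from a plain join ordered by `personid, date`: restart the row counter for each person, as `PARTITION BY scores.personid` does, and bucket with `floor((rn - 1) / 4)`. A sketch; the helper name `averagesPerPerson()` and the row shape (`personid`, `date`, `score` keys) are assumptions:

```php
<?php
// Hypothetical helper: PHP equivalent of
//   ROW_NUMBER() OVER (PARTITION BY personid ORDER BY date) and floor((rn - 1) / $n)
// $rows must already be ordered by personid, then date.
function averagesPerPerson(array $rows, int $n): array
{
    $groups = [];
    $rn = [];
    foreach ($rows as $row) {
        $pid = $row['personid'];
        $rn[$pid] = ($rn[$pid] ?? 0) + 1;     // row number restarts for each person
        $gn = intdiv($rn[$pid] - 1, $n);      // same as floor((rn - 1) / n)
        if (!isset($groups[$pid][$gn])) {
            $groups[$pid][$gn] = ['dfrom' => $row['date'], 'dto' => $row['date'], 'sum' => 0, 'count' => 0];
        }
        // Rows are date-ordered, so the last row seen is the group's max date
        $groups[$pid][$gn]['dto'] = $row['date'];
        $groups[$pid][$gn]['sum'] += $row['score'];
        $groups[$pid][$gn]['count']++;
    }

    // One row per (personid, group): personid, dfrom, dto, avgscore
    $result = [];
    foreach ($groups as $pid => $personGroups) {
        foreach ($personGroups as $g) {
            $result[] = [$pid, $g['dfrom'], $g['dto'], $g['sum'] / $g['count']];
        }
    }
    return $result;
}
```

Fed with the rows of the same join (without the window function, ordered by personid, date), it should yield one entry per person and group, in the same shape as the table above.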