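I have 2 tables containing records that I am trying to join on a date column. Because of the precision of the date columns, the values are not exactly identical, so I came up with a way to join each record in one table to the record with the closest date in the other table (only if the dates are within 1 minute of each other). I have run this successfully on several tables, but I recently ran into some data that causes the DATEDIFF SQL function to overflow.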
Here is the relevant data I am working with, followed by the query I use to get the records with the closest dates:
--------------- #tmp_Job_Queue ---------------
SELECT * INTO #tmp_Job_Queue
FROM (
SELECT N'130' AS [ID], N'Process 1' AS [ProcessName], N'2006-12-28 14:37:24.717' AS [DateCompleted] UNION ALL
SELECT N'133' AS [ID], N'Process 1' AS [ProcessName], N'2007-01-09 15:42:43.500' AS [DateCompleted] UNION ALL
SELECT N'219' AS [ID], N'Process 1' AS [ProcessName], N'2008-01-08 14:52:52.797' AS [DateCompleted] UNION ALL
SELECT N'234' AS [ID], N'Process 1' AS [ProcessName], N'2008-02-15 17:00:40.440' AS [DateCompleted] UNION ALL
SELECT N'278' AS [ID], N'Process 1' AS [ProcessName], N'2008-12-23 11:14:06.420' AS [DateCompleted] UNION ALL
SELECT N'281' AS [ID], N'Process 1' AS [ProcessName], N'2008-12-23 15:14:51.797' AS [DateCompleted] UNION ALL
SELECT N'286' AS [ID], N'Process 1' AS [ProcessName], N'2009-01-21 14:46:16.367' AS [DateCompleted] UNION ALL
SELECT N'288' AS [ID], N'Process 1' AS [ProcessName], N'2009-01-22 10:33:21.150' AS [DateCompleted] UNION ALL
SELECT N'290' AS [ID], N'Process 1' AS [ProcessName], N'2009-01-26 08:18:22.527' AS [DateCompleted] UNION ALL
SELECT N'340' AS [ID], N'Process 1' AS [ProcessName], N'2009-12-30 14:58:17.193' AS [DateCompleted] UNION ALL
SELECT N'349' AS [ID], N'Process 1' AS [ProcessName], N'2010-01-19 12:40:26.190' AS [DateCompleted] UNION ALL
SELECT N'390' AS [ID], N'Process 1' AS [ProcessName], N'2010-12-21 11:25:50.057' AS [DateCompleted] UNION ALL
SELECT N'399' AS [ID], N'Process 1' AS [ProcessName], N'2011-01-25 15:44:59.673' AS [DateCompleted] UNION ALL
SELECT N'440' AS [ID], N'Process 1' AS [ProcessName], N'2011-12-19 08:40:41.547' AS [DateCompleted] UNION ALL
SELECT N'447' AS [ID], N'Process 1' AS [ProcessName], N'2012-01-12 14:15:00.800' AS [DateCompleted] UNION ALL
SELECT N'563' AS [ID], N'Process 1' AS [ProcessName], N'2013-12-19 14:39:39.123' AS [DateCompleted] UNION ALL
SELECT N'569' AS [ID], N'Process 1' AS [ProcessName], N'2014-01-13 11:26:27.007' AS [DateCompleted] UNION ALL
SELECT N'631' AS [ID], N'Process 1' AS [ProcessName], N'2014-12-16 10:07:53.907' AS [DateCompleted] UNION ALL
SELECT N'639' AS [ID], N'Process 1' AS [ProcessName], N'2015-01-08 16:10:50.010' AS [DateCompleted] UNION ALL
SELECT N'689' AS [ID], N'Process 1' AS [ProcessName], N'2015-12-17 13:43:28.687' AS [DateCompleted] UNION ALL
SELECT N'691' AS [ID], N'Process 1' AS [ProcessName], N'2015-12-18 12:15:18.367' AS [DateCompleted] UNION ALL
SELECT N'699' AS [ID], N'Process 1' AS [ProcessName], N'2016-01-12 12:27:09.523' AS [DateCompleted] UNION ALL
SELECT N'794' AS [ID], N'Process 1' AS [ProcessName], N'2017-10-09 14:58:06.503' AS [DateCompleted] UNION ALL
SELECT N'817' AS [ID], N'Process 1' AS [ProcessName], N'2017-10-12 08:54:57.820' AS [DateCompleted] ) t;
--------------- #tmp_Log ---------------
SELECT * INTO #tmp_Log
FROM (
SELECT N'5' AS [ID], N'Process 2' AS [ProcessName], N'2008-02-15 17:00:39.550' AS [CreateDate] UNION ALL
SELECT N'190' AS [ID], N'Process 2' AS [ProcessName], N'2017-10-09 14:58:05.383' AS [CreateDate] UNION ALL
SELECT N'191' AS [ID], N'Process 2' AS [ProcessName], N'2017-10-12 08:54:57.820' AS [CreateDate] UNION ALL
SELECT N'17' AS [ID], N'Process 2' AS [ProcessName], N'2009-01-21 14:46:15.150' AS [CreateDate] UNION ALL
SELECT N'18' AS [ID], N'Process 2' AS [ProcessName], N'2009-01-21 16:24:20.913' AS [CreateDate] UNION ALL
SELECT N'19' AS [ID], N'Process 2' AS [ProcessName], N'2009-01-22 10:33:19.777' AS [CreateDate] UNION ALL
SELECT N'33' AS [ID], N'Process 2' AS [ProcessName], N'2010-01-19 12:40:24.710' AS [CreateDate] UNION ALL
SELECT N'41' AS [ID], N'Process 2' AS [ProcessName], N'2010-12-21 11:25:47.360' AS [CreateDate] UNION ALL
SELECT N'60' AS [ID], N'Process 2' AS [ProcessName], N'2011-12-19 08:40:38.167' AS [CreateDate] UNION ALL
SELECT N'67' AS [ID], N'Process 2' AS [ProcessName], N'2012-01-12 14:14:58.773' AS [CreateDate] UNION ALL
SELECT N'79' AS [ID], N'Process 2' AS [ProcessName], N'2012-12-17 15:49:49.890' AS [CreateDate] UNION ALL
SELECT N'84' AS [ID], N'Process 2' AS [ProcessName], N'2013-01-07 08:57:58.957' AS [CreateDate] UNION ALL
SELECT N'21' AS [ID], N'Process 2' AS [ProcessName], N'2009-01-26 08:18:21.213' AS [CreateDate] UNION ALL
SELECT N'47' AS [ID], N'Process 2' AS [ProcessName], N'2011-01-25 15:44:57.760' AS [CreateDate] UNION ALL
SELECT N'96' AS [ID], N'Process 2' AS [ProcessName], N'2013-12-19 14:39:25.513' AS [CreateDate] UNION ALL
SELECT N'102' AS [ID], N'Process 2' AS [ProcessName], N'2014-01-13 11:26:22.107' AS [CreateDate] UNION ALL
SELECT N'114' AS [ID], N'Process 2' AS [ProcessName], N'2014-12-16 10:07:32.987' AS [CreateDate] UNION ALL
SELECT N'121' AS [ID], N'Process 2' AS [ProcessName], N'2015-01-08 16:10:45.110' AS [CreateDate] UNION ALL
SELECT N'135' AS [ID], N'Process 2' AS [ProcessName], N'2015-12-17 13:43:23.220' AS [CreateDate] UNION ALL
SELECT N'137' AS [ID], N'Process 2' AS [ProcessName], N'2015-12-18 12:15:15.577' AS [CreateDate] UNION ALL
SELECT N'145' AS [ID], N'Process 2' AS [ProcessName], N'2016-01-12 12:27:07.797' AS [CreateDate] ) t;
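This is the query:
DECLARE @QProcess VARCHAR(50) = 'Process 1'
DECLARE @LProcess VARCHAR(50) = 'Process 2'
;WITH timeDifferences AS (
SELECT Q.ID AS QueueID, L.ID AS LogID,
ABS(DATEDIFF(MS, L.CreateDate, Q.DateCompleted)) AS DiffInMS
FROM #tmp_Job_Queue AS Q
JOIN #tmp_Log AS L
ON Q.ProcessName = @QProcess AND
L.ProcessName = @LProcess AND
ABS(DATEDIFF(MI, L.CreateDate, Q.DateCompleted)) <= 1
)
SELECT *
FROM timeDifferences AS T1
WHERE DiffInMS = (SELECT MIN(DiffInMS) FROM timeDifferences AS T2 WHERE T2.QueueID = T1.QueueID)
Usually it works, but with this particular set of data it gives me an error. If I take out the WHERE clause, it has no problem, but as soon as I put it back in (even something as simple as WHERE DiffInMS = 0), it starts giving this error:
The datediff function resulted in an overflow. The number of dateparts separating two date/time instances is too large. Try to use datediff with a less precise datepart.
I can change it to use some intermediate temp tables and get it to run, so it is more annoying than anything. I assume it has something to do with how SQL Server processes CTEs. Can anyone explain why this happens?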
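Edit
More specifically, why do I not get the overflow error when I remove the WHERE clause, while reintroducing the WHERE clause causes the error? I looked up the SQL order of operations and confirmed that the JOIN is evaluated before the SELECT, so rows where the datediff is greater than 1 minute should be eliminated. DATEDIFF(MS, ...) would then never be executed on the rows that would cause the overflow. Or at least, that is how I think it should work?
Also, I would expect to still get the error when the WHERE clause is removed, since all rows would be evaluated, but that does not seem to be what is happening.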
Answer 0: (score: 1)
The only part of your query that can generate an overflow error is:
ABS(DATEDIFF(MS, L.CreateDate, Q.DateCompleted)) AS DiffInMS
I assume your date range is not large enough for any of the "minute" differences to cause a problem. SQL Server 2016 addresses this with DATEDIFF_BIG(). It is nice to know that this problem has finally been fixed.
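For a sense of scale, the arithmetic behind the overflow can be checked outside the database. The sketch below uses Python purely as a calculator: it assumes only that DATEDIFF returns a 32-bit signed INT, and it takes the earliest DateCompleted and the latest CreateDate from the sample data as the widest pair the join could compare.

```python
from datetime import datetime, timedelta

INT_MAX = 2**31 - 1  # upper limit of a 32-bit signed INT, which DATEDIFF returns

# Longest interval whose millisecond count still fits in an INT.
max_ms_span = timedelta(milliseconds=INT_MAX)

# Widest gap in the sample data: first DateCompleted vs. last CreateDate.
earliest = datetime(2006, 12, 28, 14, 37, 24, 717000)
latest = datetime(2017, 10, 12, 8, 54, 57, 820000)
span = latest - earliest

span_ms = span // timedelta(milliseconds=1)   # whole milliseconds in the gap
span_min = span // timedelta(minutes=1)       # whole minutes in the gap

print("largest span that fits in ms:", max_ms_span)
print("ms count overflows INT:     ", span_ms > INT_MAX)
print("minute count overflows INT: ", span_min > INT_MAX)
```

Any pair of dates more than roughly 24.9 days apart overflows when counted in milliseconds, while the same pair counted in minutes stays far below the limit.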
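Why do you only see this sometimes? My guess is that you are not processing all of the data. If you did an ORDER BY in the outer query, you would probably always see the problem. The offending value is probably buried deep in some row.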
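Assuming you do not care about large differences, you could replace the logic with:
(CASE WHEN ABS(DATEDIFF(minute, L.CreateDate, Q.DateCompleted)) < 2000000000 / (60 * 1000)
THEN ABS(DATEDIFF(MS, L.CreateDate, Q.DateCompleted))
END) AS DiffInMS
This computes the millisecond difference only when it is safe to do so. This version returns NULL for every other pair, but you could easily add an ELSE branch if you want an upper bound.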
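The effect of the CASE guard can be mimicked outside the database to see which pairs it lets through. The sketch below is Python, not T-SQL, and the helper names `minute_boundaries` and `guarded_diff_ms` are made up for illustration. It assumes DATEDIFF(minute, ...) counts minute boundaries crossed and that the result type is a 32-bit INT.

```python
from datetime import datetime, timedelta

INT_MAX = 2**31 - 1                           # limit of a 32-bit signed INT
GUARD_MINUTES = 2_000_000_000 // (60 * 1000)  # threshold used in the CASE expression

def minute_boundaries(a, b):
    """Mimic DATEDIFF(minute, a, b): minute boundaries crossed from a to b."""
    def trunc(d):
        return d.replace(second=0, microsecond=0)
    return (trunc(b) - trunc(a)) // timedelta(minutes=1)

def guarded_diff_ms(a, b):
    """Millisecond difference when the guard passes, otherwise None (SQL NULL)."""
    if abs(minute_boundaries(a, b)) < GUARD_MINUTES:
        return abs(b - a) // timedelta(milliseconds=1)
    return None

# Queue ID 234 and Log ID 5 from the sample data: less than a second apart.
near = guarded_diff_ms(datetime(2008, 2, 15, 17, 0, 39, 550000),
                       datetime(2008, 2, 15, 17, 0, 40, 440000))

# Queue ID 130 and Log ID 191: more than ten years apart, so the guard skips them.
far = guarded_diff_ms(datetime(2006, 12, 28, 14, 37, 24, 717000),
                      datetime(2017, 10, 12, 8, 54, 57, 820000))

print(near, far)
```

Any pair that passes the guard is less than 33333 minutes apart, so its millisecond count stays below 2,000,000,000 and fits in an INT.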
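Answer 1: (score: 1)
It turns out that the query you have must check 487 rows against each other. Perhaps some of them have a huge number of milliseconds between CreateDate and DateCompleted. However, only 18 of them are less than a minute apart, so I combined the two DATEDIFFs into one CASE statement, which means the CTE can eliminate the rows where the DATEDIFF is greater than one minute before the final SELECT statement is processed.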
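You will get a warning message about eliminating NULLs. As far as I know, this only means that the CASE statement produced a lot of NULL rows. Those rows are then removed from the result set, which is what the warning says. It would be wise to test it against the #Temp table queries.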