This:
update testdata.dataset1
set abcd = (select abc
from dataset2
order by random()
limit 1
)
only fills every row of table dataset1 with the same single random entry from dataset2.
What I need is for each row of dataset1 to get its own random entry from dataset2.
Note: dataset1 may be larger than dataset2.
Answer 0 (score: 1)
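Query 1
You should pass abcd into the subquery to prevent the "optimization": as long as the subquery does not reference the row being updated, the planner can evaluate it once and reuse that single result for every row. Referencing the outer column turns it into a correlated subquery, which is re-evaluated for each row.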
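UPDATE dataset1
SET abcd = (SELECT abc
FROM dataset2
WHERE dataset1.abcd = dataset1.abcd
ORDER BY random()
LIMIT 1
);
Qualifying the column as dataset1.abcd makes sure it binds to the outer row even if dataset2 happens to have a column of the same name. Note that the condition is not true when abcd is NULL, so such rows would stay NULL.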
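A minimal sketch of the correlated-subquery trick, run through Python's standard sqlite3 module against an in-memory SQLite database instead of Redshift or PostgreSQL. Using SQLite is an assumption made so the example is self-contained; its planner differs, but it likewise re-evaluates a correlated subquery for every outer row. The sample tables and values are hypothetical, and the subquery correlates on the non-null key dataset1.id rather than on abcd, which also sidesteps the NULL caveat.

```python
import sqlite3

# Hypothetical sample data: five candidate values, 200 rows to update.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dataset2 (id INTEGER PRIMARY KEY, abc TEXT);
    CREATE TABLE dataset1 (id INTEGER PRIMARY KEY, abcd TEXT);
""")
conn.executemany("INSERT INTO dataset2 (abc) VALUES (?)", [(v,) for v in "ABCDE"])
conn.executemany("INSERT INTO dataset1 (abcd) VALUES (?)", [("old",)] * 200)

# The reference to dataset1.id makes the subquery correlated, so a new
# random dataset2 row is chosen for every dataset1 row being updated.
conn.execute("""
    UPDATE dataset1
    SET abcd = (SELECT abc
                FROM dataset2
                WHERE dataset1.id = dataset1.id
                ORDER BY random()
                LIMIT 1)
""")
conn.commit()

values = [row[0] for row in conn.execute("SELECT abcd FROM dataset1")]
print(sorted(set(values)))
```

Every row ends up with one of the five sample values, and with 200 rows it is practically certain that more than one distinct value appears.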
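Query 2
On plain PostgreSQL, the following query should be faster, because it jumps to a random offset instead of ordering all of dataset2 by random() for every row.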
UPDATE dataset1
SET abcd = (SELECT abc
FROM dataset2
WHERE dataset1.abcd = dataset1.abcd
OFFSET floor(random()*(SELECT COUNT(*) FROM dataset2))
LIMIT 1
);
However, as you reported, this is not the case on Redshift, which is a columnar store.
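The difference between Query 1 and Query 2 can be sketched in plain Python (the function names are hypothetical, and a list stands in for dataset2). Query 1 attaches a random key to every candidate and orders all of them just to keep one; Query 2 draws a single random position floor(random() * COUNT(*)) and takes the row at that offset. The sketch says nothing about how a real planner, in particular Redshift's columnar engine, executes OFFSET, which is why measured results there can differ.

```python
import math
import random


def pick_order_by_random(rows):
    """Like ORDER BY random() LIMIT 1: give every row a random sort key,
    sort all rows by it, and keep the first one."""
    return sorted(rows, key=lambda _row: random.random())[0]


def pick_random_offset(rows):
    """Like OFFSET floor(random() * COUNT(*)) LIMIT 1: draw one position
    in [0, N) and return the row at that position."""
    offset = math.floor(random.random() * len(rows))
    return rows[offset]


candidates = ["A", "B", "C", "D", "E"]  # hypothetical dataset2.abc values
print(pick_order_by_random(candidates), pick_random_offset(candidates))
```

Both return a uniformly chosen row; the offset version simply avoids generating and sorting N random keys per call.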
Query 3
Fetching all records from dataset2 in a single query should be more efficient than fetching them one by one. Let's test it:
UPDATE dataset1 original
SET abcd = fake.abc FROM
(SELECT ROW_NUMBER() OVER(ORDER BY random()) AS id, abc FROM dataset2) AS fake
WHERE original.id % (SELECT COUNT(*) FROM dataset2) = fake.id - 1;
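Note that an integer id column must exist in dataset1.
Also, for dataset1.id values greater than the number of records in dataset2, abcd is predictable: ids that differ by a multiple of that count receive the same value.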
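The mapping in Query 3 can be mimicked in plain Python to show what "predictable" means (the function name and sample data are hypothetical). dataset2 is shuffled once, numbered 1..N, and every dataset1 row takes the value numbered id % N + 1. Within any N consecutive ids each dataset2 value is used exactly once, and ids that are N apart always get the same value, so the result is one random permutation repeated rather than an independent draw per row.

```python
import random


def assign_by_modulo(dataset1_ids, dataset2_values):
    """Mimic Query 3 for non-negative integer ids.

    ROW_NUMBER() OVER (ORDER BY random()) numbers a single shuffle of
    dataset2 from 1 to N; the join condition id % N = fake.id - 1 then
    picks the value numbered id % N + 1 for each dataset1 row.
    """
    n = len(dataset2_values)
    shuffled = random.sample(dataset2_values, n)        # one random ordering
    fake = {i + 1: v for i, v in enumerate(shuffled)}   # fake.id -> abc
    return {row_id: fake[row_id % n + 1] for row_id in dataset1_ids}


values = ["A", "B", "C", "D", "E"]  # hypothetical dataset2.abc values
result = assign_by_modulo(range(1, 13), values)
print(result)
```

With five values, ids 1, 6 and 11 always share a value, whatever the shuffle was.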
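Query 4
Let's create an integer fake_id column in dataset1, prefill it with random values, and perform a join on dataset1.fake_id = dataset2.id:
ALTER TABLE dataset1 ADD COLUMN fake_id INTEGER;
UPDATE dataset1
SET fake_id = floor(random()*(SELECT COUNT(*) FROM dataset2)) + 1;
UPDATE dataset1
SET abcd = abc
FROM dataset2
WHERE dataset1.fake_id = dataset2.id;
This assumes dataset2.id runs without gaps from 1 to COUNT(*); rows whose fake_id matches no dataset2.id keep their old abcd.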
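Query 4 can likewise be mimicked in plain Python (the function name and sample data are hypothetical). Unlike Query 3, every dataset1 row draws its own fake_id = floor(random() * N) + 1, so the values are independent per row; the lookup by fake_id assumes, as the join does, that dataset2.id runs from 1 to N without gaps.

```python
import math
import random


def assign_by_fake_id(dataset1_ids, dataset2_rows):
    """Mimic Query 4: draw an independent fake_id in 1..N per dataset1 row,
    then join it to dataset2.id.

    dataset2_rows maps id -> abc; ids are assumed to be exactly 1..N,
    otherwise the lookup below raises KeyError (the SQL join would
    instead leave that row's abcd unchanged).
    """
    n = len(dataset2_rows)
    fake_id = {row_id: math.floor(random.random() * n) + 1 for row_id in dataset1_ids}
    return {row_id: dataset2_rows[fid] for row_id, fid in fake_id.items()}


dataset2 = {1: "A", 2: "B", 3: "C", 4: "D", 5: "E"}  # hypothetical id -> abc
result = assign_by_fake_id(range(1, 1001), dataset2)
print(sorted(set(result.values())))
```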
Query 5
If you don't want to add the fake_id column to dataset1, calculate fake_id on the fly:
UPDATE dataset1
SET abcd = joined.abc
FROM (
SELECT with_fake_id.id, dataset2.abc FROM
(SELECT dataset1.id, floor(RANDOM()*(SELECT COUNT(*) FROM dataset2) + 1) AS fake_id FROM dataset1) AS with_fake_id
JOIN dataset2 ON with_fake_id.fake_id = dataset2.id ) AS joined
WHERE dataset1.id = joined.id;
Performance
On plain PostgreSQL, Query 4 seems to be the most efficient. I will try to compare performance on a trial DC1.Large instance.