Redshift: Update or insert every row of a column with random data from another table

Time: 2017-08-06 18:02:32

Tags: amazon-redshift

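update testdata.dataset1 set abcd = (select abc from dataset2 order by random() limit 1);

Doing this only fills every row of the dataset1 table with the same single random entry from the dataset2 table.

What I need is for each row of dataset1 to get its own random entry from dataset2.

Note: dataset1 may be larger than dataset2.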

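1 Answer:

Answer 0 (score: 1)

Query 1

You should pass abcd into the subquery to prevent the "optimization": a subquery that does not reference the outer row is uncorrelated, so it can be evaluated once and its single result written to every row.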

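UPDATE dataset1
    SET abcd = (SELECT abc
                FROM dataset2
                -- Predicate that references the outer row, intended to force
                -- the subquery to be re-evaluated for each row.
                -- Caveat: it is true only for non-NULL abcd; rows where abcd
                -- IS NULL match nothing and stay NULL.
                WHERE dataset1.abcd = dataset1.abcd
                ORDER BY random()
                LIMIT 1
               );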
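
The "optimization" can be modeled outside the database. The Python sketch below is only a simulation, not Redshift's or PostgreSQL's planner: an uncorrelated subquery is computed once and reused for every row, while a subquery tied to the outer row is re-run per row. The function names and table contents are hypothetical, and `rng.choice` stands in for `ORDER BY random() LIMIT 1`.

```python
import random

def pick_random(pool, rng):
    """Stand-in for (SELECT abc FROM dataset2 ORDER BY random() LIMIT 1)."""
    return rng.choice(pool)

def update_uncorrelated(rows, pool, rng):
    # The subquery does not depend on the row, so it is evaluated once
    # and the single result is written to every row.
    value = pick_random(pool, rng)
    return [value for _ in rows]

def update_correlated(rows, pool, rng):
    # Referencing the outer row ties the subquery to each row,
    # so a fresh random pick is made per row.
    return [pick_random(pool, rng) for _ in rows]

dataset2 = ["a", "b", "c", "d", "e"]   # hypothetical abc values
dataset1 = list(range(1000))           # hypothetical rows to update
rng = random.Random(42)

once = update_uncorrelated(dataset1, dataset2, rng)
per_row = update_correlated(dataset1, dataset2, rng)
print(len(set(once)), len(set(per_row)))
```

The first number is always 1, which is exactly the symptom described in the question.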


Query 2

On plain PostgreSQL, the following query should be faster.

UPDATE dataset1
    SET abcd = (SELECT abc
                FROM dataset2
                WHERE abcd = abcd
                OFFSET floor(random()*(SELECT COUNT(*) FROM dataset2))
                LIMIT 1
               );
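
Query 2 avoids sorting dataset2 for every row by jumping to a random offset. Because random() returns a value in [0.0, 1.0), floor(random() * COUNT(*)) is always an integer from 0 to COUNT(*) - 1, which is a valid OFFSET. A small Python model of that arithmetic, using hypothetical data and helper names:

```python
import math
import random

def random_offset(count, rng):
    # Mirrors floor(random() * count): rng.random() is in [0.0, 1.0),
    # so the result is an integer in 0 .. count - 1.
    return math.floor(rng.random() * count)

def pick_by_offset(pool, rng):
    # Mirrors OFFSET <random_offset> LIMIT 1, with no sort of the pool.
    return pool[random_offset(len(pool), rng)]

dataset2 = ["a", "b", "c", "d", "e"]   # hypothetical abc values
rng = random.Random(7)
offsets = [random_offset(len(dataset2), rng) for _ in range(10_000)]
print(min(offsets), max(offsets))
print(pick_by_offset(dataset2, rng))
```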


However, as you reported, that is not the case on Redshift, which is a columnar store.

Query 3

Fetching all the records from dataset2 in a single query should be more efficient than fetching them one at a time. Let's test it:

UPDATE dataset1 original
SET abcd = fake.abc
FROM (SELECT ROW_NUMBER() OVER (ORDER BY random()) AS id, abc
      FROM dataset2) AS fake
WHERE original.id % (SELECT COUNT(*) FROM dataset2) = fake.id - 1;


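Note that an integer id column must exist in dataset1. Also, for dataset1.id values greater than the number of records in dataset2, abcd is predictable: because ids wrap around modulo COUNT(*), the row with id k + COUNT(*) receives the same abc as the row with id k.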
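
The wrap-around can be checked with a small Python simulation of Query 3's join condition. This is not Redshift code: the data and function names are hypothetical, and ids are assumed positive, since SQL's % and Python's % can disagree on negative operands.

```python
import random

def shuffled_ranks(pool, rng):
    # Mirrors ROW_NUMBER() OVER (ORDER BY random()): each abc value gets
    # a distinct rank 1..n in random order.
    order = pool[:]
    rng.shuffle(order)
    return {rank: abc for rank, abc in enumerate(order, start=1)}

def assign_by_modulo(ids, pool, rng):
    # Mirrors WHERE original.id % COUNT(*) = fake.id - 1,
    # i.e. fake.id = id % n + 1.
    n = len(pool)
    ranks = shuffled_ranks(pool, rng)
    return {row_id: ranks[row_id % n + 1] for row_id in ids}

dataset2 = ["a", "b", "c", "d", "e"]   # hypothetical abc values
dataset1_ids = range(1, 21)            # hypothetical positive integer ids
result = assign_by_modulo(dataset1_ids, dataset2, random.Random(3))
print(result)
```

Within any run of COUNT(*) consecutive ids every dataset2 value appears exactly once, and the pattern then repeats, which is the predictability mentioned above.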

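Query 4

Let's create an integer fake_id column in dataset1, prefill it with random values, and perform a join on dataset1.fake_id = dataset2.id:

ALTER TABLE dataset1 ADD COLUMN fake_id INTEGER;

UPDATE dataset1
SET fake_id = floor(random() * (SELECT COUNT(*) FROM dataset2)) + 1;

UPDATE dataset1
SET abcd = abc
FROM dataset2
WHERE dataset1.fake_id = dataset2.id;

This assumes dataset2.id runs from 1 to COUNT(*) without gaps; rows whose fake_id has no matching dataset2.id are left unchanged.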
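
The two steps of Query 4 can be simulated in Python to see why the gap-free id assumption matters. This is a sketch with hypothetical data and helper names, not Redshift code; dictionaries stand in for the tables.

```python
import math
import random

def prefill_fake_ids(ids, count, rng):
    # Mirrors UPDATE dataset1 SET fake_id = floor(random() * count) + 1,
    # giving each row an independent value in 1 .. count.
    return {row_id: math.floor(rng.random() * count) + 1 for row_id in ids}

def join_on_fake_id(fake_ids, dataset2_by_id):
    # Mirrors UPDATE ... FROM dataset2 WHERE dataset1.fake_id = dataset2.id.
    # Rows whose fake_id has no match are not updated (omitted here).
    return {
        row_id: dataset2_by_id[fid]
        for row_id, fid in fake_ids.items()
        if fid in dataset2_by_id
    }

dataset2 = {1: "a", 2: "b", 3: "c", 4: "d", 5: "e"}  # hypothetical id -> abc
rng = random.Random(11)
fake_ids = prefill_fake_ids(range(1, 101), len(dataset2), rng)
updated = join_on_fake_id(fake_ids, dataset2)
print(len(updated))  # prints 100: ids 1..5 have no gaps, so every row matches

gappy = {1: "a", 2: "b", 4: "d", 5: "e", 6: "f"}     # hypothetical gap at id 3
missed = join_on_fake_id(prefill_fake_ids(range(1, 101), len(gappy), rng), gappy)
print(len(missed))
```

With the gapped ids, rows that draw fake_id 3 find no match, and the value stored under id 6 can never be chosen.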


Query 5

If you don't want to add a fake_id column to dataset1, calculate fake_id "on the fly":

UPDATE dataset1
SET abcd = abc
FROM (SELECT with_fake_id.id, dataset2.abc
      FROM (SELECT dataset1.id,
                   floor(RANDOM() * (SELECT COUNT(*) FROM dataset2) + 1) AS fake_id
            FROM dataset1) AS with_fake_id
      JOIN dataset2 ON with_fake_id.fake_id = dataset2.id) AS joined
WHERE dataset1.id = joined.id;


Performance

On plain PostgreSQL, Query 4 appears to be the most efficient. I will try to compare performance on a trial DC1.Large instance.