Question

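The examples in Slick's documentation only use Reactive Streams for reading data, by way of a DatabasePublisher. But what happens when you want to use your database as a Sink and apply backpressure based on your insertion rate?

I've looked for an equivalent DatabaseSubscriber, but it doesn't exist. So the question is, if I have a Source, say: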

val source = Source(0 to 100)

how can I create a Sink with Slick that writes those values into a table with the schema:

create table NumberTable (value INT)

Answer 1

Serial inserts

The simplest way would be to do the inserts within a Sink.foreach.

Assuming you've used the schema code generation, and further assuming your table is named "NumberTable":

//Tables file was auto-generated by the schema code generation
import Tables.{Numbertable, NumbertableRow} 

val numberTableDB = Database forConfig "NumberTableConfig"

We can write a function to do the insertion:

def insertIntoDb(num : Int) = 
  numberTableDB run (Numbertable += NumbertableRow(num))

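That function can then be placed in the Sink:

// The Future returned by insertIntoDb is discarded here
val insertSink = Sink.foreach[Int](num => insertIntoDb(num))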

Source(0 to 100) runWith insertSink

Batched inserts

You could further extend the Sink methodology by batching N inserts at a time:

def batchInsertIntoDb(nums : Seq[Int]) = 
  numberTableDB run (Numbertable ++= nums.map(NumbertableRow.apply))

val batchInsertSink = Sink.foreach[Seq[Int]](nums => batchInsertIntoDb(nums))

This batched Sink can be fed by a Flow that does the batch grouping:

val batchSize = 10

Source(0 to 100).via(Flow[Int].grouped(batchSize))
                .runWith(batchInsertSink)
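To make the batch boundaries concrete, here is a minimal sketch that uses only the Scala standard library. It assumes that Flow[Int].grouped chunks elements the same way as the collections method of the same name: full groups of batchSize, followed by one smaller final group at end of stream. The names GroupingSketch and batches are hypothetical and exist only for this illustration.

```scala
object GroupingSketch {
  // Chunk the values into groups of batchSize; the last group
  // holds whatever is left over and may be smaller.
  def batches(values: Seq[Int], batchSize: Int): List[Seq[Int]] =
    values.grouped(batchSize).toList

  def main(args: Array[String]): Unit = {
    val groups = batches(0 to 100, 10)
    println(s"batches: ${groups.size}")                   // batches: 11
    println(s"last batch: ${groups.last.mkString(",")}")  // last batch: 100
  }
}
```

With 101 values and a batch size of 10, the sink therefore receives ten full batches and one trailing batch containing a single row.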

Answer 2

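Although you can use a Sink.foreach to achieve this (as mentioned by Ramon), it is safer and likely faster (by running the inserts in parallel) to use the mapAsync Flow. The problem you will face with Sink.foreach is that it does not have a return value. Inserting into a database via Slick's db.run method returns a Future, which then escapes the stream's returned Future[Done]; that Future[Done] completes as soon as the Sink.foreach finishes.

// Imports assumed by the samples in this answer (not part of the original post);
// PostgresProfile is a guess based on the "postgres" config name.
import akka.Done
import akka.actor.ActorSystem
import akka.stream.ActorMaterializer
import akka.stream.scaladsl.{Flow, Sink, Source}
import slick.jdbc.PostgresProfile.api._
import scala.concurrent.{Await, Future}
import scala.concurrent.duration.Duration
import scala.concurrent.ExecutionContext.Implicits.global

implicit val system = ActorSystem("system")
implicit val materializer = ActorMaterializer()

class Numbers(tag: Tag) extends Table[Int](tag, "NumberTable") {
  def value = column[Int]("value")
  def * = value
}

val numbers = TableQuery[Numbers]

val db = Database.forConfig("postgres")
Await.result(db.run(numbers.schema.create), Duration.Inf)

val streamFuture: Future[Done] = Source(0 to 100)
  .runWith(Sink.foreach[Int] { (i: Int) =>
    db.run(numbers += i).foreach(_ => println(s"stream 1 insert $i done"))
  })
Await.result(streamFuture, Duration.Inf)
println("stream 1 done")

//// sample 1 output: ////
// stream 1 insert 1 done
// ...
// stream 1 insert 99 done
// stream 1 done    <-- stream Future[Done] returned before inserts finished
// stream 1 insert 100 done

On the other hand, the def mapAsync[T](parallelism: Int)(f: Out ⇒ Future[T]) Flow allows you to run the inserts in parallel via the parallelism parameter, and accepts a function from the upstream output value to a Future of some type. This matches our i => db.run(numbers += i) function. The great thing about this Flow is that it then feeds the results of these Futures downstream.

val streamFuture2: Future[Done] = Source(0 to 100)
  .mapAsync(1) { (i: Int) =>
    db.run(numbers += i).map { r => println(s"stream 2 insert $i done"); r }
  }
  .runWith(Sink.ignore)
Await.result(streamFuture2, Duration.Inf)
println("stream 2 done")

//// sample 2 output: ////
// stream 2 insert 1 done
// ...
// stream 2 insert 100 done
// stream 2 done    <-- stream Future[Done] returned after inserts finished

To prove the point, you can even return a real result from the stream rather than a Future[Done] (with Done representing Unit). This stream also adds a higher parallelism value and batching for extra performance. *

val streamFuture3: Future[Int] = Source(0 to 100)
  .via(Flow[Int].grouped(10)) // Batch in size 10
  .mapAsync(2)((ints: Seq[Int]) => db.run(numbers ++= ints).map(_.getOrElse(0))) // Insert batches in parallel, return insert count
  .runWith(Sink.fold(0)(_ + _)) // count all inserts and return total
val rowsInserted = Await.result(streamFuture3, Duration.Inf)
println(s"stream 3 done, inserted $rowsInserted rows")

// sample 3 output:
// stream 3 done, inserted 101 rows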

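* Note: You probably won't see better performance with such a small data set, but when I was dealing with 1.7M inserts I got the best performance on my machine with a batch size of 1000 and a parallelism value of 8, running locally against PostgreSQL. This was about twice as fast as not running in parallel. As always when dealing with performance, your results may vary and you should measure for yourself.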
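One way to do that measuring is a small timing wrapper around whatever Future the stream returns. The sketch below uses only the Scala standard library; TimingSketch, timed, and standInInsert are hypothetical names, and standInInsert is a placeholder that merely counts rows per batch rather than touching a database. In practice you would pass in something like the batched mapAsync stream with different batch sizes and parallelism values.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration.Duration

object TimingSketch {
  // Runs the asynchronous job to completion and returns its result
  // together with the elapsed wall-clock time in milliseconds.
  def timed[A](job: => Future[A]): (A, Long) = {
    val start = System.nanoTime()
    val result = Await.result(job, Duration.Inf)
    (result, (System.nanoTime() - start) / 1000000L)
  }

  // Placeholder workload (an assumption, not a real insert):
  // groups 0 to 100 into batches and sums the batch sizes.
  def standInInsert(batchSize: Int): Future[Int] =
    Future((0 to 100).grouped(batchSize).map(_.size).sum)

  def main(args: Array[String]): Unit =
    for (batchSize <- Seq(1, 10, 100)) {
      val (rows, millis) = timed(standInInsert(batchSize))
      println(s"batchSize=$batchSize rows=$rows took $millis ms")
    }
}
```

Because the job is passed by name, the clock starts before the Future is created, so setup cost is included in the measurement.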

Answer 3

I find Alpakka's documentation excellent, and its DSL makes it really easy to work with reactive streams.

Here are the docs for Slick: https://doc.akka.io/docs/alpakka/current/slick.html

Example insert:

Source(0 to 100)
    .runWith(
      // add an optional first argument to specify the parallelism factor (Int)
      Slick.sink(value => sqlu"INSERT INTO NumberTable VALUES(${value})")
    )
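The snippet above leaves out the setup that the Alpakka Slick connector expects, most importantly an implicit SlickSession. The following is a fuller sketch under stated assumptions: "slick-postgres" is a hypothetical config path in application.conf, the parallelism of 4 is arbitrary, and the ActorMaterializer setup mirrors the older Akka style used elsewhere on this page. Check it against the Alpakka version you are running.

```scala
import akka.Done
import akka.actor.ActorSystem
import akka.stream.ActorMaterializer
import akka.stream.alpakka.slick.scaladsl.{Slick, SlickSession}
import akka.stream.scaladsl.Source
import scala.concurrent.{Await, Future}
import scala.concurrent.duration.Duration

object AlpakkaInsertSketch extends App {
  implicit val system: ActorSystem = ActorSystem("system")
  implicit val materializer: ActorMaterializer = ActorMaterializer()

  // Assumed config path; it must point at a Slick database config block.
  implicit val session: SlickSession = SlickSession.forConfig("slick-postgres")
  import session.profile.api._ // brings the sqlu interpolator into scope

  // The first argument is the parallelism factor; the second builds
  // one INSERT statement per upstream element.
  val done: Future[Done] = Source(0 to 100).runWith(
    Slick.sink(4, (value: Int) => sqlu"INSERT INTO NumberTable VALUES($value)")
  )

  // The materialized Future completes once the stream has finished.
  Await.result(done, Duration.Inf)
  session.close()
  system.terminate()
}
```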

How to use reactive streams in Slick to insert data
