为什么插入查询会导致我的数据流管道停止处理数据库插入?

时间:2019-04-04 18:04:04

标签: java mysql jdbc google-cloud-dataflow google-cloud-sql

我有一个Google Cloud Dataflow管道(流),该管道全天处理大量插入并将其插入Cloud SQL。我试图用现有数据填充另一个表,但是当我尝试将这些数据插入到这些表中时,它将导致管道停止插入。第二个我停止执行查询,插入继续无问题地插入。

我在数据库上尝试了其他查询,但是只有插入导致它停止处理插入。

我的插入查询是:

INSERT INTO t2 (c1, c2, firstUsed, lastUsed)
SELECT DISTINCT c3, c4, MIN(created), MAX(created) FROM t1 WHERE created 
<= '2018-12-01 23:59:59' GROUP BY c3, c4
ON DUPLICATE KEY UPDATE lastUsed = VALUES(lastUsed)

我在Google Cloud上收到的例外是:

Processing stuck in step ParDo(ProcessDatabaseEvent) for at least 05m00s without outputting or completing in state process
  at java.net.SocketInputStream.socketRead0(Native Method)
  at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
  at java.net.SocketInputStream.read(SocketInputStream.java:170)
  at java.net.SocketInputStream.read(SocketInputStream.java:141)
  at com.mysql.cj.protocol.ReadAheadInputStream.fill(ReadAheadInputStream.java:107)
  at com.mysql.cj.protocol.ReadAheadInputStream.readFromUnderlyingStreamIfNecessary(ReadAheadInputStream.java:150)
  at com.mysql.cj.protocol.ReadAheadInputStream.read(ReadAheadInputStream.java:180)
  at java.io.FilterInputStream.read(FilterInputStream.java:133)
  at com.mysql.cj.protocol.FullReadInputStream.readFully(FullReadInputStream.java:64)
  at com.mysql.cj.protocol.a.SimplePacketReader.readHeader(SimplePacketReader.java:63)
  at com.mysql.cj.protocol.a.SimplePacketReader.readHeader(SimplePacketReader.java:45)
  at com.mysql.cj.protocol.a.TimeTrackingPacketReader.readHeader(TimeTrackingPacketReader.java:52)
  at com.mysql.cj.protocol.a.TimeTrackingPacketReader.readHeader(TimeTrackingPacketReader.java:41)
  at com.mysql.cj.protocol.a.MultiPacketReader.readHeader(MultiPacketReader.java:54)
  at com.mysql.cj.protocol.a.MultiPacketReader.readHeader(MultiPacketReader.java:44)
  at com.mysql.cj.protocol.a.NativeProtocol.readMessage(NativeProtocol.java:557)
  at com.mysql.cj.protocol.a.NativeProtocol.checkErrorMessage(NativeProtocol.java:735)
  at com.mysql.cj.protocol.a.NativeProtocol.sendCommand(NativeProtocol.java:674)
  at com.mysql.cj.protocol.a.NativeProtocol.sendQueryPacket(NativeProtocol.java:966)
  at com.mysql.cj.NativeSession.execSQL(NativeSession.java:1165)
  at com.mysql.cj.jdbc.ClientPreparedStatement.executeInternal(ClientPreparedStatement.java:937)
  at com.mysql.cj.jdbc.ClientPreparedStatement.executeUpdateInternal(ClientPreparedStatement.java:1116)
  at com.mysql.cj.jdbc.ClientPreparedStatement.executeUpdateInternal(ClientPreparedStatement.java:1066)
  at com.mysql.cj.jdbc.ClientPreparedStatement.executeLargeUpdate(ClientPreparedStatement.java:1396)
  at com.mysql.cj.jdbc.ClientPreparedStatement.executeUpdate(ClientPreparedStatement.java:1051)
  at mypardo.processElement(StatsCollector.java:173)
  at mypardo.invokeProcessElement(Unknown Source)
timestamp 2019-04-04T17:40:01.724Z
logger com.google.cloud.dataflow.worker.DataflowOperationContext
severity WARNING
worker myworker
step ParDo(ProcessDatabaseEvent)
thread 34

任何帮助将不胜感激!

编辑:

添加数据库架构

t1 :(大约一千万条记录)

c1          varchar(255)  NO   PRI NULL    
c2          varchar(255)  NO   PRI NULL     
c3          varchar(255)  YES  NULL    
c4          varchar(255)  YES  NULL    
c5          varchar(255)  YES  NULL    
c6          varchar(255)  YES  NULL    
created     datetime      YES  NULL

t2:

c1         varchar(255) NO   PRI NULL    
c2         varchar(255) NO   PRI NULL    
firstUsed  datetime     YES      NULL    
lastUsed   datetime     YES      NULL    

1 个答案:

答案 0 :(得分:0)

如果您使用分组依据,则不需要区分,并且不应在ON DUPLICATE类别中使用值(lastUsed),而应使用t1.lastUsed

/**
 * @param mixed $id application identifier
 */
public function foo($id)