How do I enable the JVM remote debugger for executors?

Asked: 2018-01-17 18:37:15

Tags: apache-spark, spark-streaming

I'm trying to set up a JVM remote debugger so that I can place breakpoints in the Kafka consumer and diagnose a problem connecting to the Kafka broker.

I've checked out 2.2...

git clone https://github.com/apache/spark
git checkout branch-2.2

I've configured the scala-maven-plugin in Spark's root pom.xml to generate debug symbols:

    <plugin>
      <groupId>net.alchim31.maven</groupId>
      <artifactId>scala-maven-plugin</artifactId>
      ...
        <javacGenerateDebugSymbols>true</javacGenerateDebugSymbols>
        ...

...then built with...

mvn -DskipTests clean package

...and ran with...

./bin/spark-shell --master local[1] \
  --jars external/kafka-0-10-sql/target/spark-sql-kafka-0-10_2.11-2.2.2-SNAPSHOT.jar,external/kafka-0-10-assembly/target/spark-streaming-kafka-0-10-assembly_2.11-2.2.2-SNAPSHOT.jar \
  --conf "spark.executor.extraJavaOptions=-agentlib:jdwp=transport=dt_socket,server=n,address=localhost:5005,suspend=y" \
  --num-executors 1 \
  --executor-cores 1
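For reference, here is a breakdown of the `-agentlib:jdwp` sub-options used in the command above (this is standard JPDA agent behavior, not anything Spark-specific):

```shell
# Breakdown of the JDWP agent string used above:
#   transport=dt_socket    - talk to the debugger over a TCP socket
#   server=n               - the JVM dials out to a debugger that is already
#                            listening (this pairs with IntelliJ's "Listen" mode)
#   address=localhost:5005 - host:port the JVM connects to
#   suspend=y              - block the JVM at startup until the debugger attaches
JDWP_OPTS="-agentlib:jdwp=transport=dt_socket,server=n,address=localhost:5005,suspend=y"
echo "$JDWP_OPTS"
```

With `server=n` and `suspend=y`, the executor JVM will hang at startup unless the debugger is already listening, so the IntelliJ configuration must be started before spark-shell.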

I set up a remote debug configuration in IntelliJ:

  • Transport: Socket
  • Debugger mode: Listen
  • Port: 5005

I added a breakpoint in org.apache.spark.sql.kafka010.KafkaSourceProvider ...


I also updated a logging statement to confirm that this code is actually being executed:

(screenshot: the updated logging statement in KafkaSourceProvider)

After spark-shell starts up, I run the following...

sc.setLogLevel("DEBUG")

import org.apache.spark.sql.functions._
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder.appName("StreamingRetailTransactions").config("master", "local").getOrCreate()

import spark.implicits._

val df = spark.readStream.
                format("kafka").
                option("kafka.bootstrap.servers", "0.0.0.0:9093").
                option("subscribe", "transactions_load").
                option("security.protocol", "SASL_PLAINTEXT").
                option("sasl.mechanism", "PLAIN").
                option("auto.offset.reset","earliest").
                option("group.id", System.currentTimeMillis).
                load()

val query = df.writeStream.format("console").start()

The log output shows my updated debug statement, and it confirms the code is running on the executor. However, my breakpoint is never hit...

18/01/17 18:28:33 DEBUG KafkaSourceProvider: executor: ** Set key.deserializer to org.apache.kafka.common.serialization.ByteArrayDeserializer, earlier value:

You can see that IntelliJ is listening for a connection:

(screenshot: IntelliJ waiting for a connection on port 5005)

I'm obviously missing a step somewhere - any ideas?

1 answer:

Answer 0 (score: 1):

I also had to add the debug options to the driver:

./bin/spark-shell --master local[1] \
  --jars external/kafka-0-10-sql/target/spark-sql-kafka-0-10_2.11-2.2.2-SNAPSHOT.jar,external/kafka-0-10-assembly/target/spark-streaming-kafka-0-10-assembly_2.11-2.2.2-SNAPSHOT.jar \
  --conf "spark.executor.extraJavaOptions=-agentlib:jdwp=transport=dt_socket,server=n,address=localhost:5005,suspend=y"   \
  --num-executors 1 \
  --executor-cores 1 \
  --driver-java-options -agentlib:jdwp=transport=dt_socket,server=n,suspend=y,address=5005
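A likely explanation for why this was necessary (my assumption, based on how local mode works): with `--master local[1]` there is no separate executor process - the executor code runs inside the driver JVM - so `spark.executor.extraJavaOptions` alone never instruments the JVM that actually reaches the breakpoint. On a real cluster, a common alternative sketch is to have each executor JVM listen itself (`server=y`) and attach the IDE to it, rather than having every JVM dial one shared listening socket; the master, port, and executor counts below are illustrative only:

```shell
# Sketch only - assumes a non-local master; values here are examples.
# Each executor listens on port 5005 (server=y) and keeps running (suspend=n)
# until a debugger attaches using IntelliJ's "Attach" mode.
./bin/spark-shell --master yarn \
  --conf "spark.executor.extraJavaOptions=-agentlib:jdwp=transport=dt_socket,server=y,address=5005,suspend=n" \
  --num-executors 1 \
  --executor-cores 1
```

Limiting the job to a single executor, as above, avoids port clashes when several executors land on the same host.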