pyspark - spark-submit with internal and public libraries

Time: 2020-10-28 20:05:28

Tags: python apache-spark pyspark apache-spark-sql amazon-emr

I'm trying to submit a PySpark job on EMR that uses my own library together with public libraries (e.g. pandas and numpy). dependencies.zip contains the pandas library and its dependencies, plus my internal library (xpto), and I run the following command:

spark-submit --deploy-mode client --py-files dependencies.zip main.py
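For context, spark-submit places the files passed with --py-files on the PYTHONPATH, so Python imports from the zip through its zipimport machinery, which expects top-level modules and packages to sit at the root of the archive. The following is a minimal sketch of building such an archive from a local directory. The function name build_py_files_zip and the source directory are hypothetical, and the sketch assumes that directory contains the xpto/ package folder directly.

```python
import os
import zipfile
from typing import List


def build_py_files_zip(src_dir: str, zip_path: str) -> List[str]:
    """Zip every file under src_dir, storing paths relative to src_dir.

    With src_dir containing xpto/__init__.py and xpto/xpto.py, the archive
    holds 'xpto/__init__.py' and 'xpto/xpto.py' at its root, which is where
    zipimport looks for top-level packages. zip_path should live outside
    src_dir so the archive does not try to include itself.
    Returns the sorted list of member names written.
    """
    names = []
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for root, _dirs, files in os.walk(src_dir):
            for fname in files:
                full_path = os.path.join(root, fname)
                # Path relative to src_dir, using the forward slashes zip members use
                arcname = os.path.relpath(full_path, src_dir).replace(os.sep, "/")
                zf.write(full_path, arcname)
                names.append(arcname)
    return sorted(names)
```

For example, build_py_files_zip("deps_src", "dependencies.zip") with a hypothetical deps_src/ that holds xpto/ directly. A quick local check is sys.path.insert(0, "dependencies.zip") followed by import xpto, which mimics what the Python workers see.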

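The archive is laid out like this:

- dependencies:
    - pandas
    - numpy
    - ...
    - xpto
        - __init__.py
        - xpto.py

When I start the job, I get the error message: no module named 'pandas'

So how can I pass both my own library and public libraries to spark-submit so the job runs?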
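Two properties of a --py-files archive commonly matter here. First, importable names must sit at the archive root, so a wrapper folder such as dependencies/ hides everything beneath it. Second, pandas and numpy ship compiled C extension modules (.so files on Linux), and Python's zipimport does not load extension modules from inside a zip, only .py/.pyc files. The following is a small inspection sketch. The helper names top_level_entries and find_native_extensions are hypothetical, and the suffix list is an assumption covering the usual Linux/macOS and Windows extension files.

```python
import zipfile
from typing import List

# Assumed suffixes of compiled extension modules (Linux/macOS and Windows).
# Python's zipimport does not load these from inside a zip archive.
NATIVE_EXTENSION_SUFFIXES = (".so", ".pyd")


def top_level_entries(zip_path: str) -> List[str]:
    """Return the distinct first path components of the archive members.

    For a --py-files zip these should be importable names such as 'xpto',
    not a wrapper folder such as 'dependencies'.
    """
    with zipfile.ZipFile(zip_path) as zf:
        return sorted({name.split("/", 1)[0] for name in zf.namelist()})


def find_native_extensions(zip_path: str) -> List[str]:
    """Return the archive members that look like compiled extension modules."""
    with zipfile.ZipFile(zip_path) as zf:
        return sorted(
            name for name in zf.namelist() if name.endswith(NATIVE_EXTENSION_SUFFIXES)
        )
```

If top_level_entries("dependencies.zip") reports a wrapper folder, or find_native_extensions returns any members, the archive probably won't import as intended through --py-files alone.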

0 answers:

No answers