How do I install Python libraries on an AWS EMR notebook?

Time: 2020-03-17 10:20:03

Tags: amazon-web-services jupyter-notebook jupyter amazon-emr jupyter-lab

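I want to install additional libraries on an AWS Notebook (attached to an EMR cluster), but I can't see any option for connecting the notebook to the Internet. Whenever I run "pip install", it fails with a "network is unreachable" error. I'm not sure which network settings I need to change to get connectivity and install libraries.

I logged in to the Jupyter terminal and pinged google.com, which just timed out. I don't see any network or security group configuration under the Notebooks section, so I can't make any related changes.

Is there something else I need to do?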

1 answer:

Answer 0: (score: 1)

If you are using the PySpark kernel, you can install a package from a notebook cell by running

sc.install_pypi_package("celery")

The following document has more details.
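As a rough sketch of how that call might be wrapped in a PySpark-kernel cell: `sc` is assumed to be the SparkContext the kernel provides, and `install_pypi_package` is passed a pip-style spec such as "celery==4.4.0" (the version is only illustrative). The helper name `install_notebook_scoped` is hypothetical and made up for this example; it only builds the spec and forwards it to `sc`.

```python
def install_notebook_scoped(sc, package, version=None):
    """Install a PyPI package through the PySpark kernel's SparkContext.

    Assumption: `sc` is the SparkContext that the EMR PySpark kernel
    injects into the notebook. This helper itself is hypothetical.
    """
    # Build a pip-style requirement, pinning the version when one is given.
    spec = f"{package}=={version}" if version else package
    sc.install_pypi_package(spec)
    return spec
```

In a notebook cell this would be called as install_notebook_scoped(sc, "celery"), and sc.list_packages() can then be used to check what is installed. As far as I know, the cluster still needs to be able to reach a PyPI repository for this to work, so it does not bypass a blocked network.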

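If you are using the Python 3 kernel, only the preinstalled packages are available. There is no direct way to install extra libraries other than uploading the Python package to the notebook and then running the following from the JupyterLab terminal: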

pip install package.tar.gz
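If you would rather drive this from Python than type it in the terminal, a minimal sketch could look like the following. It assumes the archive (and any dependencies, for example fetched with "pip download" on a machine that has Internet access) has already been uploaded, and that `sys.executable` is the interpreter whose environment you want to install into, which may not hold for every kernel setup. The function names are hypothetical.

```python
import subprocess
import sys
from pathlib import Path


def build_offline_install_command(archive):
    """Build a pip command that installs a local archive without network access."""
    archive = Path(archive)
    return [
        sys.executable, "-m", "pip", "install",
        "--no-index",                         # never contact a package index
        "--find-links", str(archive.parent),  # look for dependencies next to the archive
        str(archive),
    ]


def install_local_archive(archive):
    """Run the install; raises CalledProcessError if pip exits with an error."""
    subprocess.run(build_offline_install_command(archive), check=True)
```

Calling install_local_archive("package.tar.gz") does the same thing as the terminal command above, with the extra flags making sure pip never tries to reach the network.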