Permission issues with Django / Celery on Elastic Beanstalk

Asked: 2018-05-23 02:43:03

Tags: python django celery elastic-beanstalk user-permissions

My application (clads) runs on Django and uses Celery for scheduled and asynchronous tasks. Unfortunately, I can't figure out some permission issues that are preventing the Celery processes from writing to the Django application logs or manipulating files created by the Django application. The Django app runs in the wsgi process, and I have some config files that set up the application log directory so the wsgi process can write to it (see below).

However, the Celery processes appear to run as a different user that doesn't have permission to write to those files (which it automatically tries to do when it sees the logfile configuration - also below; note that I tried changing this to run as wsgi, but that didn't work). The same permission problem also seems to prevent the Celery processes from manipulating temp files created by the Django application - which is a requirement of the project.

I'm very rusty on Unix-type operating systems, so I'm sure I'm missing something simple. I've been searching this site and others for days, and while I've found many posts that got me close, I can't seem to solve it. I suspect my configuration may need some additional commands to set permissions or to run Celery under a different user. Any help would be much appreciated. The project configuration and excerpts from the relevant code files are below. Most of the config files were pieced together from information on this site and others - apologies for not citing sources, but I didn't keep close enough track to know exactly where they came from.
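To see whether this really is a user mismatch, one way is to compare the user each process runs as against the owner of the files it fails to touch. A minimal sketch (the paths in the comments, like the wsgi-owned log file, are just examples from this setup):

```python
import os
import pwd
import stat

def describe_owner(path):
    """Return (owner-name, group-writable?) for a path, so the
    wsgi-owned file can be compared against the Celery user."""
    st = os.stat(path)
    owner = pwd.getpwuid(st.st_uid).pw_name
    group_writable = bool(st.st_mode & stat.S_IWGRP)
    return owner, group_writable

# Run this once from the Django app and once from a Celery task:
# if the names differ and the file is not group-writable, writes will fail.
print('running as:', pwd.getpwuid(os.getuid()).pw_name)
```

Running the same snippet inside a Celery task (e.g. via a throwaway debug task) and in a Django view makes the mismatch visible in the respective logs.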

settings.py

Logging and Celery sections:
#log settings
LOGGING = {
'version': 1,
'disable_existing_loggers': False,
'formatters': {
    'verbose': {
        'format': '%(asctime)s - %(levelname)s - %(module)s.%(filename)s.%(funcName)s %(processName)s %(threadName)s: %(message)s',
    },
    'simple': {
        'format': '%(asctime)s - %(levelname)s: %(message)s'
    },
},
'handlers' : {
    'django_log_file': {
        'level': os.getenv('DJANGO_LOG_LEVEL', 'INFO'),
        'class': 'logging.FileHandler',
        'filename': os.environ.get('DJANGO_LOG_FILE'),
        'formatter': 'verbose',
    },
    'app_log_file': {
        'level': os.getenv('CLADS_LOG_LEVEL', 'INFO'),
        'class': 'logging.FileHandler',
        'filename': os.environ.get('CLADS_LOG_FILE'),
        'formatter': 'verbose',
    },
},
'loggers': {
    'django': {
        'handlers': ['django_log_file'],
        'level': os.getenv('DJANGO_LOG_LEVEL', 'INFO'),
        'propagate': True,
    },
    'clads': {
        'handlers': ['app_log_file'],
        'level': os.getenv('CLADS_LOG_LEVEL', 'INFO'),
        'propagate': True,
    },
},
}

WSGI_APPLICATION = 'clads.wsgi.application'

# celery settings
CELERY_ACCEPT_CONTENT = ['json']
CELERY_TASK_SERIALIZER = 'json'
CELERY_RESULT_SERIALIZER = 'json'

CELERY_RESULT_BACKEND = 'djcelery.backends.database:DatabaseBackend'
CELERYBEAT_SCHEDULER = 'djcelery.schedulers.DatabaseScheduler'
CELERY_SEND_EVENTS = False

CELERY_BROKER_URL = os.environ.get('BROKER_URL')

tasks.py excerpt:

LOGGER = logging.getLogger('clads.pit')

@shared_task(name="archive_pit_file")
def archive_pit_file(tfile_name):
    LOGGER.debug('archive_date_file called for ' + tfile_name)

    LOGGER.debug('connecting to S3 ...')
    s3 = boto3.client('s3')

    file_fname = os.path.join(settings.TEMP_FOLDER, tfile_name)
    LOGGER.debug('reading temp file from ' + file_fname)
    s3.upload_file(file_fname, settings.S3_ARCHIVE, tfile_name)

    LOGGER.debug('cleaning up temp files ...')

    #THIS LINE CAUSES PROBLEMS BECAUSE THE CELERY PROCESS DOESN'T HAVE
    #PERMISSION TO REMOVE THE WSGI-OWNED FILE
    os.remove(file_fname)
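Until the underlying ownership problem is fixed, the task-crashing `os.remove` call could be wrapped so a permission failure is logged rather than raised. This is a hedged sketch, not the original code; the logger name comes from the excerpt above:

```python
import logging
import os

LOGGER = logging.getLogger('clads.pit')

def remove_temp_file(path):
    """Try to delete a temp file; log and continue if the Celery
    user lacks permission, instead of failing the whole task."""
    try:
        os.remove(path)
        return True
    except PermissionError:
        LOGGER.warning('no permission to remove %s; leaving it for cleanup', path)
        return False
    except FileNotFoundError:
        LOGGER.debug('%s already removed', path)
        return False
```

This only papers over the symptom - the S3 archive still succeeds and the task completes - but the leftover temp files still need a cleanup job or the permission fix itself.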

logging.config

commands:
  01_change_permissions:
      command: chmod g+s /opt/python/log
  02_change_owner:
      command: chown root:wsgi /opt/python/log
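The two commands above give the log directory to the wsgi group and set the setgid bit, so files created inside inherit that group. A small sketch verifying that effect on a scratch directory (using a temp dir rather than /opt/python/log, which needs root):

```python
import os
import stat
import tempfile

def dir_inherits_group(path):
    """True if the directory has the setgid bit (chmod g+s), which
    makes files created inside inherit the directory's group."""
    mode = os.stat(path).st_mode
    return bool(mode & stat.S_ISGID)

# Demonstrate on a scratch directory instead of /opt/python/log.
d = tempfile.mkdtemp()
os.chmod(d, 0o2775)  # rwxrwxr-x plus setgid, the effect of chmod g+s
```

Note that setgid alone doesn't help if the files themselves are created without group-write permission - the creating process's umask still applies.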

99_celery.config

container_commands:  
  04_celery_tasks:
    command: "cat .ebextensions/files/celery_configuration.txt > /opt/elasticbeanstalk/hooks/appdeploy/post/run_supervised_celeryd.sh && chmod 744 /opt/elasticbeanstalk/hooks/appdeploy/post/run_supervised_celeryd.sh"
    leader_only: true
  05_celery_tasks_run:
    command: "/opt/elasticbeanstalk/hooks/appdeploy/post/run_supervised_celeryd.sh"
    leader_only: true

celery_configuration.txt

#!/usr/bin/env bash

# Get django environment variables
celeryenv=`cat /opt/python/current/env | tr '\n' ',' | sed 's/%/%%/g' | sed 's/export //g' | sed 's/$PATH/%(ENV_PATH)s/g' | sed 's/$PYTHONPATH//g' | sed 's/$LD_LIBRARY_PATH//g'`
celeryenv=${celeryenv%?}

# Create celery configuration script
celeryconf="[program:celeryd-worker]  
; Set full path to celery program if using virtualenv
command=/opt/python/run/venv/bin/celery worker -A clads -b <broker_url> --loglevel=INFO --without-gossip --without-mingle --without-heartbeat

directory=/opt/python/current/app  
user=nobody  
numprocs=1  
stdout_logfile=/var/log/celery-worker.log  
stderr_logfile=/var/log/celery-worker.log  
autostart=true  
autorestart=true  
startsecs=10

; Need to wait for currently executing tasks to finish at shutdown.
; Increase this if you have very long running tasks.
stopwaitsecs = 600

; When resorting to send SIGKILL to the program to terminate it
; send SIGKILL to its whole process group instead,
; taking care of its children as well.
killasgroup=true

; if rabbitmq is supervised, set its priority higher
; so it starts first
priority=998

environment=$celeryenv

[program:celeryd-beat]
; Set full path to celery program if using virtualenv
command=/opt/python/run/venv/bin/celery beat -A clads -b <broker_url> --loglevel=INFO --workdir=/tmp

directory=/opt/python/current/app  
user=nobody  
numprocs=1  
stdout_logfile=/var/log/celery-beat.log  
stderr_logfile=/var/log/celery-beat.log  
autostart=true  
autorestart=true  
startsecs=10

; Need to wait for currently executing tasks to finish at shutdown.
; Increase this if you have very long running tasks.
stopwaitsecs = 600

; When resorting to send SIGKILL to the program to terminate it
; send SIGKILL to its whole process group instead,
; taking care of its children as well.
killasgroup=true

; if rabbitmq is supervised, set its priority higher
; so it starts first
priority=998

environment=$celeryenv"

# Create the celery supervisord conf script
echo "$celeryconf" | tee /opt/python/etc/celery.conf

# Add configuration script to supervisord conf (if not there already)
if ! grep -Fxq "[include]" /opt/python/etc/supervisord.conf  
  then
  echo "[include]" | tee -a /opt/python/etc/supervisord.conf
  echo "files: celery.conf" | tee -a /opt/python/etc/supervisord.conf
fi

# Reread the supervisord config
supervisorctl -c /opt/python/etc/supervisord.conf reread

# Update supervisord in cache without restarting all services
supervisorctl -c /opt/python/etc/supervisord.conf update

# Start/Restart celeryd through supervisord
supervisorctl -c /opt/python/etc/supervisord.conf restart celeryd-worker  
supervisorctl -c /opt/python/etc/supervisord.conf restart celeryd-beat  

1 Answer:

Answer 0 (score: 0)

I never pinned down the exact permission problem, but found a workaround that may help others. I removed the FileHandler configurations from the logging settings and replaced them with StreamHandlers. This sidestepped the permission issue because the Celery processes no longer have to open log files owned by the wsgi user.

Log messages from the web application now end up in the httpd error log - not ideal, but at least I can find them, and they are also accessible through the Elastic Beanstalk console - and the Celery logs are written to celery-worker.log and celery-beat.log in /var/log. I can't get at those through the console, but I can access them by logging directly into the instance. This isn't ideal either, since those logs aren't rotated and are lost if the instance is decommissioned, but at least it keeps me going for now.

The modified logging settings kept the same loggers and formatters but pointed both the django and clads loggers at a logging.StreamHandler instead of the FileHandlers.

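The StreamHandler-based settings described above would look roughly like this - a reconstruction rather than the answerer's exact snippet, reusing the handler levels and logger names from the question's settings.py:

```python
import logging.config
import os

# StreamHandler writes to stderr, so no log-file permissions are needed;
# under Apache/mod_wsgi the output lands in the httpd error log.
LOGGING = {
    'version': 1,
    'disable_existing_loggers': False,
    'formatters': {
        'verbose': {
            'format': '%(asctime)s - %(levelname)s - %(module)s.%(funcName)s: %(message)s',
        },
    },
    'handlers': {
        'console': {
            'level': os.getenv('DJANGO_LOG_LEVEL', 'INFO'),
            'class': 'logging.StreamHandler',
            'formatter': 'verbose',
        },
    },
    'loggers': {
        'django': {
            'handlers': ['console'],
            'level': os.getenv('DJANGO_LOG_LEVEL', 'INFO'),
            'propagate': True,
        },
        'clads': {
            'handlers': ['console'],
            'level': os.getenv('CLADS_LOG_LEVEL', 'INFO'),
            'propagate': True,
        },
    },
}

logging.config.dictConfig(LOGGING)
```

The Celery worker's stdout/stderr are already captured by the supervisord `stdout_logfile`/`stderr_logfile` settings shown earlier, which is why this removes the contention over shared log files.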