HTCondor jobs not running

Posted: 2015-08-26 17:05:13

Tags: condor

I can't get HTCondor to run my jobs. I've been hacking away at this and am now just trying random things, so I figured I should ask for guidance.

I installed HTCondor 8.2.9 from the website on Ubuntu 15.04. Here is some information about my system.

$ cat /etc/condor/condor_config.local
#
# Local Condor Config
#

CONDOR_HOST = aidan-laptop
DAEMON_LIST = MASTER, STARTD, SCHEDD, COLLECTOR, NEGOTIATOR
#FLOCK_TO = aidan-laptop
FLOCK_FROM = aidan-laptop localhost
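For a personal pool on one machine, the local config has to list all five daemons (master, collector, negotiator, schedd, startd) in DAEMON_LIST, and this file does. The sketch below checks that automatically. It is only an illustration: `parse_config`, `missing_daemons` and `REQUIRED_DAEMONS` are hypothetical names, and the parser understands only comments, blank lines and single-line `NAME = value` assignments, not full HTCondor config syntax.

```python
# Minimal sketch: parse the simple "NAME = value" lines of a condor_config.local-style
# file and check that DAEMON_LIST names every daemon a one-machine pool needs.
# Assumption: only comments, blank lines and single-line assignments appear; real
# HTCondor config syntax (macro expansion, continuations, includes) is NOT handled.

REQUIRED_DAEMONS = {"MASTER", "COLLECTOR", "NEGOTIATOR", "SCHEDD", "STARTD"}


def parse_config(text: str) -> dict[str, str]:
    """Return NAME -> value; names are upper-cased since HTCondor ignores their case."""
    settings: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments such as "#FLOCK_TO = ..."
        name, sep, value = line.partition("=")
        if sep:
            settings[name.strip().upper()] = value.strip()
    return settings


def missing_daemons(settings: dict[str, str]) -> set[str]:
    """Daemons from REQUIRED_DAEMONS that DAEMON_LIST does not mention."""
    listed = {d.strip().upper()
              for d in settings.get("DAEMON_LIST", "").split(",") if d.strip()}
    return REQUIRED_DAEMONS - listed


LOCAL_CONFIG = """\
CONDOR_HOST = aidan-laptop
DAEMON_LIST = MASTER, STARTD, SCHEDD, COLLECTOR, NEGOTIATOR
#FLOCK_TO = aidan-laptop
FLOCK_FROM = aidan-laptop localhost
"""

settings = parse_config(LOCAL_CONFIG)
print(settings["CONDOR_HOST"])    # aidan-laptop
print(missing_daemons(settings))  # set()
```

An empty result means the daemon list itself is not the problem.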

My current hostname:

$ hostname
aidan-laptop

The hosts I have defined:

$ cat /etc/hosts
127.0.0.1   localhost
127.0.1.1   aidan-laptop

# The following lines are desirable for IPv6 capable hosts
::1     ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
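This file maps `aidan-laptop` to 127.0.1.1 (the usual Debian/Ubuntu convention), whereas the condor_q and StartLog output elsewhere in this post show the daemons on 192.168.1.151. A quick way to see what a name maps to in the file is sketched below; `addresses_for` is a hypothetical helper, and it reads only the text it is given, ignoring `/etc/nsswitch.conf` and DNS, which also affect real name resolution.

```python
# Minimal sketch: report which addresses an /etc/hosts-style text maps a name to.
# Assumption: only the file contents matter; nsswitch.conf and DNS are ignored.

def addresses_for(hosts_text: str, hostname: str) -> list[str]:
    """Return every address whose line lists `hostname`, in file order."""
    found = []
    for raw in hosts_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments; skip blank lines
        if not line:
            continue
        address, *names = line.split()
        if hostname in names:
            found.append(address)
    return found


ETC_HOSTS = """\
127.0.0.1   localhost
127.0.1.1   aidan-laptop

# The following lines are desirable for IPv6 capable hosts
::1     ip6-localhost ip6-loopback
"""

print(addresses_for(ETC_HOSTS, "aidan-laptop"))  # ['127.0.1.1']
```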

My current status:

$ condor_status
Name               OpSys      Arch   State     Activity LoadAv Mem   ActvtyTime

slot1@aidan-laptop LINUX      X86_64 Unclaimed Idle      0.090 1976  0+00:04:39
slot2@aidan-laptop LINUX      X86_64 Unclaimed Idle      0.000 1976  0+00:05:05
slot3@aidan-laptop LINUX      X86_64 Unclaimed Idle      0.000 1976  0+00:05:06
slot4@aidan-laptop LINUX      X86_64 Unclaimed Idle      0.000 1976  0+00:05:07
                     Total Owner Claimed Unclaimed Matched Preempting Backfill

        X86_64/LINUX     4     0       0         4       0          0        0

               Total     4     0       0         4       0          0        0
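All four slots are Unclaimed, and none are Matched or Claimed, so at this moment the startd is advertising slots but no job has been matched to them. If you want to read that summary table from a script, a sketch follows; `parse_totals` is a hypothetical helper that assumes whitespace-separated columns with no spaces inside a value, as in the output above.

```python
# Minimal sketch: turn the summary table at the bottom of condor_status into a dict.
# Assumption: header and row are whitespace-separated, with no spaces inside a value.

def parse_totals(header: str, row: str) -> dict[str, int]:
    """Map each header column (Total, Owner, ...) to the row's integer value."""
    columns = header.split()
    label, *values = row.split()  # first field is the row label, e.g. X86_64/LINUX
    if len(values) != len(columns):
        raise ValueError(f"expected {len(columns)} values for {label}, got {len(values)}")
    return dict(zip(columns, map(int, values)))


HEADER = "                     Total Owner Claimed Unclaimed Matched Preempting Backfill"
ROW = "        X86_64/LINUX     4     0       0         4       0          0        0"

totals = parse_totals(HEADER, ROW)
print(totals["Unclaimed"], totals["Claimed"])  # 4 0
```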

A look at the queue:

$ condor_q


-- Submitter: aidan-laptop : <192.168.1.151:39444> : aidan-laptop
 ID      OWNER            SUBMITTED     RUN_TIME ST PRI SIZE CMD               
   1.0   aidan           8/26 09:27   0+00:00:00 I  0   0.0  hello.sh          
   1.1   aidan           8/26 09:27   0+00:00:00 I  0   0.0  hello.sh          
   1.2   aidan           8/26 09:27   0+00:00:00 I  0   0.0  hello.sh          

3 jobs; 0 completed, 0 removed, 3 idle, 0 running, 0 held, 0 suspended
$ date
Wed Aug 26 09:52:33 PDT 2015
$ lsb_release -r
Release:    15.04
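Three idle jobs, nothing running, and four unclaimed slots in the condor_status output above suggest the jobs are not being matched: either the negotiator is not completing a cycle, or the job requirements match no slot. The sketch below parses condor_q's closing summary line and flags that combination. `parse_summary` and `idle_despite_free_slots` are hypothetical helpers, and the regex assumes the "N label" wording of this HTCondor 8.2 output.

```python
import re

# Minimal sketch: parse condor_q's closing summary line and flag jobs that sit idle
# while the pool reports unclaimed slots.
# Assumption: the summary keeps the "N label" wording shown in the output above.

SUMMARY = re.compile(r"(\d+) (jobs|completed|removed|idle|running|held|suspended)\b")


def parse_summary(line: str) -> dict[str, int]:
    """Return counts such as {'jobs': 3, 'idle': 3, ...}."""
    return {label: int(number) for number, label in SUMMARY.findall(line)}


def idle_despite_free_slots(counts: dict[str, int], unclaimed_slots: int) -> bool:
    """True when jobs wait, nothing runs, and slots are free (a matchmaking symptom)."""
    return (counts.get("idle", 0) > 0
            and counts.get("running", 0) == 0
            and unclaimed_slots > 0)


LINE = "3 jobs; 0 completed, 0 removed, 3 idle, 0 running, 0 held, 0 suspended"
counts = parse_summary(LINE)
print(counts["idle"], counts["running"])                   # 3 0
print(idle_despite_free_slots(counts, unclaimed_slots=4))  # True (4 slots from condor_status)
```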

Trying to analyze the jobs hangs, then prints an error:

$ date; condor_q -pool 1.00 -analyze; date
Wed Aug 26 09:58:01 PDT 2015
Error:  Could not fetch startd ads
Wed Aug 26 09:59:01 PDT 2015

My StartLog after stopping the service, deleting the old log, and starting it again:

$ sudo service condor stop
$ sudo rm /var/log/condor/StartLog
$ date; sudo service condor start
Wed Aug 26 10:01:02 PDT 2015
$ sleep 1m; date; condor_status
Wed Aug 26 10:02:19 PDT 2015
Name               OpSys      Arch   State     Activity LoadAv Mem   ActvtyTime

slot1@aidan-laptop LINUX      X86_64 Unclaimed Idle      0.160 1976  0+00:00:04
slot2@aidan-laptop LINUX      X86_64 Unclaimed Idle      0.000 1976  0+00:00:31
slot3@aidan-laptop LINUX      X86_64 Unclaimed Idle      0.000 1976  0+00:00:32
slot4@aidan-laptop LINUX      X86_64 Unclaimed Idle      0.000 1976  0+00:00:33
                     Total Owner Claimed Unclaimed Matched Preempting Backfill

        X86_64/LINUX     4     0       0         4       0          0        0

               Total     4     0       0         4       0          0        0
$ date; cat /var/log/condor/StartLog
Wed Aug 26 10:02:35 PDT 2015
08/26/15 10:01:03 ******************************************************
08/26/15 10:01:03 ** condor_startd (CONDOR_STARTD) STARTING UP
08/26/15 10:01:03 ** /usr/sbin/condor_startd
08/26/15 10:01:03 ** SubsystemInfo: name=STARTD type=STARTD(7) class=DAEMON(1)
08/26/15 10:01:03 ** Configuration: subsystem:STARTD local:<NONE> class:DAEMON
08/26/15 10:01:03 ** $CondorVersion: 8.2.9 Aug 12 2015 BuildID: 335399 $
08/26/15 10:01:03 ** $CondorPlatform: x86_64_Ubuntu14 $
08/26/15 10:01:03 ** PID = 2487
08/26/15 10:01:03 ** Log last touched time unavailable (No such file or directory)
08/26/15 10:01:03 ******************************************************
08/26/15 10:01:03 Using config source: /etc/condor/condor_config
08/26/15 10:01:03 Using local config sources: 
08/26/15 10:01:03    /etc/condor/condor_config.local
08/26/15 10:01:03 config Macros = 60, Sorted = 60, StringBytes = 1596, TablesBytes = 2208
08/26/15 10:01:03 CLASSAD_CACHING is ENABLED
08/26/15 10:01:03 Daemon Log is logging: D_ALWAYS D_ERROR
08/26/15 10:01:03 DaemonCore: command socket at <192.168.1.151:47358>
08/26/15 10:01:03 DaemonCore: private command socket at <192.168.1.151:47358>
08/26/15 10:01:09 VM-gahp server reported an internal error
08/26/15 10:01:09 VM universe will be tested to check if it is available
08/26/15 10:01:09 History file rotation is enabled.
08/26/15 10:01:09   Maximum history file size is: 20971520 bytes
08/26/15 10:01:09   Number of rotated history files is: 2
08/26/15 10:01:09 Allocating auto shares for slot type 0: Cpus: auto, Memory: auto, Swap: auto, Disk: auto
slot type 0: Cpus: 1.000000, Memory: 1976, Swap: 25.00%, Disk: 25.00%
slot type 0: Cpus: 1.000000, Memory: 1976, Swap: 25.00%, Disk: 25.00%
slot type 0: Cpus: 1.000000, Memory: 1976, Swap: 25.00%, Disk: 25.00%
slot type 0: Cpus: 1.000000, Memory: 1976, Swap: 25.00%, Disk: 25.00%
08/26/15 10:01:09 slot1: New machine resource allocated
08/26/15 10:01:09 Setting up slot pairings
08/26/15 10:01:09 slot2: New machine resource allocated
08/26/15 10:01:09 Setting up slot pairings
08/26/15 10:01:09 slot3: New machine resource allocated
08/26/15 10:01:09 Setting up slot pairings
08/26/15 10:01:09 slot4: New machine resource allocated
08/26/15 10:01:09 Setting up slot pairings
08/26/15 10:01:09 CronJobList: Adding job 'mips'
08/26/15 10:01:09 CronJobList: Adding job 'kflops'
08/26/15 10:01:09 CronJob: Initializing job 'mips' (/usr/lib/condor/libexec/condor_mips)
08/26/15 10:01:09 CronJob: Initializing job 'kflops' (/usr/lib/condor/libexec/condor_kflops)
08/26/15 10:01:09 slot1: State change: IS_OWNER is false
08/26/15 10:01:09 slot1: Changing state: Owner -> Unclaimed
08/26/15 10:01:09 State change: RunBenchmarks is TRUE
08/26/15 10:01:09 slot1: Changing activity: Idle -> Benchmarking
08/26/15 10:01:09 BenchMgr:StartBenchmarks()
08/26/15 10:01:09 slot2: State change: IS_OWNER is false
08/26/15 10:01:09 slot2: Changing state: Owner -> Unclaimed
08/26/15 10:01:09 State change: RunBenchmarks is TRUE
08/26/15 10:01:09 slot2: Changing activity: Idle -> Benchmarking
08/26/15 10:01:09 slot2: Changing activity: Benchmarking -> Idle
08/26/15 10:01:09 slot3: State change: IS_OWNER is false
08/26/15 10:01:09 slot3: Changing state: Owner -> Unclaimed
08/26/15 10:01:09 State change: RunBenchmarks is TRUE
08/26/15 10:01:09 slot3: Changing activity: Idle -> Benchmarking
08/26/15 10:01:09 slot3: Changing activity: Benchmarking -> Idle
08/26/15 10:01:09 slot4: State change: IS_OWNER is false
08/26/15 10:01:09 slot4: Changing state: Owner -> Unclaimed
08/26/15 10:01:09 State change: RunBenchmarks is TRUE
08/26/15 10:01:09 slot4: Changing activity: Idle -> Benchmarking
08/26/15 10:01:09 slot4: Changing activity: Benchmarking -> Idle
08/26/15 10:01:35 State change: benchmarks completed
08/26/15 10:01:35 slot1: Changing activity: Benchmarking -> Idle
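The StartLog shows each slot going from Owner to Unclaimed and returning to Idle once benchmarking finishes, which agrees with condor_status; the "VM-gahp server reported an internal error" line appears to concern only VM universe support. To summarize a longer StartLog the same way, here is a sketch; `final_slot_status` is a hypothetical helper, and it assumes the "slotN: Changing state/activity: A -> B" wording seen above.

```python
import re

# Minimal sketch: track the last state and activity of each slot in a StartLog excerpt.
# Assumption: transitions use the "slotN: Changing state|activity: A -> B" wording above.

TRANSITION = re.compile(r"(slot\d+): Changing (state|activity): (\w+) -> (\w+)")


def final_slot_status(log_text: str) -> dict[str, tuple]:
    """Map each slot to its last (state, activity); None means not seen yet."""
    status: dict[str, tuple] = {}
    for line in log_text.splitlines():
        match = TRANSITION.search(line)
        if not match:
            continue  # ignore lines that are not slot transitions
        slot, kind, _old, new = match.groups()
        state, activity = status.get(slot, (None, None))
        if kind == "state":
            state = new
        else:
            activity = new
        status[slot] = (state, activity)
    return status


EXCERPT = """\
08/26/15 10:01:09 slot1: Changing state: Owner -> Unclaimed
08/26/15 10:01:09 slot1: Changing activity: Idle -> Benchmarking
08/26/15 10:01:09 slot2: Changing state: Owner -> Unclaimed
08/26/15 10:01:09 slot2: Changing activity: Idle -> Benchmarking
08/26/15 10:01:09 slot2: Changing activity: Benchmarking -> Idle
08/26/15 10:01:35 slot1: Changing activity: Benchmarking -> Idle
"""

print(final_slot_status(EXCERPT))
# {'slot1': ('Unclaimed', 'Idle'), 'slot2': ('Unclaimed', 'Idle')}
```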

Please let me know if you need more information.

Update:

I found this in the negotiator log, but I can't figure out what it means:

08/26/15 11:20:15 ---------- Started Negotiation Cycle ----------
08/26/15 11:20:15 Phase 1:  Obtaining ads from collector ...
08/26/15 11:20:15   Getting startd private ads ...
08/26/15 11:20:15 condor_read() failed: recv(fd=8) returned -1, errno = 104 Connection reset by peer, reading 5 bytes from collector at <127.0.1.1:9618>.
08/26/15 11:20:15 IO: Failed to read packet header
08/26/15 11:20:15 Couldn't fetch ads: communication error
08/26/15 11:20:15 Aborting negotiation cycle
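The negotiator tried to reach the collector at 127.0.1.1:9618 (9618 is the collector's default port), which is the address `aidan-laptop` maps to in /etc/hosts, while the StartLog shows the daemons' command sockets on 192.168.1.151. errno 104 is ECONNRESET on Linux, meaning the other end reset the connection. The sketch below extracts and compares those addresses; it is a diagnostic aid under stated assumptions, not a confirmed cause. `parse_read_failure` and `advertised_hosts` are hypothetical helpers, and the regexes assume the log wording of this 8.2.9 install.

```python
import re

# Minimal sketch: pull the collector address out of the negotiator's condor_read()
# failure and compare it with the addresses a daemon log says it listens on.
# Assumption: log lines keep the wording shown above (HTCondor 8.2.9).

READ_FAILURE = re.compile(
    r"errno = (\d+) (.+?), reading \d+ bytes from (\w+) at <([\d.]+):(\d+)>"
)
COMMAND_SOCKET = re.compile(r"command socket at <([\d.]+):\d+>")


def parse_read_failure(line: str):
    """Return errno, message, peer daemon, host and port, or None if no match."""
    match = READ_FAILURE.search(line)
    if match is None:
        return None
    errno_num, message, peer, host, port = match.groups()
    return {"errno": int(errno_num), "message": message,
            "peer": peer, "host": host, "port": int(port)}


def advertised_hosts(log_text: str) -> set[str]:
    """IP addresses from every 'command socket at <ip:port>' line in a daemon log."""
    return set(COMMAND_SOCKET.findall(log_text))


NEGOTIATOR_LINE = ("08/26/15 11:20:15 condor_read() failed: recv(fd=8) returned -1, "
                   "errno = 104 Connection reset by peer, reading 5 bytes from "
                   "collector at <127.0.1.1:9618>.")
STARTD_EXCERPT = (
    "08/26/15 10:01:03 DaemonCore: command socket at <192.168.1.151:47358>\n"
    "08/26/15 10:01:03 DaemonCore: private command socket at <192.168.1.151:47358>\n"
)

failure = parse_read_failure(NEGOTIATOR_LINE)
advertised = advertised_hosts(STARTD_EXCERPT)
print(failure["peer"], failure["host"], failure["port"])  # collector 127.0.1.1 9618
print(advertised)                                          # {'192.168.1.151'}
print(failure["host"] in advertised)                       # False
```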

0 answers