我正在使用:R版本3.0.1(2013-05-16)和降雪1.84-4在m2.2xl AWS EC2上初始化(使用雪0.3-13),原始AMI来自http://www.louisaslett.com/RStudio_AMI/ 。
我的问题是在使用以下方法创建群集后
sfInit(parallel=TRUE,cpus=4, type="SOCK",socketHosts=rep("localhost",4)
sfExport('dataframe')
answer=sfSapply(dataframe, some_function)
sfStop()
从命令行我运行:
sudo R CMD BATCH xyz.R &
如果xyz.R失败,所有节点都会继续存在,但现在我无法使用sfStop(),因为我将该文件作为脚本运行。如果我在Rstudio浏览器窗口中运行相同的代码,如果代码失败,我可以成功运行sfStop()。
如果我包括
tryCatch(
{sfInit(parallel=TRUE,cpus=4, type="SOCK",socketHosts=rep("localhost",4)
sfExport('dataframe')
answer=sfSapply(dataframe, some_function)
},error=function(e){
print(conditionMessage(e))
sfStop()
}
)
然后它捕获任何错误并杀死群集。此外,如果我只运行Rstudio中的命令,我可以停止群集。但问题仍然存在,我有30多个节点是使用脚本启动的,无法停止。
我曾试图使用sudo kill 'PID' -9
来杀死节点,但它们总是重新出现。我也试过杀死所有的PPID = 2。我尝试重新启动我的EC2,但这也没办法。我甚至竟然手动杀死正在运行的每个进程(是的,所有100多个进程),但这些PPID = 2都会回来。这是ps -ef
的输出。底部显示我当前正在运行的8个集群。
UID PID PPID C STIME TTY TIME CMD
root 1 0 0 15:47 ? 00:00:02 /sbin/init
root 2 0 0 15:47 ? 00:00:00 [kthreadd]
root 3 2 0 15:47 ? 00:00:00 [ksoftirqd/0]
root 4 2 0 15:47 ? 00:00:00 [kworker/0:0]
root 5 2 0 15:47 ? 00:00:00 [kworker/0:0H]
root 6 2 0 15:47 ? 00:00:00 [kworker/u:0]
root 7 2 0 15:47 ? 00:00:00 [kworker/u:0H]
root 8 2 0 15:47 ? 00:00:00 [migration/0]
root 9 2 0 15:47 ? 00:00:00 [rcu_bh]
root 10 2 0 15:47 ? 00:00:00 [rcu_sched]
root 11 2 0 15:47 ? 00:00:00 [watchdog/0]
root 12 2 0 15:47 ? 00:00:00 [watchdog/1]
root 13 2 0 15:47 ? 00:00:00 [ksoftirqd/1]
root 14 2 0 15:47 ? 00:00:00 [migration/1]
root 15 2 0 15:47 ? 00:00:00 [kworker/1:0]
root 16 2 0 15:47 ? 00:00:00 [kworker/1:0H]
root 17 2 0 15:47 ? 00:00:00 [watchdog/2]
root 18 2 0 15:47 ? 00:00:00 [ksoftirqd/2]
root 19 2 0 15:47 ? 00:00:00 [migration/2]
root 20 2 0 15:47 ? 00:00:00 [kworker/2:0]
root 21 2 0 15:47 ? 00:00:00 [kworker/2:0H]
root 22 2 0 15:47 ? 00:00:00 [watchdog/3]
root 23 2 0 15:47 ? 00:00:00 [ksoftirqd/3]
root 24 2 0 15:47 ? 00:00:00 [migration/3]
root 25 2 0 15:47 ? 00:00:00 [kworker/3:0]
root 26 2 0 15:47 ? 00:00:00 [kworker/3:0H]
root 27 2 0 15:47 ? 00:00:00 [cpuset]
root 28 2 0 15:47 ? 00:00:00 [khelper]
root 29 2 0 15:47 ? 00:00:00 [kdevtmpfs]
root 30 2 0 15:47 ? 00:00:00 [netns]
root 31 2 0 15:47 ? 00:00:00 [xenwatch]
root 32 2 0 15:47 ? 00:00:00 [xenbus]
root 33 2 0 15:47 ? 00:00:00 [bdi-default]
root 34 2 0 15:47 ? 00:00:00 [kintegrityd]
root 35 2 0 15:47 ? 00:00:00 [kblockd]
root 36 2 0 15:47 ? 00:00:00 [kworker/3:1]
root 37 2 0 15:47 ? 00:00:00 [ata_sff]
root 38 2 0 15:47 ? 00:00:00 [khubd]
root 39 2 0 15:47 ? 00:00:00 [md]
root 40 2 0 15:47 ? 00:00:00 [devfreq_wq]
root 41 2 0 15:47 ? 00:00:00 [kworker/1:1]
root 43 2 0 15:47 ? 00:00:00 [khungtaskd]
root 44 2 0 15:47 ? 00:00:00 [kswapd0]
root 45 2 0 15:47 ? 00:00:00 [ksmd]
root 46 2 0 15:47 ? 00:00:00 [fsnotify_mark]
root 47 2 0 15:47 ? 00:00:00 [ecryptfs-kthrea]
root 48 2 0 15:47 ? 00:00:00 [crypto]
root 59 2 0 15:47 ? 00:00:00 [kthrotld]
root 60 2 0 15:47 ? 00:00:00 [kworker/u:1]
root 61 2 0 15:47 ? 00:00:00 [khvcd]
root 62 2 0 15:47 ? 00:00:00 [kworker/2:1]
root 63 2 0 15:47 ? 00:00:00 [kworker/0:1]
root 64 2 0 15:47 ? 00:00:00 [binder]
root 83 2 0 15:47 ? 00:00:00 [deferwq]
root 84 2 0 15:47 ? 00:00:00 [charger_manager]
root 237 2 0 15:47 ? 00:00:00 [jbd2/xvda1-8]
root 238 2 0 15:47 ? 00:00:00 [ext4-dio-unwrit]
root 270 1 0 15:47 ? 00:00:00 mountall --daemon
root 289 1 0 15:47 ? 00:00:00 upstart-file-bridge --daemon
root 372 1 0 15:47 ? 00:00:00 upstart-udev-bridge --daemon
root 374 1 0 15:47 ? 00:00:00 /sbin/udevd --daemon
root 535 1 0 15:47 ? 00:00:00 upstart-socket-bridge --daemon
root 635 1 0 15:47 ? 00:00:00 dhclient -1 -v -pf /run/dhclient.eth0.pid -lf /
root 833 1 0 15:47 ? 00:00:00 /usr/sbin/sshd -D
syslog 888 1 0 15:47 ? 00:00:00 rsyslogd -c5
102 952 1 0 15:47 ? 00:00:00 dbus-daemon --system --fork
root 963 1 0 15:47 ? 00:00:00 /usr/sbin/modem-manager
root 978 1 0 15:47 tty4 00:00:00 /sbin/getty -8 38400 tty4
root 984 1 0 15:47 tty5 00:00:00 /sbin/getty -8 38400 tty5
root 1012 1 0 15:47 tty2 00:00:00 /sbin/getty -8 38400 tty2
root 1017 1 0 15:47 tty3 00:00:00 /sbin/getty -8 38400 tty3
root 1020 1 0 15:47 tty6 00:00:00 /sbin/getty -8 38400 tty6
avahi 1036 1 0 15:47 ? 00:00:00 avahi-daemon: running [ip-10-0-0-92.local]
root 1040 1 0 15:47 ? 00:00:00 acpid -c /etc/acpi/events -s /var/run/acpid.soc
avahi 1042 1036 0 15:47 ? 00:00:00 avahi-daemon: chroot helper
root 1047 1 0 15:47 ? 00:00:00 /usr/sbin/cups-browsed
root 1065 1 0 15:47 ? 00:00:00 cron
daemon 1066 1 0 15:47 ? 00:00:00 atd
root 1339 374 0 15:47 ? 00:00:00 /sbin/udevd --daemon
root 1340 374 0 15:47 ? 00:00:00 /sbin/udevd --daemon
mysql 1342 1 0 15:47 ? 00:00:04 /usr/sbin/mysqld
root 1381 1 0 15:47 ? 00:00:00 /usr/sbin/cupsd -F
root 1391 1 0 15:47 ? 00:00:00 NetworkManager
whoopsie 1405 1 0 15:47 ? 00:00:00 whoopsie
999 1406 1 0 15:47 ? 00:00:00 /usr/lib/rstudio-server/bin/rserver
root 1414 1 0 15:47 ? 00:00:00 /usr/lib/policykit-1/polkitd --no-debug
root 1427 1 0 15:47 ? 00:00:00 sendmail: MTA: accepting connections
root 1561 1 0 15:47 tty1 00:00:00 /sbin/getty -8 38400 tty1
root 1758 833 0 15:51 ? 00:00:00 sshd: ubuntu [priv]
root 1760 2 0 15:52 ? 00:00:00 [kauditd]
root 1762 1 0 15:52 ? 00:00:00 /usr/sbin/console-kit-daemon --no-daemon
ubuntu 1899 1758 0 15:52 ? 00:00:00 sshd: ubuntu@pts/0
ubuntu 1900 1899 0 15:52 pts/0 00:00:00 -bash
rstudio 1988 1406 3 15:53 ? 00:03:05 /usr/lib/rstudio-server/bin/rsession -u rstudio
rstudio 2146 1 4 16:06 ? 00:03:28 /usr/lib/R/bin/exec/R --slave --no-restore --fi
rstudio 2153 1 19 16:06 ? 00:15:18 /usr/lib/R/bin/exec/R --slave --no-restore --fi
rstudio 2160 1 32 16:06 ? 00:25:38 /usr/lib/R/bin/exec/R --slave --no-restore --fi
rstudio 2167 1 56 16:06 ? 00:44:52 /usr/lib/R/bin/exec/R --slave --no-restore --fi
rstudio 2174 1 63 16:06 ? 00:50:28 /usr/lib/R/bin/exec/R --slave --no-restore --fi
rstudio 2181 1 66 16:06 ? 00:52:09 /usr/lib/R/bin/exec/R --slave --no-restore --fi
rstudio 2188 1 66 16:06 ? 00:52:37 /usr/lib/R/bin/exec/R --slave --no-restore --fi
rstudio 2195 1 64 16:06 ? 00:50:53 /usr/lib/R/bin/exec/R --slave --no-restore --fi
root 2326 2 0 17:00 ? 00:00:00 [flush-202:1]
ubuntu 2371 1900 0 17:25 pts/0 00:00:00 ps -ef
无论我做什么,前50个流程都会存在/将会回来。有没有其他人有这个问题?如果是这样,你是怎么杀死工人的?
答案 0 :(得分:3)
我认为你被术语工人放松了。并行运行R分析时,不是kworker
生成的进程,而是R进程。这也是您在ps -ef
输出中观察到的内容。
The kworker
processes are simply part of the Linux system,与R中的并行处理无关。重启后这些进程重新生成的事实很好地表明了这一点,R工作人员永远不会这样做(除非你添加一个R脚本启动启动配置中的工作人员。)