Why does lsof show deleted files that are still open?

Time: 2019-01-13 17:48:00

Tags: hadoop yarn pid ps lsof

We have a Hadoop cluster with DataNode machines.

We noticed that the CPU load average on the DataNode machines is high:

 uptime
 17:27:46 up 263 days,  3:39,  3 users,  load average: 7.94, 6.66, 7.38

After a quick check, we noticed that many files are reported by lsof as deleted but still open.

Example:

[root@DATANODE02 ~]# lsof +L1
COMMAND      PID  USER   FD   TYPE DEVICE SIZE/OFF NLINK      NODE NAME
avahi-dae   1938 avahi    5r   REG  253,2 10406312     0 402658715 /var/lib/sss/mc/initgroups (deleted)
avahi-dae   1949 avahi    5r   REG  253,2 10406312     0 402658715 /var/lib/sss/mc/initgroups (deleted)
sssd        1990  root   17r   REG  253,2 10406312     0 402658715 /var/lib/sss/mc/initgroups (deleted)
sssd_be     1996  root   20r   REG  253,2 10406312     0 402658715 /var/lib/sss/mc/initgroups (deleted)
cupsd       2269  root   10r   REG  253,0     3024     0 139474724 /etc/passwd+ (deleted)
smcd       12588  root   15u   REG  253,0    41590     0  13826415 /tmp/tmpfHHZRQO (deleted)
bluetooth 138025  root    9r  FIFO  253,0      0t0     0    844091 /tmp/hogsuspend (deleted)
gnome-she 138037  root   20r   REG  253,0       56     0  68959031 /root/.local/share/gvfs-metadata/home.55Q9UZ (deleted)
gnome-she 138037  root   24r   REG  253,0    32768     0  70246314 /root/.local/share/gvfs-metadata/home-a9398246.log (deleted)
java      193699  yarn 1082r   REG   8,16   293715     0  93588652 /grid/sdb/hadoop/hdfs/data/current/BP-428352611-43.21.3.46-1502127526112/current/finalized/subdir15/subdir35/blk_1186014185 (deleted)
java      193699  yarn 1191r   REG   8,80   292993     0  88474445 /grid/sdf/hadoop/hdfs/data/current/BP-428352611-43.21.3.46-1502127526112/current/finalized/subdir15/subdir35/blk_1186014091 (deleted)
java      193699  yarn 1205r   REG   8,16     2303     0  93588671 /grid/sdb/hadoop/hdfs/data/current/BP-428352611-43.21.3.46-1502127526112/current/finalized/subdir15/subdir35/blk_1186014185_112276263.meta (deleted)
java      193699  yarn 1265r   REG   8,32    23931     0  25962378 /grid/sdc/hadoop/hdfs/data/current/BP-428352611-43.21.3.46-1502127526112/current/finalized/subdir15/subdir36/blk_1186014275 (deleted)
java      193699  yarn 1273r   REG   8,32      195     0  25962397 /grid/sdc/hadoop/hdfs/data/current/BP-428352611-43.21.3.46-1502127526112/current/finalized/subdir15/subdir36/blk_1186014275_112276353.meta (deleted)
java      193699  yarn 1307r   REG   8,48    66713     0  61461179 /grid/sdd/hadoop/hdfs/data/current/BP-428352611-43.21.3.46-1502127526112/current/finalized/subdir15/subdir36/blk_1186014410 (deleted)
java      193699  yarn 1385r   REG   8,48      531     0  61461193 /grid/sdd/hadoop/hdfs/data/current/BP-428352611-43.21.3.46-1502127526112/current/finalized/subdir15/subdir36/blk_1186014410_112276488.meta (deleted)
java      193699  yarn 1477r   REG   8,80     2299     0  88474446 /grid/sdf/hadoop/hdfs/data/current/BP-428352611-43.21.3.46-1502127526112/current/finalized/subdir15/subdir35/blk_1186014091_112276169.meta (deleted)
java      193699  yarn 1754r   REG   8,16    91051     0  93696129 /grid/sdb/hadoop/hdfs/data/current/BP-428352611-43.21.3.46-1502127526112/current/finalized/subdir15/subdir37/blk_1186014689 (deleted)
java      193699  yarn 1760r   REG   8,16      719     0  93696130 /grid/sdb/hadoop/hdfs/data/current/BP-428352611-43.21.3.46-1502127526112/current/finalized/subdir15/subdir37/blk_1186014689_112276769.meta (deleted)
java      193699  yarn 1972r   REG   8,48    37960     0  61447490 /grid/sdd/hadoop/hdfs/data/current/BP-428352611-43.21.3.46-1502127526112/current/finalized/subdir15/subdir39/blk_1186015148 (deleted)
java      193699  yarn 1976r   REG   8,48      307     0  61447491 /grid/sdd/hadoop/hdfs/data/current/BP-428352611-43.21.3.46-1502127526112/current/finalized/subdir15/subdir39/blk_1186015148_112277228.meta (deleted)

To print only the PIDs that hold deleted files:

lsof +L1 | awk 'NR>1 {print $2}' | sort -u
12588
138025
138037
138151
138185
1938
1949
1990
1996
2269
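
A bare PID list hides which process holds the most deleted files. The sketch below groups the same `lsof +L1` listing by PID and command name. The function name `summarise_deleted` is made up for this example, and it assumes the column layout shown above (COMMAND in column 1, PID in column 2) with no spaces in the command name.

```shell
#!/bin/sh
# Summarise `lsof +L1` output read from stdin:
# prints "<count> <pid> <command>", highest count first.
# Assumes COMMAND is column 1 and PID is column 2, as in the listing above.
summarise_deleted() {
    awk 'NR > 1 { count[$2 " " $1]++ }            # skip the header line
         END { for (k in count) print count[k], k }' | sort -rn
}

# Usage (run as root so that every process is visible):
#   lsof +L1 | summarise_deleted
```

Reading from stdin means the function can also be run against a saved listing, which is convenient when comparing several DataNodes.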

None of the files listed above exist on disk any more, for example:

/grid/sdd/hadoop/hdfs/data/current/BP-428352611-43.21.3.46-1502127526112/current/finalized/subdir15/subdir39/blk_1186015148_112277228.meta

so we killed all of the PIDs:

kill 12588
kill 138025

and so on.

After we killed all of the PIDs, the CPU load average dropped as follows:

 uptime
 17:27:46 up 263 days,  3:39,  3 users,  load average: 2.24, 4.61, 5.75

My questions are:

Why do the PIDs keep these files open even though the files have been deleted?

Is it OK to kill each of these PIDs with the following command?

 kill PID
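
For context on the first question, here is a minimal sketch of the general POSIX behaviour behind the `(deleted)` entries: removing a file only removes its name, and the data stays reachable through any descriptor that is still open. It uses a scratch file created by `mktemp` and is not specific to Hadoop. It does not show why the DataNode-side java process keeps its block files open.

```shell
#!/bin/sh
# Demonstration: a deleted file remains readable through an open descriptor.
tmp=$(mktemp)                 # scratch file; the name is picked by mktemp
echo "still here" > "$tmp"
exec 3< "$tmp"                # open the file for reading on descriptor 3
rm -f "$tmp"                  # unlink the name; link count drops to 0

[ -e "$tmp" ] || echo "path is gone"
content=$(cat <&3)            # the data is still reachable via descriptor 3
echo "content: $content"

exec 3<&-                     # closing the last descriptor frees the file
```

A process that exits, for instance after `kill PID`, closes all of its descriptors as a side effect, which is why the entries disappear from `lsof +L1`. It also stops whatever service that process was providing.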

0 answers:

No answers yet