Question

I want to set up a script that continuously parses a specific tag in an XML file.

The script contains the following while loop:

function scan_t()
{
INPUT_FILE=${1}
while : ; do
   if [[ -f "$INPUT_FILE" ]]
   then
      ret=`cat ${INPUT_FILE} | grep "<data>" | awk -F"=|>" '{print $2}' | awk -F"=|<" '{print $1}'`
      if [[ "$ret" -ne 0 ]] && [[ -n "$ret" ]]
      then
         ...
      fi
   fi
done
} 
scan_t "/tmp/test.xml"

The line format is:

<data>0</data> or <data>1</data> <data>2</data> ..

Even though I added the condition if [[ -f "$INPUT_FILE" ]] to the script, I sometimes get:

cat: /tmp/test.xml: No such file or directory

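In fact, $INPUT_FILE is usually consumed by another process, which is charged to suppress the file after reading it.

This while loop is only for testing, so the cat error doesn't matter, but I'd like to hide it because it pollutes the terminal a lot.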

Answer 1

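If some other process can also read and delete the file before this script sees it, then you've designed your system with a race condition. (I assume "charged to suppress" means "designed to unlink"...)

If it's optional for this script to see every input file, then just redirect stderr to /dev/null (i.e. ignore errors when the race condition bites). If it's not optional, have this script rename input files to something else, and have the other process watch for that name instead. Check whether that file exists before renaming, to make sure you don't overwrite a file the other process hasn't read yet.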
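For the case where missing a file now and then is acceptable, here is a minimal sketch of the question's loop body with only two changes. First, grep opens the file itself instead of reading from cat. Second, its stderr goes to /dev/null. `get_data` is a hypothetical name, and bash is assumed.

```shell
#!/usr/bin/env bash
# Sketch of the "optional" case: losing the race is acceptable, so the
# error is simply discarded. get_data is a hypothetical helper name.
get_data() {
    local input_file=$1
    # grep opens the file itself (no cat). If the consumer has already
    # deleted it, grep's "No such file or directory" goes to /dev/null
    # and the function prints nothing for this round.
    grep "<data>" "$input_file" 2>/dev/null \
        | awk -F"=|>" '{print $2}' | awk -F"=|<" '{print $1}'
}
```

Only grep ever opens the file, so its stderr is the only one that needs redirecting. The two awk processes read from the pipe.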

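Your loop design is horrible. First, you're busy-waiting for the file to appear (with no sleep at all). Second, when the input exists, you're running 4 programs instead of 1.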
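To illustrate the second point, here is a rough sketch of one awk process doing the work of cat | grep | awk | awk. It assumes bash and a POSIX awk, and `data_values` and `first_value` are hypothetical names. Like the original pipeline, it prints only the first <data> value on each line. The second function shows the bash idiom for reading a whole file into a variable without running cat.

```shell
#!/usr/bin/env bash
# data_values: awk opens the file itself and splits each line on '<'
# and '>', so for "<data>0</data>" the fields are "", "data", "0",
# "/data", "" and $3 is the value (first tag on the line only).
data_values() {
    awk -F '[<>]' '/<data>/ {print $3}' "$1"
}

# first_value: read the whole file into a variable with $(<file),
# then extract the first <data> value with a bash regex. The regex is
# kept in a variable so bash passes '<' and '>' to it literally.
first_value() {
    local content re='<data>([^<]*)</data>'
    content=$(<"$1") || return 1
    [[ $content =~ $re ]] && printf '%s\n' "${BASH_REMATCH[1]}"
}
```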

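Busy-waiting can be avoided by using inotifywait to watch the directory for changes. That way the if [[ -f $INPUT_FILE ]] loop body only runs after the directory has been modified, rather than as fast as a CPU core can run it.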

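The second problem is easier to fix: never use cat file | something. Use something file instead, or something < file if something doesn't take a filename on its command line or behaves differently when given one. cat is only useful when you have multiple files to concatenate. To read a file into a shell variable, use foo=$(<file).

I see from the comments that you've already managed to turn your whole pipeline into a single command. So write:

INPUT_FILE=foo
inotifywait -m -e close_write -e moved_to --format %f . |
  while IFS= read -r event_file; do
    [[ $event_file == "$INPUT_FILE" ]] &&
      awk -F '[<,>]' '/data/ {printf "%s ",$3} END {print ""}' "$INPUT_FILE" 2>/dev/null
    # echo "$event_file" &&
    # date
  done
# tested and working with the commented-out echo/date commands

Note that I'm waiting for close_write and moved_to, rather than other events, to avoid jumping the gun and reading a file that hasn't finished being written. Put $INPUT_FILE in its own directory, so that events for other filenames don't wake up your loop with false positives.

To also implement the suggestion of renaming the input for the next stage, put a polling loop like this after the awk:

while [[ -e $INPUT2 ]]; do sleep 0.2; done; mv -n "$INPUT_FILE" "$INPUT2"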
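The rename handoff can be wrapped in a small function. This is a sketch, not the answer's code. `hand_off` and its retry limit are hypothetical additions. The function relies on mv -n, which GNU and BSD mv support. It also assumes a sleep that accepts fractional seconds.

```shell
#!/usr/bin/env bash
# hand_off SRC DST [TRIES]: hypothetical helper. Waits while the next
# stage's input (DST) still exists, i.e. has not been consumed yet,
# then renames SRC to DST. Gives up with status 1 after TRIES polls.
hand_off() {
    local src=$1 dst=$2 tries=${3:-50}
    while [[ -e $dst ]]; do
        (( tries-- > 0 )) || return 1   # out of retries
        sleep 0.2                       # poll instead of spinning
    done
    # -n: never overwrite, in case DST reappeared after the check.
    mv -n -- "$src" "$dst"
}
```

Because mv -n refuses to overwrite, a DST that reappears between the loop and the mv is left alone rather than clobbered.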

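An alternative would be to run inotifywait once per loop iteration. However, that risks leaving you stuck with $INPUT_FILE created before inotifywait starts watching. The producer would then wait for the consumer to consume the file, and the consumer would never see the event:

# Race condition with an asynchronous producer, DON'T USE
while inotifywait -qq -e close_write -e moved_to .; do
  [[ -e $INPUT_FILE ]] &&
    awk -F '[<,>]' '/data/ {printf "%s ",$3} END {print ""}' "$INPUT_FILE" 2>/dev/null
done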

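There doesn't seem to be a way to specify the name of a file that doesn't exist yet, even as a filter. So the loop body has to test for that specific file in the directory before using it.

If you don't have inotifywait available, you could just put a sleep in the loop. GNU sleep supports fractional seconds, like sleep 0.5, but BusyBox might not. You might want to write a trivial C program anyway, which keeps trying to open(2) the file in a loop that includes a usleep or nanosleep. When open succeeds, redirect stdin from it and exec your awk program. That way, there's no race between a stat and an open.

#include <unistd.h>     // for usleep, dup2, close, execv
#include <sys/types.h>  // for open
#include <sys/stat.h>
#include <fcntl.h>
#include <errno.h>
#include <stdio.h>      // for perror, fprintf
#include <stdlib.h>     // for EXIT_FAILURE

void waitloop(const char *path)
{
    // argv for awk: argv[0], the options and program, and a NULL terminator.
    // No filename: awk reads the already-opened file from stdin.
    const char *const awk_args[] = { "awk", "-F", "[<,>]",
        "/data/ {printf \"%s \",$3} END {print \"\"}",
        NULL
    };

    while (42) {
        int fd = open(path, O_RDONLY);
        if (-1 != fd) {
            // if you fork() here, you can avoid the shell loop too.
            if (fd != 0) {
                dup2(fd, 0);  // redirect stdin from fd. In theory should check for error here, too.
                close(fd);    // and do this in the parent after fork
            }
            // cast needed: execv takes char *const[], not const char *const[]
            execv("/usr/bin/awk", (char *const *)awk_args);
            perror("execv");  // only reached if execv failed
            return;
        } else if (errno != ENOENT) {
            perror("opening the file");
        } // else ignore ENOENT: the file isn't there yet
        usleep(10000);  // 10 milliseconds.
    }
}

int main(int argc, char **argv)
{
    if (argc != 2) {
        fprintf(stderr, "usage: %s FILE\n", argv[0]);
        return EXIT_FAILURE;
    }
    waitloop(argv[1]);
    return EXIT_FAILURE;  // waitloop only returns if execv failed
}
// optional TODO: error-check *all* the system calls.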

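This compiles, but I haven't tested it. Looping inside a single process that does open / usleep is much lighter weight than running a whole process to do sleep 0.01 from a shell.

Even better would be to use inotify to watch for directory events to detect the file appearing, instead of usleep. To avoid a race, check once more whether the file exists after setting up the inotify watch. This catches a file created after your last check but before the watch became active.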
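Without inotifywait or a C compiler, the C program's idea can be approximated in bash: let the open itself be the existence check. This is a minimal sketch under stated assumptions. `read_when_present` and its retry count are hypothetical names. It polls with sleep 0.1, so it still pays the per-iteration process cost of running sleep from a shell.

```shell
#!/usr/bin/env bash
# read_when_present PATH [TRIES]: hypothetical helper. The redirection
# <"$path" is both the existence check and the stdin awk reads from,
# so there is no window between testing for the file and opening it.
read_when_present() {
    local path=$1 tries=${2:-100}
    while (( tries-- > 0 )); do
        # Redirections apply left to right: fd 4 keeps the real stderr,
        # stderr then goes to /dev/null so a failed open of "$path" is
        # silent, and awk's own diagnostics are sent back to fd 4.
        if { awk -F '[<>]' '/<data>/ {print $3}' 2>&4; } 4>&2 2>/dev/null <"$path"; then
            return 0
        fi
        sleep 0.1  # fractional seconds: assumes GNU (or similar) sleep
    done
    return 1
}
```

A nonzero exit from awk itself is also treated as "try again". The producer should write to a temporary name and rename it into place, so the reader never opens a half-written file. This is the same reason the inotifywait version waits for close_write and moved_to.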

Hide cat prompt error
