Question

I have the following situation:

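A Windows folder is mounted on a Linux machine. There may be several such folders (set up by hand) mounted from this Windows machine. I have to do something (preferably a script, to start with) to watch these folders.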

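These are the steps: watch for any incoming files, make sure they have been transferred completely, and move them to another folder. I have no control over the file transfer program on the Windows machine. I believe it is secure FTP. So I cannot ask that process to send me a trailer file to confirm that the transfer is complete.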
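Without a trailer file, "transferred completely" can only be guessed. A minimal Python sketch of one common heuristic, assuming it is acceptable to wait: treat a file as complete once its size and modification time stay unchanged across several polls spaced `interval` seconds apart. The name `wait_until_stable` and its defaults are illustrative, and no polling check can tell a stalled sender from a finished one.

```python
import os
import time


def wait_until_stable(path, interval=3.0, checks=2):
    """Return True once (size, mtime) of `path` is unchanged for `checks`
    consecutive polls spaced `interval` seconds apart.

    Returns False if the file disappears while it is being watched.
    Heuristic only: a stalled transfer looks exactly like a finished one.
    """
    try:
        st = os.stat(path)
    except FileNotFoundError:
        return False
    previous = (st.st_size, st.st_mtime)
    unchanged = 0
    while unchanged < checks:
        time.sleep(interval)
        try:
            st = os.stat(path)
        except FileNotFoundError:
            return False
        current = (st.st_size, st.st_mtime)
        if current == previous:
            unchanged += 1
        else:
            # The file is still growing or being touched: start counting again.
            unchanged = 0
            previous = current
    return True
```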

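I wrote a bash script. I would like to know about any pitfalls this approach might have. The reason is that several copies of this script may be running, one for each such directory.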

Currently, up to 100 directories may need to be monitored.

Below is the script. Sorry for posting such a long piece here. Please take a moment to read it carefully and comment on or criticize it. :-)

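It takes 3 arguments: the folder to be watched, the folder the files must be moved to, and a time interval, which is explained below (the interval is optional and defaults to 3 seconds).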
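For comparison, the same argument handling (two required directories, an optional non-negative integer interval that defaults to 3) could look like this in Python with `argparse`. This is a sketch; the function name and help texts are illustrative.

```python
import argparse


def parse_args(argv=None):
    """Parse <watch_dir> <app_dir> [interval]; interval defaults to 3 seconds."""
    parser = argparse.ArgumentParser(
        description="Move files out of a watched directory once they look complete.")
    parser.add_argument("watch_dir", help="directory to be watched")
    parser.add_argument("app_dir", help="directory the files are moved to")
    parser.add_argument(
        "interval", nargs="?", type=int, default=3,
        help="seconds since the last modification before a file is moved "
             "(default: 3)")
    args = parser.parse_args(argv)
    # type=int already rejects non-numeric values; also reject negatives.
    if args.interval < 0:
        parser.error("interval must be a non-negative integer")
    return args
```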

Sorry, the alignment seems to be off; Markdown doesn't seem to like it. I tried to format it properly but couldn't.

Linux servername 2.6.9-42.ELsmp #1 SMP Wed Jul 12 23:27:17 EDT 2006 i686 i686 i386 GNU/Linux

#!/bin/bash
log_this()
{
    message="$1"
    now=$(date "+%D-%T")
    echo "$$: $now: $message"
}
usage()
{
    cat << EOF
Usage: $0 <Directory to be watched> <Directory to transfer> <time interval>
Time interval is the amount of time after which the modification time of a
file will be monitored. 
EOF
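    # Plain exit: `exit 1` in backticks only exits a subshell, so the
    # script would carry on with missing arguments.
    exit 1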
}

if [ $# -lt 2 ]
then
    usage
fi

WATCH_DIR=$1
APP_DIR=$2

if [ ! -d "$WATCH_DIR" ]
then
    log_this "FATAL: WATCH_DIR, $WATCH_DIR does not exist. Exiting"
    exit 1
fi

if [ ! -d "$APP_DIR" ]
then
    log_this "APP_DIR: $APP_DIR does not exist. Exiting"
    exit 1
fi


# This needs to be set after considering the rate of file transfer.
# Represents the seconds elapsed after the last modification to the file.
# If not supplied as parameter, defaults to 3.

seconds_between_mods=$3

if ! [[ "$seconds_between_mods" =~ ^[0-9]+$ ]]; then
        if [ ${#seconds_between_mods} -eq 0 ]; then
                log_this "No value supplied for elapse time. Defaulting to 3."
                seconds_between_mods=3
        else
                log_this "Invalid value provided for elapse time"
                exit 1
        fi
fi

log_this "Start Monitor."

while true
do
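        # IFS= and -r keep leading spaces and backslashes in file names intact.
        ls -1 "$WATCH_DIR" | while IFS= read -r file_name
        do
            # Skip subdirectories and anything else that is not a regular file.
            [ -f "$WATCH_DIR/$file_name" ] || continue

            log_this "Start Monitoring for $file_name"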

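            # Take "now" from the mounted file system itself (via a token file)
            # rather than from the local clock: if the two servers' clocks differ,
            # comparing local time with the file's mtime would give wrong ages.
            # The leading dot keeps the token out of the ls -1 listing.

            token_file="$WATCH_DIR/.foo.$$"
            current_time=$(touch "$token_file" && stat -c "%Y" "$token_file")
            rm -f "$token_file" 2>/dev/null

            log_this "Current Time: $current_time"
            # The file may have been removed since ls listed it.
            last_mod_time=$(stat -c "%Y" "$WATCH_DIR/$file_name") || continue

            elapsed_time=$((current_time - last_mod_time))
            log_this "Elapsed time ==> $elapsed_time"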
            log_this "Elapsed time ==> $elapsed_time"

            if [ $elapsed_time -ge $seconds_between_mods ]
            then
                    log_this "Moving $file_name to $APP_DIR"

                    # In case there is no space left on the target mount, hide the file
                    # in the watched mount itself and remove the incomplete copy from APP_DIR.
                    if ! mv "$WATCH_DIR/$file_name" "$APP_DIR"
                    then
                            log_this "FATAL: mv failed!! Hiding $file_name"
                            rm -f "$APP_DIR/$file_name"
                            mv "$WATCH_DIR/$file_name" "$WATCH_DIR/.$file_name"
                            log_this "Removed $APP_DIR/$file_name. Look for $WATCH_DIR/.$file_name and submit later."
                    fi

                    log_this "End Monitoring for $file_name"
            else
                    log_this "$file_name: Transfer seems to be in progress"
            fi
        done
        log_this "Nothing more to monitor."
        echo
        sleep 5
done
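The script's token-file trick (measuring a file's age against a timestamp produced on the mount rather than against the local clock) can be sketched in Python as follows. This assumes the watched directory is writable and that a newly created file is stamped by the same clock that stamps incoming files, which is usually, but not necessarily, the server's on a network mount. The `.token.` prefix is an assumption chosen so that a listing that skips dot files, such as `ls -1`, never picks the token up.

```python
import os
import tempfile


def mount_now(directory):
    """Return the current time as seen by the file system holding `directory`.

    A throwaway token file is created in the directory and its mtime is read
    back; the token is removed before returning.
    """
    fd, token = tempfile.mkstemp(prefix=".token.", dir=directory)
    try:
        return os.fstat(fd).st_mtime
    finally:
        os.close(fd)
        os.remove(token)


def seconds_since_modified(path):
    """Age of `path` measured against the mount's clock, not the local one."""
    directory = os.path.dirname(os.path.abspath(path))
    return mount_now(directory) - os.stat(path).st_mtime
```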

Answer 1

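This will never work reliably. In production you will run into network problems and other errors that can leave partial files in the upload directory. I also don't like the idea of "trailer" files. The usual approach is to upload the file under a temporary name and rename it once the upload is complete.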

That way, you only need to list the directory, filter out the temporary names, and process whatever is left.
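Under that rename-after-upload convention, the consumer side reduces to a directory listing with a filter. A minimal Python sketch; the temporary suffixes are assumptions, since the naming depends on the uploading client. Dot files are skipped too, which also leaves alone the files the question's script hides with a leading dot after a failed move.

```python
import os

# Hypothetical temporary suffixes: use whatever names your upload tool
# actually writes while a transfer is still in progress.
TEMP_SUFFIXES = (".part", ".tmp", ".filepart")


def ready_files(directory, temp_suffixes=TEMP_SUFFIXES):
    """Return the sorted names of regular files that no longer carry a temporary name."""
    ready = []
    with os.scandir(directory) as entries:
        for entry in entries:
            if not entry.is_file():
                continue  # skip subdirectories, sockets, etc.
            if entry.name.startswith(".") or entry.name.endswith(temp_suffixes):
                continue  # still uploading, or deliberately hidden
            ready.append(entry.name)
    return sorted(ready)
```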

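If you cannot get this change made, ask your boss for written permission to implement something that can cause arbitrary data corruption. This serves two purposes: 1) it makes them understand that this is a real problem and not something you made up, and 2) it protects you when it breaks... because it will, and guess who will get all the blame?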

Answer 2

I think a saner approach would be to use kernel-level file system notifications, such as inotify. Get the tools here.
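Python's standard library has no inotify module, so this sketch calls the Linux `inotify_init1` and `inotify_add_watch` functions through `ctypes` (inotify-tools and third-party packages such as pyinotify wrap the same kernel interface). It watches one directory for `IN_CLOSE_WRITE` (a file opened for writing was closed) and `IN_MOVED_TO` (a file was renamed into the directory). One caveat worth testing on the real mount: inotify reports changes made through the local kernel, so on a CIFS mount written to from the Windows side the events may not arrive at all.

```python
import ctypes
import ctypes.util
import os
import struct

# Values from <sys/inotify.h> on Linux.
IN_CLOSE_WRITE = 0x00000008  # file opened for writing was closed
IN_MOVED_TO = 0x00000080     # file was moved (renamed) into the directory
IN_NONBLOCK = os.O_NONBLOCK  # inotify_init1() flag, defined as O_NONBLOCK

# struct inotify_event header: int wd; uint32_t mask, cookie, len;
EVENT_HEADER = struct.Struct("iIII")


def parse_events(buf):
    """Split a raw read() buffer from an inotify fd into (wd, mask, name) tuples."""
    events = []
    offset = 0
    while offset + EVENT_HEADER.size <= len(buf):
        wd, mask, _cookie, length = EVENT_HEADER.unpack_from(buf, offset)
        start = offset + EVENT_HEADER.size
        # The name field is NUL-padded to `length` bytes.
        name = os.fsdecode(buf[start:start + length].rstrip(b"\0"))
        events.append((wd, mask, name))
        offset = start + length
    return events


def watch_directory(path):
    """Return a non-blocking inotify fd watching `path` for finished files."""
    libc = ctypes.CDLL(ctypes.util.find_library("c"), use_errno=True)
    fd = libc.inotify_init1(IN_NONBLOCK)
    if fd < 0:
        raise OSError(ctypes.get_errno(), "inotify_init1 failed")
    wd = libc.inotify_add_watch(fd, os.fsencode(path), IN_CLOSE_WRITE | IN_MOVED_TO)
    if wd < 0:
        err = ctypes.get_errno()
        os.close(fd)
        raise OSError(err, "inotify_add_watch failed")
    return fd
```

A watcher would then wait with `select.select([fd], [], [], timeout)` and pass `os.read(fd, 4096)` to `parse_events`.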

Answer 3

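incron is an "inotify cron" system. It consists of a daemon and a table manipulator. You can use it much like regular cron; the difference is that inotify cron handles file system events rather than time periods.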
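Each incrontab line names a path, an event mask and a command; in the command, `$@` expands to the watched path and `$#` to the name of the file that triggered the event. An entry such as `/mnt/win/incoming IN_CLOSE_WRITE,IN_MOVED_TO /usr/local/bin/move_incoming.py $@/$# /data/app` (all paths and the script name are hypothetical) could hand each finished file to a small Python handler like this sketch. Saved as a script, it would end with `sys.exit(main(sys.argv))` under the usual `if __name__ == "__main__":` guard.

```python
import os
import shutil
import sys


def move_into(src, dest_dir):
    """Move `src` into `dest_dir` and return the new path.

    Refuses to overwrite an existing file with the same name.
    """
    target = os.path.join(dest_dir, os.path.basename(src))
    if os.path.exists(target):
        raise FileExistsError(target)
    return shutil.move(src, target)


def main(argv):
    """Entry point: argv is [script, file, dest_dir]; returns an exit status."""
    if len(argv) != 3:
        print("usage: move_incoming.py <file> <dest_dir>", file=sys.stderr)
        return 2
    try:
        move_into(argv[1], argv[2])
    except OSError as exc:
        print(f"move failed: {exc}", file=sys.stderr)
        return 1
    return 0
```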

Answer 4

First make sure inotify-tools is installed.

Then use it like this:

logOfChanges="/tmp/changes.log.csv" # Set your file name here.

# Lock and load
inotifywait -mrcq "$DIR" > "$logOfChanges" & # monitor, recursively, output CSV, be quiet.
IN_PID=$!   # PID of the background inotifywait ($$ would be this shell's own PID)

# Do your stuff here
# ...

# Kill and analyze
kill "$IN_PID"
while IFS= read -r entry; do
   # Split your CSV, but beware that file names may contain spaces too.
   # Just look up how to parse CSV with bash. :)
   path=...
   event=...
   # ... other stuff like time stamps
   # Depending on the event...
   case "$event" in
     SOME_EVENT) myHandlingCode "$path" ;;
     # ... more events
     *) myDefaultHandlingCode "$path" ;;
   esac
done < "$logOfChanges"
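The CSV splitting that the loop leaves open is easier with Python's `csv` module. A sketch, assuming the `inotifywait -c` line layout of watched path, event list and file name, where a field containing a comma (such as a multi-event list like `CLOSE_WRITE,CLOSE`) is quoted:

```python
import csv


def parse_inotifywait_csv(lines):
    """Yield (watched_path, events, file_name) from `inotifywait -c` output."""
    for row in csv.reader(lines):
        if len(row) != 3:
            continue  # skip anything that is not a well-formed event line
        watched, events, name = row
        # Several event names arrive as one comma-separated field.
        yield watched, events.split(","), name
```

For example, `for watched, events, name in parse_inotifywait_csv(open("/tmp/changes.log.csv", newline=""))` walks the log written above.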

Alternatively, using --format instead of -c with inotifywait would be an idea.

See man inotifywait and man inotifywatch for more information.

Answer 5

Honestly, a Python application set up to run at startup could do this quickly and efficiently. Python has amazing OS support, and it is fairly complete.

Running a script might work, but it will be a hassle to take care of and manage. I assume you would run these as frequent cron jobs?
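One advantage of a single Python process is that it can poll all the watched directories instead of running up to 100 copies of the bash script. A minimal sketch using the same rule as the script (move a file once its mtime is at least `quiet_seconds` old); `scan_once` and the list of directory pairs are illustrative assumptions, and `now` defaults to the local clock, which is only correct if the mount's clock agrees with it.

```python
import os
import shutil
import time


def scan_once(pairs, quiet_seconds=3, now=None):
    """Move files that have been quiet for `quiet_seconds`; return their new paths.

    `pairs` is a list of (watch_dir, app_dir) tuples. Dot files are skipped,
    as `ls -1` skips them.
    """
    if now is None:
        now = time.time()
    moved = []
    for watch_dir, app_dir in pairs:
        # Snapshot the listing first so moving files does not disturb iteration.
        with os.scandir(watch_dir) as it:
            entries = list(it)
        for entry in entries:
            if entry.name.startswith(".") or not entry.is_file():
                continue
            if now - entry.stat().st_mtime < quiet_seconds:
                continue  # transfer seems to be in progress
            target = os.path.join(app_dir, entry.name)
            try:
                moved.append(shutil.move(entry.path, target))
            except OSError as exc:
                print(f"move failed for {entry.path}: {exc}")
    return moved
```

A startup script would call `scan_once` in a `while True:` loop with `time.sleep(5)` between passes, mirroring the bash script.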

Answer 6

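To get you on your feet, here is a small app I wrote that takes a path and looks at the binary output of JPEG files. I never finished it, but it will get you started and show you the structure of a Python program as well as some uses of os.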

I wouldn't spend too much time worrying about my code.

import os
import shutil

# analyze() takes a path, lists the files in its output_files folder,
# and reads each of them.

def analyze(path):
    output_dir = os.path.join(path, "output_files")
    list_outputfiles = os.listdir(output_dir)
    print(list_outputfiles)
    for name in list_outputfiles:
        # Open relative to output_files, not the current directory;
        # 'rb' because the contents are binary media data.
        with open(os.path.join(output_dir, name), 'rb') as f:
            f.readlines()

# txtmaker() copies the media file's binary contents into a .txt file
# inside output_files.

def txtmaker(c_file):
    print(c_file)
    shutil.copyfile(c_file, os.path.join("output_files", c_file + ".txt"))

# parser() takes the input path, creates the output directory, lists all
# entries, and calls txtmaker() on every file that is not a directory.

def parser(path):
    os.chdir(path)
    os.makedirs(os.path.join(path, "output_files"), mode=0o777, exist_ok=True)
    for name in os.listdir(path):
        if os.path.isdir(name):
            print(name, "is a directory")
        else:
            txtmaker(name)
    analyze(path)

def main():
    path = input("Enter the full path to the media: ")
    parser(path)


if __name__ == '__main__':
    main()

Bash script to watch a folder
