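Split all txt files in a folder into smaller files based on a regex, using bash

Date: 2013-04-22 18:05:23

Tags: regex bash text split

I have a folder containing large text files. Each file is a collection of 1000 files separated by [[filename]] lines. I want to split each file into its 1000 component files and put them in a new folder. Is there a way to do this in bash? Any other quick method would be fine too.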

for f in $(find . -name '*.txt')
do mkdir $f
  mv 
  cd $f
  awk '/[[.*]]/{g++} { print $0 > g".txt"}' $f
  cd ..
done 

3 answers:

Answer 0 (score: 0)

This is Python rather than bash, and it was written in a hurry, so there is no guarantee that it works.

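import re
import sys


def main():
    # A header line looks like [[filename]]; group 1 is the filename.
    pattern = re.compile(r'\[\[(.+)]]')
    fname = None  # no header seen yet
    lines = []
    with open(sys.argv[1]) as f:
        for line in f:
            m = pattern.search(line)
            if m:
                # A new header: flush the section collected so far.
                if fname is not None:
                    with open(fname, 'w') as g:
                        g.writelines(lines)
                fname = m.group(1)
                lines = []  # also discards any text before the first header
            else:
                lines.append(line)

    # Flush the last section, if the input contained any header at all.
    if fname is not None:
        with open(fname, 'w') as g:
            g.writelines(lines)

if __name__ == '__main__':
    main()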

Answer 1 (score: 0)

You are trying to create a folder with the same name as an existing file.

for f in $(find . -name '*.txt')
do mkdir $f

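Here, find lists the files under the current path, and for each file you try to create a directory with exactly the same name. One way around this is to create a temporary folder first: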

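for f in $(find . -name '*.txt')
do mkdir temporary # create a temporary folder
  mv "$f" temporary # move the file into the folder
  mv temporary "$f" # rename the temporary folder to the name of the file
  cd "$f" # enter the folder and go on....
  # each [[...]] header line starts the next numbered output file;
  # the brackets must be escaped, and finished files are closed
  awk '/^\[\[.*\]\]/{close(g ".txt"); g++} {print > (g ".txt")}' "${f##*/}"
  cd - >/dev/null # return to the starting directory
done 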

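Note that all of your folders will end up with a ".txt" extension. If you don't want that, cut it off before creating the folder; that way you won't need the temporary folder at all, because the folder you create has a different name from the .txt file. For example: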

for f in $(find . -name '*.txt' | rev | cut -b 5- | rev)
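Putting the two ideas together, here is a minimal sketch of the whole job, under a few assumptions of mine rather than anything stated in the answer: the function name split_folder is hypothetical, each header sits alone on its line, the output files are named after the header text, and the extension is stripped with the ${f%.txt} parameter expansion instead of rev | cut | rev.

```shell
# split_folder IN_DIR: hypothetical helper, not part of the original answer.
# For every IN_DIR/NAME.txt, create IN_DIR/NAME/ and write one file per
# [[header]] section into it, named after the header text.
split_folder() {
  local in_dir=$1 f out_dir
  for f in "$in_dir"/*.txt; do
    [ -f "$f" ] || continue          # glob matched nothing, or not a regular file
    out_dir=${f%.txt}                # strip the extension: a.txt -> a
    mkdir -p "$out_dir" || return 1
    awk -v dir="$out_dir" '
      /^\[\[.*\]\][ \t]*$/ {
        if (out != "") close(out)    # do not keep 1000 files open at once
        name = $0
        sub(/^\[\[[ \t]*/, "", name)         # drop the leading [[ and blanks
        sub(/[ \t]*\]\][ \t]*$/, "", name)   # drop the trailing ]] and blanks
        gsub(/\//, "_", name)        # keep the output inside out_dir
        if (name == "") { out = ""; next }   # skip sections with an empty name
        out = dir "/" name
        printf "" > out              # create (or truncate) the section file
        next
      }
      out != "" { print > out }      # body line; text before the first header is skipped
    ' "$f"
  done
}
```

Calling split_folder . in the folder that holds the large files would leave each a.txt in place and fill a sibling folder a/ with its sections.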

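Answer 2 (score: 0)

Write a bash script. Here, I've done it for you.

Note the structure and features of this script:

  • Explains what it does in a usage() function, which is used for the -h option.
  • Provides a standard set of options: -h, -n, -v.
  • Uses getopts for option processing.
  • Does a lot of error checking on the arguments.
  • Parses filenames carefully (note that whitespace around a filename is ignored).
  • Hides details in functions. Notice the 'talk', 'qtalk', and 'nvtalk' functions? They come from a bash library I built to make this kind of scripting easy.
  • Explains to the user what is happening when in $verbose mode.
  • Lets the user see what would be done without actually doing it (the -n option, $norun mode).
  • Never runs commands directly; instead it uses the run function, which honors the $norun, $verbose, and $quiet variables.

I didn't just catch a fish for you; I showed you how to fish.

Good luck with your next bash script.

Alan S.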
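
The option handling and dry-run points in the list above can be sketched in a few lines. This is a simplified illustration, not the author's library: the names run_cmd and demo_main are hypothetical, and unlike the script's run function this wrapper executes its arguments directly instead of passing a string to eval.

```shell
# Hypothetical skeleton; run_cmd and demo_main are illustrative names only.
norun= verbose=

# run_cmd COMMAND ARGS...: print the command in dry-run or verbose mode,
# and execute it only when -n was not given.
run_cmd() {
  if [[ -n $norun ]]; then
    printf '(norun) %s\n' "$*" >&2
    return 0
  fi
  if [[ -n $verbose ]]; then
    printf '>> %s\n' "$*" >&2
  fi
  "$@"                       # run the arguments as-is, no eval
}

demo_main() {
  local opt OPTIND=1         # reset so the function can be called repeatedly
  norun= verbose=
  while getopts 'hnv' opt; do
    case $opt in
      h) echo "usage: demo_main [-h] [-n] [-v] DIR" >&2; return 0 ;;
      n) norun=1 ;;
      v) verbose=1 ;;
      *) return 2 ;;         # getopts has already printed a diagnostic
    esac
  done
  shift $(( OPTIND - 1 ))
  run_cmd mkdir -p "${1:?missing DIR}"
}
```

With this sketch, demo_main -n out only reports the mkdir it would run, while demo_main -v out reports it and then runs it.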

#!/bin/bash
# split-collections IN-FOLDER OUT-FOLDER

PROG="${0##*/}"

usage() {
  cat 1>&2 <<EOF
usage: $PROG [OPTIONS] IN-FOLDER OUT-FOLDER

This script splits a collection of files within IN-FOLDER into
separate, named files into the given OUT-FOLDER.  The created file
names are obtained from formatted text headers within the input
files.

The format of each input file is a set of HEADER and BODY pairs,
where each HEADER is a text line formatted as:

    [[input-filename1]]
    text line 1
    text line 2
    ...
    [[input-filename2]]
    text line 1
    text line 2
    ...

Normal processing will show the filenames being read, and file
names being created.  Use the -v (verbose) option to show the
number of text lines being written to each created file.  Use
-v twice to show the actual lines of text being written.

Use the -n option to show what would be done, without actually
doing it.

Options
 -h       Show this help
 -n       Dry run -- do NOT create any files or make any changes
 -o       Overwrite existing output files.
 -q       Be quiet: do not echo commands in dry-run mode
 -v       Be verbose

EOF
   exit
}

talk()   { echo 1>&2 "$@" ; }
chat()   { [[ -n "$norun$verbose" ]] && talk "$@" ; }
nvtalk() { [[ -n "$verbose" ]] || talk "$@" ; }
qtalk()  { [[ -n "$quiet" ]]   || talk "$@" ; }
nrtalk() { talk "${norun:+(norun) }$@" ; }

error() { 
  local code=2
  # treat $1 as an exit code only if it consists entirely of digits
  case "$1" in ''|*[!0-9]*) ;; *) code=$1 ; shift ;; esac
  echo 1>&2 "$@"
  exit $code
}

talkf()   { printf 1>&2 "$@" ; }
chatf()   { [[ -n "$norun$verbose" ]] && talkf "$@" ; }
nvtalkf() { [[ -n "$verbose" ]] || talkf "$@" ; }
qtalkf()  { [[ -n "$quiet" ]]   || talkf "$@" ; }
nrtalkf() { talkf "${norun:+(norun) }$@" ; }

errorf()  { 
  local code=2
  # treat $1 as an exit code only if it consists entirely of digits
  case "$1" in ''|*[!0-9]*) ;; *) code=$1 ; shift ;; esac
  printf 1>&2 "$@"
  exit $code
}

# run COMMAND ARGS ...

qrun() {
  ( quiet=1 run "$@" )
}

run() {
  if [[ -n "$norun" ]]; then
    if [[ -z "$quiet" ]]; then
      nrtalk "$@"
    fi
  else
    if [[ -n "$verbose" ]]; then
      talk ">> $@"
    fi
    # In "if ! eval ...", $? inside the branch is the negated status (0),
    # so return the command's real exit status directly.
    eval "$@" || return $?
  fi
  return 0
}

show_line() {
  talkf "%s:%d: %s\n" "$in_file" "$lines_in" "$line"
}

# given an input filename, read it and create 
# the output files as indicated by the contents
# of the text in the file

split_collection() {
  in_file="$1"
  out_file=
  lines_in=0
  lines_out=0
  skipping=
  # IFS= and -r keep each line verbatim (leading blanks, backslashes)
  while IFS= read -r line ; do
    : $(( lines_in++ ))

    (( verbose_count > 1 )) && show_line   # numeric, not string, comparison

    # if a line with the format of "[[foo]]" occurs,
    # close the current output file, and open a new
    # output file called "foo"

    if [[ "$line" =~ ^\[\[[[:blank:]]*([^ ]+.*[^ ]|[^ ])[[:blank:]]*\]\][[:blank:]]*$ ]] ; then
      new_file="${BASH_REMATCH[1]}"

      # close out the current file, if any
      if [[ "$out_file" ]]; then
        nrtalkf "%d lines written to %s\n" $lines_out "$out_file"
      fi

      # check the filename for bogosities
      case "$new_file" in 
        *..*|*/*) 
          (( verbose_count < 2 )) && show_line   # numeric, not string, comparison
          error "Badly formatted filename"
          ;;
      esac

      out_file="$out_folder/$new_file"
      if [[ -e "$out_file" ]]; then
        if [[ -n "$overwrite" ]]; then
          nrtalk "Overwriting existing '$out_file'"
          qrun "cat /dev/null >'$out_file'"
        else
          error "$out_file already exists."
        fi
      else
        nrtalk "Creating new output file: '$out_file' ..."
        qrun "touch '$out_file'"
      fi

      lines_out=0
    elif [[ -z "$out_file" ]]; then

      # apparently, there are text lines before the filename
      # header; ignore them (out loud)
      if [[ ! "$skipping" ]]; then
        talk "Text preceding first filename ignored.."
        skipping=1
      fi

    else # next line of input for the file
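      # quote with %q so the eval inside run() writes the text literally
      # instead of expanding any $, backquote or quote characters in it
      qrun "printf '%s\n' $(printf '%q' "$line") >>$(printf '%q' "$out_file")"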
      : $(( lines_out++ ))
    fi
  done
}

norun=
verbose=
verbose_count=0
overwrite=
quiet=

while getopts 'hnoqv' opt ; do
  case "$opt" in
  h)  usage ;;
  n)  norun=1 ;;
  o)  overwrite=1 ;;
  q)  quiet=1 ;;
  v)  verbose=1 ; : $(( verbose_count++ )) ;;
  esac
done
shift $(( OPTIND - 1 ))

in_folder="${1:?Missing IN-FOLDER; see $PROG -h for details}"
out_folder="${2:?Missing OUT-FOLDER; see $PROG -h for details}"

# validate the input and output folders
#
# It might be reasonable to create the output folder for the 
# user, but that's left as an exercise for the user.

in_folder="${in_folder%/}"    # remove trailing slash, if any
out_folder="${out_folder%/}"

[[ -e "$in_folder" ]]  || error "$in_folder does not exist" 
[[ -d "$in_folder" ]]  || error "$in_folder is not a directory."
[[ -e "$out_folder" ]] || error "$out_folder does not exist."
[[ -d "$out_folder" ]] || error "$out_folder is not a directory."

for collection in "$in_folder"/* ; do   # quoted so names with spaces survive
  talk "Reading $collection .."
  split_collection "$collection" <"$collection"
done

exit
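
The header-matching step is the part most worth trying in isolation. Below is a small sketch of it; header_name is a hypothetical helper that is not in the script, and its pattern trims any blank (space or tab) around the name, whereas the script's own pattern only excludes spaces there.

```shell
# header_name LINE: print the trimmed name from a "[[ name ]]" header line.
# Returns 1 when LINE is not a header. Illustrative helper, not in the script.
header_name() {
  # Keeping the pattern in a variable avoids shell-quoting surprises in [[ =~ ]].
  local re='^\[\[[[:blank:]]*([^[:blank:]]|[^[:blank:]].*[^[:blank:]])[[:blank:]]*\]\][[:blank:]]*$'
  [[ $1 =~ $re ]] || return 1
  # Group 1 is the name without the surrounding blanks.
  printf '%s\n' "${BASH_REMATCH[1]}"
}
```

For example, header_name '[[ notes.txt ]]' prints notes.txt, while a body line such as 'plain text' prints nothing and returns 1.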