Linux: split multiple text files into separate files

Time: 2014-05-25 15:31:13

Tags: regex linux bash parsing split

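I have 100 text files, each about 1 GB in size, all in the same folder. Each file is a dumped SQL database file and contains a lot of information, separated into blocks.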

Each block starts with the string "Create Table `" and ends with the string "Table structure for table".

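Since I have little experience with Linux, I'd like to ask how to write a script that loops over all the files in the folder, splits each file into its blocks, and saves each block to a separate .txt file named after the block's starting string (the table name). All the blocks from one file should go into a separate folder named after that file.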

For example, the logic should be something like this:

For each file in folder
  Create xFolder Named as file

  For each match by regex((Create Table `).*(Table structure for table))
    Create file in xFolder named as regex((Create Table `).*`) (extract the name between the backticks of this match)
    Put matched text to file
  Next match
Next file
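The logic above could be sketched in bash with awk as follows. This is a minimal sketch, not code from the question or the answer: the function names split_dump and split_all_dumps are hypothetical, and it assumes every block starts on a line beginning with "Create Table `NAME`" and ends on a line containing "Table structure for table". It reads each (roughly 1 GB) file only once, opening a new output file at every block start instead of re-scanning the file once per table.

```shell
#!/usr/bin/env bash
# Minimal sketch (hypothetical function names split_dump / split_all_dumps).
# Assumes: a block starts on a line beginning with "Create Table `NAME`"
# and ends on a line containing "Table structure for table".

# split_dump FILE: write every block of FILE to DIR/NAME.txt, where DIR is
# FILE without its .txt suffix (MyFirstFile.txt -> MyFirstFile/).
split_dump() {
    local file=$1
    local dir=${file%.txt}
    mkdir -p "$dir"
    # Single pass over the file: no per-table re-reading.
    awk -v dir="$dir" '
        /^Create Table `/ {
            if (out != "") close(out)      # previous block had no end marker
            split($0, parts, "`")          # parts[2] = name between backticks
            out = dir "/" parts[2] ".txt"
        }
        out != "" {
            print > out                    # lines outside blocks are skipped
            if (/Table structure for table/) { close(out); out = "" }
        }
    ' "$file"
}

# split_all_dumps FOLDER: run split_dump on every *.txt file in FOLDER.
split_all_dumps() {
    local f
    for f in "$1"/*.txt; do
        [ -f "$f" ] || continue            # glob matched nothing
        split_dump "$f"
    done
}
```

For example, split_all_dumps /path/to/folder (a placeholder path) would turn /path/to/folder/MyFirstFile.txt into /path/to/folder/MyFirstFile/Table1.txt, Table2.txt and Table3.txt. Text between an end marker and the next "Create Table" line (such as blank lines) is not written anywhere.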

So, if the file MyFirstFile.txt contains 3 blocks:

Create Table `Table1` (
text
text
...
text )
- Table structure for table

Create Table `Table2` (
text
text
...
text )
- Table structure for table

Create Table `Table3` (
text
text
...
text )
- Table structure for table

then a folder named MyFirstFile must be created, and inside it three txt files: Table1.txt, Table2.txt and Table3.txt.

Table1.txt must contain:

    Create Table `Table1` (
    text
    text
    ...
    text )
    - Table structure for table

Table2.txt must contain:

    Create Table `Table2` (
    text
    text
    ...
    text )
    - Table structure for table

Table3.txt must contain:

    Create Table `Table3` (
    text
    text
    ...
    text )
    - Table structure for table

1 answer:

Answer 0 (score: 0)

Taking a single file (file1.txt) as an example:

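# Table names are the text between the backticks in "Create Table `NAME`"
for table_name in $(grep -o 'Create Table `[^`]*`' file1.txt | cut -d'`' -f2)
do
   # Print from this table's "Create Table" line up to (not including) the next one
   awk -v t="$table_name" 'index($0, "Create Table `" t "`") == 1 {p = 1; print; next}
        p && /Create Table/ {exit}
        p' file1.txt > "$table_name.txt"
done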

For all the files, you just need one more loop around this. Try it yourself.
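The answer leaves that outer loop as an exercise. One possible sketch, keeping the answer's per-table approach, is below. The function name split_all_per_table is hypothetical, and it assumes each block header starts at the beginning of a line as "Create Table `NAME`". Note that it re-reads each file once per table, which can be slow on 1 GB files, and that each output file also keeps any lines between the end marker and the next header (for example blank lines).

```shell
#!/usr/bin/env bash
# Sketch of the answer's approach extended to all files in a folder
# (hypothetical function name). Each file is re-read once per table.
split_all_per_table() {
    local file dir name
    for file in "$1"/*.txt; do
        [ -f "$file" ] || continue       # glob matched nothing
        dir=${file%.txt}                 # one output folder per input file
        mkdir -p "$dir"
        # Table names are the text between the backticks of "Create Table `NAME`"
        grep -o '^Create Table `[^`]*`' "$file" | cut -d'`' -f2 |
        while IFS= read -r name; do
            # Print from this table's header up to (not including) the next header
            awk -v t="$name" '
                index($0, "Create Table `" t "`") == 1 { p = 1; print; next }
                p && /^Create Table `/ { exit }
                p
            ' "$file" > "$dir/$name.txt"
        done
    done
}
```

Calling split_all_per_table on the folder creates one subfolder per input file, with one NAME.txt per table.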

Good luck!