Question

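I was wondering if it is possible to split a file into equal parts (edit: = all equal except the last one) without breaking lines. With the Unix split command, a line may be cut in half. Is there a way to, say, split a file into 5 equal parts while still having each part contain only whole lines (it's fine if one of the files is a little larger)? I know I could count the lines myself, but I have to do this for a lot of files in a bash script. Many thanks!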

Answer 1

If you mean an equal number of lines, split has an option for that:

split --lines=75

If you need to know what that 75 should be for N equal parts, it's:

lines_per_part = int((total_lines + N - 1) / N)

where the total number of lines can be obtained with wc -l.

See the following script for an example:
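The "+ N - 1" term is what turns plain integer division into rounding up (ceiling division), so every part except the last is full and the last one absorbs the remainder. Worked through with T = 70 total lines and N = 6 parts (the same numbers as the script output in this answer):

```latex
\text{lines\_per\_part} = \left\lceil \frac{T}{N} \right\rceil
  = \left\lfloor \frac{T + N - 1}{N} \right\rfloor,
\qquad T = 70,\ N = 6:\quad \left\lfloor \frac{75}{6} \right\rfloor = 12,
\qquad 70 - 5 \cdot 12 = 10 \text{ lines in the last part}
```

One caveat of this approach: rounding up can produce fewer than N files. For example, T = 9 and N = 4 gives 3 lines per part, and therefore only three files.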

#!/usr/bin/env bash

# Configuration stuff

fspec=qq.c
num_files=6

# Work out lines per file.

total_lines=$(wc -l <${fspec})
((lines_per_file = (total_lines + num_files - 1) / num_files))

# Split the actual file, maintaining lines.

split --lines=${lines_per_file} ${fspec} xyzzy.

# Debug information

echo "Total lines     = ${total_lines}"
echo "Lines  per file = ${lines_per_file}"    
wc -l xyzzy.*

Output:

Total lines     = 70
Lines  per file = 12
  12 xyzzy.aa
  12 xyzzy.ab
  12 xyzzy.ac
  12 xyzzy.ad
  12 xyzzy.ae
  10 xyzzy.af
  70 total

Newer versions of split let you specify a number of CHUNKS with the -n/--number option, so you can use something like:

split --number=l/6 ${fspec} xyzzy.

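(that's ell-slash-six, meaning lines, not one-slash-six).

That will give you files of roughly equal size, with no mid-line splits.

I mention that last point because it does not give you roughly the same number of lines in each file, but rather roughly the same number of characters.

So if you have one 20-character line and nineteen 1-character lines (twenty lines in total) and split into five files, you most likely won't get four lines in every file.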
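To see this byte-based balancing in action, here is a small sketch that builds exactly that 20-line example and splits it into five parts. It assumes GNU coreutils split 8.8 or later; the scratch directory and the sample.txt / part. names are made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch: show that `split -n l/N` balances bytes, not line counts.
# Assumes GNU coreutils split (8.8+); file names are illustrative only.
set -eu

# Work in a throwaway directory so nothing in the current one is touched.
workdir=$(mktemp -d)
cd "$workdir"

# One 20-character line (twenty zeros) followed by nineteen 1-character lines.
{
  printf '%020d\n' 0
  for _ in $(seq 19); do printf 'b\n'; done
} > sample.txt

# Split into 5 parts without breaking lines.
split -n l/5 sample.txt part.

# Print the line count of every part; they will not all be 4.
wc -l part.*
```

The first part ends up holding little more than the long line, while later parts hold several short lines each, because the chunk boundaries are computed from the file size in bytes.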

Answer 2

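You don't even need a script; split(1) supports the wanted functionality out of the box:

split -l 75 auth.log auth.log.

The above command splits the file into chunks of 75 lines each and writes output files of the form auth.log.aa, auth.log.ab, ...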

wc -l on the original file and on the output gives:

  321 auth.log
   75 auth.log.aa
   75 auth.log.ab
   75 auth.log.ac
   75 auth.log.ad
   21 auth.log.ae
  642 total

Answer 3

split was updated in coreutils release 8.8 (announced 22 December 2010) with the --number option, which generates a specific number of files. The option --number=l/n generates n files without splitting lines.

http://www.gnu.org/software/coreutils/manual/html_node/split-invocation.html#split-invocation
http://savannah.gnu.org/forum/forum.php?forum_id=6662

Answer 4

A simple solution for a simple problem:

split -n l/5 your_file.txt

No scripting needed here.

From the man page, CHUNKS may be:

l/N     split into N files without splitting lines
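Since the question mentions doing this for many files, the one-liner drops straight into a loop. A minimal sketch, assuming GNU split; the split_each function name and the ".part." prefix are hypothetical placeholders, and -d (numeric suffixes) is a GNU option:

```shell
#!/usr/bin/env bash
# Split each file given as an argument into N parts made of whole lines.
# Assumes GNU coreutils split (8.8+); split_each and ".part." are placeholders.

split_each() {
  local parts=$1
  shift
  local f
  for f in "$@"; do
    # e.g. data.txt -> data.txt.part.00 ... data.txt.part.04 (-d = numeric suffixes)
    split -d -n "l/${parts}" -- "$f" "${f}.part."
  done
}

# Usage (hypothetical file names): split_each 5 *.txt
```

Wrapping the loop in a function keeps the part count in one place and lets the same call handle a glob or an explicit list of files.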

Update

Not all Unix distributions include this flag. For example, it doesn't work on OS X. To use it there, you could consider replacing the Mac OS X utilities with GNU core utilities.
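Where -n is unavailable (as with the OS X split mentioned above), one option is to fall back to the line-count approach from the first answer, which only needs wc -l and split -l. This is a sketch under two assumptions: that a split which accepts --version is GNU split with -n l/N support (coreutils 8.8+), and that the hypothetical even_split helper and its arguments fit your script. Note that the two branches balance differently: GNU l/N balances bytes, while the fallback balances line counts.

```shell
#!/usr/bin/env bash
# even_split FILE PARTS PREFIX  (hypothetical helper name)
# Uses GNU `split -n l/N` when available; otherwise computes lines per
# part with `wc -l` and calls POSIX `split -l`.
even_split() {
  local file=$1 parts=$2 prefix=$3
  if split --version >/dev/null 2>&1; then
    # Assumption: a split that understands --version is GNU split >= 8.8.
    split -n "l/${parts}" "$file" "$prefix"
  else
    local total per_part
    total=$(wc -l < "$file")
    # Ceiling division, so at most PARTS files are produced.
    per_part=$(( (total + parts - 1) / parts ))
    # Guard against an empty input, where per_part would be 0.
    [ "$per_part" -gt 0 ] || per_part=1
    split -l "$per_part" "$file" "$prefix"
  fi
}

# Usage (hypothetical names): even_split big.log 5 big.log.part.
```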

Answer 5

I made a shell script that takes the number of parts as input and splits a file:

#!/bin/sh
# Usage: sh mysplitXparts.sh input_file number_of_parts

parts_total="$2"
input="$1"

parts=$parts_total
for i in $(seq 0 $((parts_total-2))); do
  lines=$(wc -l < "$input")
  # n = ceil(lines/parts) via integer arithmetic: 1.3 -> 2, 1.6 -> 2, 1 -> 1
  n=$(( (lines + parts - 1) / parts ))
  head -n "$n" "$input" > "split${i}"
  # Keep the remaining lines for the next iteration
  tail -n "$((lines-n))" "$input" > ".tmp${i}"
  input=".tmp${i}"
  parts=$((parts-1))
done
# Whatever remains becomes the last part
cat "$input" > "split$((parts_total-1))"
rm -f .tmp*

I use the head and tail commands, storing the remainder in tmp files, to split the file:

#10 means 10 parts
sh mysplitXparts.sh input_file 10

Or with awk, where 0.1 is 10% => 10 parts, or 0.334 => 3 parts:

awk -v size=$(wc -l < input) -v perc=0.1 '{
  nfile = int(NR/(size*perc)); 
  if(nfile >= 1/perc){
    nfile--;
  } 
  print > "split_"nfile
}' input

Answer 6

using System;
using System.IO;
using System.Linq;

var dict = File.ReadLines("test.txt")
               .Where(line => !string.IsNullOrWhiteSpace(line))
               .Select(line => line.Split(new char[] { '=' }, 2, StringSplitOptions.None))
               .ToDictionary(parts => parts[0], parts => parts[1]);


or 


string line = "to=xxx@gmail.com=yyy@yahoo.co.in";
string[] tokens = line.Split(new char[] { '=' }, 2, StringSplitOptions.None);

Result:
tokens[0] = to
tokens[1] = xxx@gmail.com=yyy@yahoo.co.in

How to split a file into equal parts without breaking individual lines?
