I have a large 150 GB CSV file, and I want to remove the first 17 lines and the last 8 lines. I tried the following, but it doesn't seem to work:
sed -i -n -e :a -e '1,8!{P;N;D;};N;ba'
and
sed -i '1,17d'
Can anyone help with sed or awk? A one-liner would be great.
Answer 0 (score: 17)
head and tail are better suited to the job than sed or awk.
tail -n+18 file | head -n-8 > newfile
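Before running this on 150 GB, the offsets can be checked on a tiny input. This is a minimal sketch, assuming GNU coreutils (a negative count for head -n is a GNU extension) and using seq only to generate 30 numbered test lines:

```shell
# tail -n +18 starts output at line 18, skipping the first 17 lines;
# head -n -8 (GNU extension) prints everything except the last 8 lines.
trimmed=$(seq 1 30 | tail -n +18 | head -n -8)

# Of lines 1..30, only 18 through 22 should remain.
printf '%s\n' "$trimmed"
```

Both tools stream the data, so neither needs to hold the file in memory.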
Answer 1 (score: 9)
awk -v nr="$(wc -l < file)" 'NR>17 && NR<=(nr-8)' file
Answer 2 (score: 2)
All awk:
awk 'NR>y+x{print A[NR%y]} {A[NR%y]=$0}' x=17 y=8 file
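The array A acts as a ring buffer of y slots. Each line is stored at index NR%y, and once more than x+y lines have been read, the slot about to be overwritten holds the line from y lines earlier, which is printed first. The last y lines are stored but never printed. A small check on generated input (seq and the explicit - for standard input are additions for this sketch, not part of the answer):

```shell
# Feed the numbers 1..30 through the same awk program.
# x lines are dropped from the front and y from the end,
# so only 18..22 should remain.
kept=$(seq 1 30 | awk 'NR>y+x{print A[NR%y]} {A[NR%y]=$0}' x=17 y=8 -)

printf '%s\n' "$kept"
```

Only y lines are held in memory at any time, and the file is read once, with no separate wc -l pass.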
Answer 3 (score: 1)
Try this:
sed '{[/]<n>|<string>|<regex>[/]}d' <fileName>
sed '{[/]<adr1>[,<adr2>][/]}d' <fileName>
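where
/.../ = delimiters
n = line number
string = a string found in the line
regex = a regular expression matching the search pattern
adr = the address of a line (a number or a pattern)
d = delete
See this link
Answer 4 (score: 0)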
LENGTH=`wc -l < file`
head -n $((LENGTH-8)) file | tail -n $((LENGTH-17)) > file
Edit: As mtk pointed out in the comments, this won't work. If you want to use wc and keep track of the file length, you should use:
LENGTH=`wc -l < file`
# Write to a new file: redirecting to the input file truncates it before it is read.
head -n $((LENGTH-8)) file | tail -n $((LENGTH-8-17)) > newfile
Or:
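LENGTH=`wc -l < file`
# Use an intermediate file: redirecting to the input file would truncate it before it is read.
head -n $((LENGTH-8)) file > tmpfile
LENGTH=`wc -l < tmpfile`
tail -n $((LENGTH-17)) tmpfile > newfile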
Which makes this solution less elegant than the one choroba posted :)
Answer 5 (score: 0)
I learned this today for the shell.
{
ghead -17 > /dev/null
sed -n -e :a -e '1,8!{P;N;D;};N;ba'
} < my-bigfile > subset-of
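This needs a non-consuming head, that is, one that leaves the input positioned right after the lines it read, hence the use of ghead from GNU coreutils.
Answer 6 (score: 0)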
Similar to Thor's answer, but a bit shorter:
sed -i '' -e $'1,17d;:a\nN;19,25ba\nP;D' file.txt
The -i '' tells sed to edit the file in place. (The syntax may be a bit different on your system. Check the man page.)
If you want to delete {front} lines from the front and {tail} lines from the end, you'd have to use the following numbers:
1,{front}d;:a\nN;{front+2},{front+tail}ba\nP;D
(I put them in curly braces here, but that's just pseudocode. You'll have to replace them by the actual numbers. Also, it should work with {front+1}, but it doesn't on my machine (macOS 10.12.4). I think that's a bug.)
I'll try to explain how the command works. Here's a human-readable version:
1,17d # delete lines 1 ... 17, goto start
:a # define label a
N # add next line from file to buffer, quit if at end of file
19,25ba # if line number is 19 ... 25, goto start (label a)
P # print first line in buffer
D # delete first line from buffer, go back to start
First we skip 17 lines. That's easy. The rest is tricky, but basically we keep a buffer of eight lines. We only start printing lines when the buffer is full, but we stop printing when we reach the end of the file, so at the end, there are still eight lines left in the buffer that we didn't print - in other words, we deleted them.
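To watch the buffer at work without touching a real file, the same script can be run as a filter on a small input. This is a sketch rather than the exact command above: it drops -i, splits the script into -e pieces so it does not rely on $'...' quoting, and adds -n. The -n is an assumption made for portability: GNU sed prints the pattern space when N runs out of input unless -n is given, which would put the last eight lines back into the output.

```shell
# 30 numbered lines in; lines 1-17 and 23-30 should be dropped.
# -n suppresses automatic printing; P still prints explicitly.
kept=$(seq 1 30 | sed -n -e '1,17d' -e ':a' -e 'N' -e '19,25ba' -e 'P' -e 'D')

printf '%s\n' "$kept"
```

With this input only lines 18 through 22 remain, which matches the head and tail pipeline from the first answer.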