Question

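I want to remove the first line of about 5,000 text files before importing them.

I'm still very new to PowerShell, so I'm not sure what to search for or how to approach this. My current concept, in pseudocode: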

set-content file (get-content unless line contains amount)

However, I can't seem to figure out how to do something like contains.

Answer 1

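While I really admire the answer from @hoge, both for its very concise technique and for the wrapper function that generalizes it, and I encourage upvotes for it, I feel compelled to comment on the two other answers that use temp files (they grate on me like fingernails on a chalkboard!).

Assuming the file is not huge, you can force the pipeline to operate in discrete sections, and so avoid the need for a temp file, with judicious use of parentheses: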

(Get-Content $file | Select-Object -Skip 1) | Set-Content $file

...or in short form:

(gc $file | select -Skip 1) | sc $file
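
Since the question involves about 5,000 files, the same parenthesized pipeline can be applied per file in a loop. The following is only a sketch under assumptions: the function name Remove-FirstLine, its parameters, and the *.txt filter are hypothetical and do not come from the answer, and the -File switch requires PowerShell 3.0 or later.

```powershell
# Hypothetical helper: strips the first line from every matching file in a folder.
function Remove-FirstLine {
    param(
        [Parameter(Mandatory)] [string] $Folder,
        [string] $Filter = '*.txt'   # assumed file pattern
    )
    Get-ChildItem -Path $Folder -Filter $Filter -File | ForEach-Object {
        $path = $_.FullName
        # The parentheses force Get-Content to read the whole file first,
        # so the file is no longer open when Set-Content writes to it.
        (Get-Content -LiteralPath $path | Select-Object -Skip 1) |
            Set-Content -LiteralPath $path
    }
}
```

It could be called as Remove-FirstLine -Folder C:\data, where the folder is a placeholder. Because each file is read fully into memory, this suits many small files better than a few very large ones.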

Answer 2

It is not the most efficient in the world, but this should work:

get-content $file |
    select -Skip 1 |
    set-content "$file-temp"
move "$file-temp" $file -Force

Answer 3

Using variable notation, you can do it without a temporary file:

${C:\file.txt} = ${C:\file.txt} | select -skip 1

function Remove-Topline ( [string[]]$path, [int]$skip=1 ) {
  if ( -not (Test-Path $path -PathType Leaf) ) {
    throw "invalid filename"
  }

  ls $path |
    % { iex "`${$($_.fullname)} = `${$($_.fullname)} | select -skip $skip" }
}

Answer 4

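I just had to do the same task, and gc | select ... | sc took over 4 GB of RAM on my machine while reading a 1.6 GB file. It had not finished at least 20 minutes after reading in the whole file (as reported by Read Bytes in Process Explorer), at which point I had to kill it.

My solution was to use a more .NET approach: StreamReader + StreamWriter. See this answer for a great discussion of the perf: In Powershell, what's the most efficient way to split a large text file by record type?

Below is my solution. Yes, it uses a temporary file, but in my case that did not matter (it was a huge file of SQL table creation and insert statements):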

PS> (measure-command{
    $i = 0
    $ins = New-Object System.IO.StreamReader "in/file/pa.th"
    $outs = New-Object System.IO.StreamWriter "out/file/pa.th"
    while( !$ins.EndOfStream ) {
        $line = $ins.ReadLine();
        if( $i -ne 0 ) {
            $outs.WriteLine($line);
        }
        $i = $i+1;
    }
    $outs.Close();
    $ins.Close();
}).TotalSeconds

It returned:

188.1224443

Answer 5

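Inspired by AASoft's answer, I set out to improve it a bit more:

Avoid the loop variable $i and the comparison with 0 in every loop
Wrap the execution in a try..finally block to always close the files in use
Make the solution work for an arbitrary number of lines to remove from the beginning of the file
Use a variable $p to reference the current directory

These changes lead to the following code: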

$p = (Get-Location).Path

(Measure-Command {
    # Number of lines to skip
    $skip = 1
    $ins = New-Object System.IO.StreamReader ($p + "\test.log")
    $outs = New-Object System.IO.StreamWriter ($p + "\test-1.log")
    try {
        # Skip the first N lines, but allow for fewer than N, as well
        for( $s = 1; $s -le $skip -and !$ins.EndOfStream; $s++ ) {
            $ins.ReadLine()
        }
        while( !$ins.EndOfStream ) {
            $outs.WriteLine( $ins.ReadLine() )
        }
    }
    finally {
        $outs.Close()
        $ins.Close()
    }
}).TotalSeconds

The first change brought the processing time for my 60 MB file down from 5.3s to 4s. The rest of the changes are more cosmetic.
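
To reuse the streaming approach across many files, it can be wrapped in a function that writes to a temporary file and then replaces the original. This is a sketch under assumptions: the name Remove-LeadingLines, its parameters, and the ".tmp" suffix are hypothetical, and the reader and writer use .NET's default encoding handling, which may not match every input file.

```powershell
# Hypothetical helper: streams a file line by line, dropping the first $Skip lines.
function Remove-LeadingLines {
    param(
        [Parameter(Mandatory)] [string] $Path,
        [int] $Skip = 1
    )
    # .NET resolves relative paths against the process directory,
    # not the PowerShell location, so expand to a full path first.
    $full = Convert-Path -LiteralPath $Path
    $temp = "$full.tmp"   # assumed temp-file name next to the original
    $reader = New-Object System.IO.StreamReader $full
    try {
        $writer = New-Object System.IO.StreamWriter $temp
        try {
            # Skip the first N lines, allowing for files shorter than N.
            for ($s = 0; $s -lt $Skip -and -not $reader.EndOfStream; $s++) {
                [void] $reader.ReadLine()
            }
            # Copy the remaining lines one at a time.
            while (-not $reader.EndOfStream) {
                $writer.WriteLine($reader.ReadLine())
            }
        }
        finally { $writer.Close() }
    }
    finally { $reader.Close() }
    # Replace the original only after both streams are closed.
    Move-Item -LiteralPath $temp -Destination $full -Force
}
```

Only one line is held in memory at a time, which is the point of the StreamReader approach for very large files.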

Answer 6

I just learned this from a website:

Get-ChildItem *.txt | ForEach-Object { (get-Content $_) | Where-Object {(1) -notcontains $_.ReadCount } | Set-Content -path $_ }

Or you can use aliases to make it shorter, like:

gci *.txt | % { (gc $_) | ? { (1) -notcontains $_.ReadCount } | sc -path $_ }
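
The filter above relies on the ReadCount property that Get-Content attaches to each line it emits, which holds the 1-based line number. A minimal sketch of the same idea, generalized to several line numbers, follows; the function name Get-ContentWithoutLines and its parameters are hypothetical.

```powershell
# Hypothetical helper: returns a file's lines except those at the given 1-based positions.
function Get-ContentWithoutLines {
    param(
        [Parameter(Mandatory)] [string] $Path,
        [int[]] $LineNumbers = @(1)   # defaults to dropping only the first line
    )
    Get-Content -LiteralPath $Path |
        Where-Object { $LineNumbers -notcontains $_.ReadCount }
}
```

For example, Get-ContentWithoutLines -Path .\data.txt -LineNumbers 1,3 would emit every line except the first and third, where data.txt is a placeholder name.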

Answer 7

$x = get-content $file
$x[1..$x.count] | set-content $file

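That's about it. A long, boring explanation follows. Get-Content returns an array. We can "index into" array variables, as demonstrated in this and other Scripting Guys posts.

For example, if we define an array variable like this,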

$array = @("first item","second item","third item")

so that $array returns

first item
second item
third item

then we can "index into" that array to retrieve only its first element

$array[0]

or only its second

$array[1]

or a range of index values from the second through the last.

$array[1..$array.count]
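
One detail of range indexing is worth a guarded sketch: the range operator counts downward when its end is smaller than its start, so for a one-element array 1..0 expands to 1, 0 and the slice would include the first element again. The helper below is hypothetical (the name Get-AllButFirst is not from the answer) and simply handles short inputs before slicing.

```powershell
# Hypothetical helper: returns every element except the first.
function Get-AllButFirst {
    param([object[]] $Items)
    # With zero or one element there is nothing left after dropping the first,
    # and 1..0 would count down (1, 0) instead of producing an empty range.
    if ($Items.Count -le 1) { return @() }
    $Items[1..($Items.Count - 1)]
}
```

With the $array defined above, Get-AllButFirst -Items $array yields the second and third items.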

Answer 8

skip did not work for me, so my workaround is:

$LinesCount = $(get-content $file).Count
get-content $file |
    select -Last $($LinesCount-1) | 
    set-content "$file-temp"
move "$file-temp" $file -Force

Answer 9

Another approach to removing the first line from a file, using the multiple assignment technique. See the linked reference.

 $firstLine, $restOfDocument = Get-Content -Path $filename 
 $modifiedContent = $restOfDocument 
 $modifiedContent | Out-String | Set-Content $filename
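
The multiple assignment step can be seen in isolation with an in-memory array: the first variable receives the first element and the last variable receives everything that remains. The sample values below are made up for illustration.

```powershell
# Sample lines standing in for the output of Get-Content (values are illustrative).
$lines = 'header', 'row 1', 'row 2'

# The first variable takes one element; the last variable collects the rest.
$firstLine, $restOfDocument = $lines

# $firstLine is the string 'header';
# $restOfDocument is an array holding 'row 1' and 'row 2'.
```

Since Set-Content accepts an array of strings from the pipeline, piping $restOfDocument to it directly is likely sufficient, which would make the Out-String step optional.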

Answer 10

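For smaller files you could use this:

& C:\windows\system32\more +1 oldfile.csv > newfile.csv | out-null

...but it is not very effective at processing my 16MB example file. It does not seem to terminate and release the lock on newfile.csv.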

Remove the top line of a text file with PowerShell

10 answers: