Question

Possible duplicate:
Why does Mercurial think my SQL files are binary?

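I generated a complete set of scripts for the stored procedures in a database. When I created a Mercurial repository and added these files, they were all added as binary files. Obviously I still get the benefits of versioning, but I lose a lot of the efficiencies, diffing, and so on that come with text files. I verified that these files really are just text.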

Why is it doing this?

What can I do to avoid it?

Is there a way to get Hg to change its mind about these files?

Here is a snippet of the changeset log:

   496.1 Binary file SQL/SfiData/Stored Procedures/dbo.pFindCustomerByMatchCode.StoredProcedure.sql has changed
   497.1 Binary file SQL/SfiData/Stored Procedures/dbo.pFindUnreconcilableChecks.StoredProcedure.sql has changed
   498.1 Binary file SQL/SfiData/Stored Procedures/dbo.pFixBadLabelSelected.StoredProcedure.sql has changed
   499.1 Binary file SQL/SfiData/Stored Procedures/dbo.pFixCCOPL.StoredProcedure.sql has changed
   500.1 Binary file SQL/SfiData/Stored Procedures/dbo.pFixCCOrderMoneyError.StoredProcedure.sql has changed

Thanks in advance for your help. Jim

Answer 1

As Mercurial's views on binary files explain, it doesn't actually track file types, which means there is no way for a user to mark a file as binary or not binary.

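As tonfa and Rudi mentioned, Mercurial determines whether a file is binary by checking whether a NUL byte occurs anywhere in the file. For UTF-[16|32] files a NUL byte is pretty much guaranteed, because every ASCII-range character is stored with at least one zero-valued byte.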
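To make the NUL-byte point concrete, here is a minimal Python sketch. It only imitates the heuristic as described above and is not Mercurial's own code; the function name `looks_binary` and the sample SQL text are made up for illustration.

```python
def looks_binary(data: bytes) -> bool:
    """Imitate the heuristic described above: any NUL byte means 'binary'."""
    return b"\0" in data


sql = "CREATE PROCEDURE dbo.pExample AS SELECT 1"  # hypothetical script text

utf8_bytes = sql.encode("utf-8")    # one byte per ASCII character, no NULs
utf16_bytes = sql.encode("utf-16")  # BOM plus two bytes per ASCII character

print(looks_binary(utf8_bytes))   # False
print(looks_binary(utf16_bytes))  # True
```

The same text is classified differently purely because of its encoding, which is why re-encoding the files is enough to get text diffs back.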

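To "fix" this, you have to make sure the files are encoded in UTF-8 rather than UTF-16. Ideally your database would have a Unicode encoding setting for when it performs the export. If that's not the case, another option would be to write a precommit hook to do it (see How to convert a file to UTF-8 in Python for a start), but you'll have to be very careful about which files you convert.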
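As a rough sketch of the conversion step such a hook might call, assuming the files carry a UTF-16 byte order mark (BOM): the function name `convert_utf16_to_utf8` is hypothetical, and wiring it into a hook is left out. Files without a UTF-16 BOM are left untouched, which is one way of being careful about what gets converted.

```python
import codecs
from pathlib import Path


def convert_utf16_to_utf8(path: Path) -> bool:
    """Rewrite `path` as UTF-8 if it starts with a UTF-16 BOM.

    Returns True if the file was converted, False if it was left alone.
    """
    raw = path.read_bytes()
    # The UTF-32 LE BOM (FF FE 00 00) begins with the UTF-16 LE BOM (FF FE),
    # so rule out UTF-32 first to avoid misreading such a file.
    if raw.startswith((codecs.BOM_UTF32_LE, codecs.BOM_UTF32_BE)):
        return False
    if not raw.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return False
    text = raw.decode("utf-16")  # the "utf-16" codec reads and drops the BOM
    # Work on bytes so that line endings (e.g. CRLF) are preserved as-is.
    path.write_bytes(text.encode("utf-8"))
    return True
```

Note that this overwrites the file in place, so it is worth trying on a copy first.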

Answer 2

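I know this is a bit late, but I was evaluating Kiln and ran into this problem. After discussing it with the guys at Fogbugz, who couldn't give me an answer other than using "File/Save As" in SSMS for every *.sql file (very tedious), I decided to have a look at writing a quick script to convert the *.sql files.

Fortunately, you can use one Microsoft technology (PowerShell) to overcome the problems of another Microsoft technology (SSMS). Using PowerShell, change to the directory containing your *.sql files, then copy and paste the following into the PowerShell shell (or save it as a .ps1 script and run it from PowerShell - make sure you run the command "Set-ExecutionPolicy RemoteSigned" before trying to run a .ps1 script):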

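function Get-FileEncoding
{
  [CmdletBinding()] Param (
  [Parameter(Mandatory = $True, ValueFromPipelineByPropertyName = $True)] [string]$Path
  )

  # Read the first four bytes so the byte order mark (BOM), if any, can be inspected.
  # "-Encoding byte" is Windows PowerShell syntax; PowerShell 6 and later use -AsByteStream instead.
  [byte[]]$byte = get-content -Encoding byte -ReadCount 4 -TotalCount 4 -Path $Path

  if ( $byte[0] -eq 0xef -and $byte[1] -eq 0xbb -and $byte[2] -eq 0xbf )
  { Write-Output 'UTF8' }
  # UTF-32 little-endian (FF FE 00 00) shares its first two bytes with UTF-16 little-endian,
  # so it has to be tested before the UTF-16 checks.
  elseif ($byte[0] -eq 0xff -and $byte[1] -eq 0xfe -and $byte[2] -eq 0 -and $byte[3] -eq 0)
  { Write-Output 'UTF32' }
  # UTF-16 big-endian
  elseif ($byte[0] -eq 0xfe -and $byte[1] -eq 0xff)
  { Write-Output 'Unicode' }
  # UTF-16 little-endian, which is what SSMS appears to write
  elseif ($byte[0] -eq 0xff -and $byte[1] -eq 0xfe)
  { Write-Output 'Unicode' }
  # UTF-32 big-endian
  elseif ($byte[0] -eq 0 -and $byte[1] -eq 0 -and $byte[2] -eq 0xfe -and $byte[3] -eq 0xff)
  { Write-Output 'UTF32' }
  elseif ($byte[0] -eq 0x2b -and $byte[1] -eq 0x2f -and $byte[2] -eq 0x76)
  { Write-Output 'UTF7'}
  else
  # No recognised BOM: report ASCII
  { Write-Output 'ASCII' }
}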


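# Convert every UTF-16 ("Unicode") *.sql file in the current directory to UTF-8, in place.
$files = get-ChildItem "*.sql"
foreach ( $file in $files )
{
    $encoding = Get-FileEncoding $file
    If ($encoding -eq 'Unicode')
    {
        # The parentheses make Get-Content read the whole file before Set-Content overwrites it.
        (Get-Content "$file" -Encoding Unicode) | Set-Content -Encoding UTF8 "$file"
    }
}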

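The function Get-FileEncoding is courtesy of http://poshcode.org/3227, although I had to modify it slightly to cater for the UCS-2 little-endian files that SSMS appears to have saved these as. I recommend backing up your files first, as the script overwrites the originals. You could of course modify the script to save a UTF-8 copy of each file instead, e.g. by changing the last line of code to: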

(Get-Content "$file" -Encoding Unicode) | Set-Content -Encoding UTF8 "$file.new"
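For comparison, the kind of BOM check that Get-FileEncoding performs can be sketched in Python using the constants from the standard `codecs` module. This is only a rough equivalent and not a port of the poshcode.org function; the name `sniff_bom` and the labels it returns are made up for illustration.

```python
import codecs
from typing import Optional

# Longer BOMs are listed first: the UTF-32 LE BOM (FF FE 00 00) starts with
# the UTF-16 LE BOM (FF FE), so checking UTF-16 first would misreport it.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def sniff_bom(head: bytes) -> Optional[str]:
    """Return a label for the BOM at the start of `head`, or None if there is none."""
    for bom, name in _BOMS:
        if head.startswith(bom):
            return name
    return None
```

Passing the first four bytes of a file is enough, e.g. `sniff_bom(open(path, "rb").read(4))`. A result of None only means there is no BOM; it does not prove the file is ASCII.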

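The script should also be easy to modify to traverse subdirectories.

Now you just need to remember to run this whenever there are new *.sql files, before you commit and push your changes. Any files that have already been converted and are subsequently opened in SSMS will stay as UTF-8 when saved.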
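One way to check that nothing was missed before committing is to scan the working copy, subdirectories included, for *.sql files that still contain a NUL byte. The sketch below is an assumption about how you might do that in Python; the function name `find_nul_sql_files` is hypothetical, and it only reports files without changing them.

```python
from pathlib import Path
from typing import List


def find_nul_sql_files(root: Path) -> List[Path]:
    """Return the *.sql files under `root` (recursively) that contain a NUL byte."""
    return sorted(
        p for p in root.rglob("*.sql")
        if p.is_file() and b"\0" in p.read_bytes()
    )
```

For example, `find_nul_sql_files(Path("."))` run from the repository root lists the files that would still show up as binary; an empty list means every *.sql file is free of NUL bytes.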

Files with .sql extension identified as binary in Mercurial
