My code begins as follows:
$file1 = "G:\test_powershell_subtitle\The Big Bang Theory - 08x06 - french.srt"
$file2 = "G:\test_powershell_subtitle\The Big Bang Theory - 08x06 - english.srt"
$text1 =get-content($file1) -Raw
$text2 =get-content($file2) -Raw
$regex = [regex]'(?m)(?<sequence>\d+)\r\n(?<timecode>\d{2}:\d{2}:\d{2},\d{3} --> \d{2}:\d{2}:\d{2},\d{3})\r\n(?<text>[\s\S]*?\r\n\r\n)'
$matches = $regex.Matches($text1)
$matches |% {
    if ($_ -match $regex){
        new-object psobject -property @{
            sequence = $matches['sequence']
            timecode = $matches['timecode']
            text = $matches['text']
        }
    }
}
Output:
timecode sequence text
---- -------- ----
00:00:02,880 --> 00:00:04,146 1 I like your suit....
00:00:04,148 --> 00:00:06,699 2 Oh, thanks. Got a ...
00:00:06,701 --> 00:00:08,651 3 How does it feel knowing...
00:00:08,653 --> 00:00:10,786 4 is to go out...
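My goal is to merge subtitles in different languages into a single file, based on their timecodes.
What is the best way to do this: Compare-Object, hashtables, or PSObject?
Thanks for your help.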
Answer 0: (score: 0)
You will have more work to do, but this should be enough for the problem at hand. Group-Object is the approach that comes to mind.
function Convert-SubtitlesToObject{
    param(
        [parameter(Mandatory=$true)]
        [ValidateScript({Test-Path $_})]
        [String]
        $Path
    )
    # One subtitle entry: sequence number, timecode line, then text up to the next blank line
    $regex = [regex]'(?m)(?<sequence>\d+)\r\n(?<timecode>\d{2}:\d{2}:\d{2},\d{3} --> \d{2}:\d{2}:\d{2},\d{3})\r\n(?<text>[\s\S]*?\r\n\r\n)'
    $text = Get-Content -Path $Path -Raw
    # Read the named groups directly from each Match object; re-running -match
    # would overwrite the automatic $Matches variable
    $regex.Matches($text) | ForEach-Object{
        [PSCustomObject][ordered]@{
            sequence = $_.Groups['sequence'].Value
            timecode = $_.Groups['timecode'].Value
            text = $_.Groups['text'].Value
        }
    }
}
$englishSubs = Convert-SubtitlesToObject -Path 'C:\temp\put\The Big Bang Theory - 8x06 - The Expedition Approximation.HDTV.LOL.HI.en.srt'
$frenchSubs = Convert-SubtitlesToObject -Path 'C:\temp\put\The Big Bang Theory - 8x06 - The Expedition Approximation.HDTV.LOL.fr.srt'
$collection = @()
$collection += $englishSubs
$collection += $frenchSubs
$sequence = 0
$collection | Group-Object timecode | Select-Object Name,@{l="Text";e={$_.Group.Text}} | ForEach-Object{
    $sequence++
    Write-Output "$sequence`r`n$($_.Name)`r`n$($_.Text)"
}
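I turned your code into a function, since you will be calling it for every file. Run it against the English and French subs and put the results into the larger $collection. Call Group-Object on that collection to group the entries by timecode, then take that data and expand the text into a single field. Finally, format the output so that it mimics a subtitle file as closely as possible. You need to watch out for timecodes that do not match, and decide what you want to do in that case.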
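A minimal sketch of the grouping step on in-memory data, which also shows one way to spot timecodes that do not match. The sample objects and the `lang` property are hypothetical, added only to tell the two sources apart, and the sketch assumes each timecode appears at most once per file. Groups with a single member are the timecodes that have no counterpart in the other language.

```powershell
# Hypothetical sample data standing in for the parsed subtitle objects.
$englishSubs = @(
    [PSCustomObject]@{ lang = 'en'; timecode = '00:00:02,880 --> 00:00:04,146'; text = 'I like your suit.' }
    [PSCustomObject]@{ lang = 'en'; timecode = '00:00:04,148 --> 00:00:06,699'; text = 'Oh, thanks.' }
)
$frenchSubs = @(
    [PSCustomObject]@{ lang = 'fr'; timecode = '00:00:02,880 --> 00:00:04,146'; text = "J'aime ton tailleur." }
    [PSCustomObject]@{ lang = 'fr'; timecode = '00:00:04,150 --> 00:00:06,699'; text = 'Merci.' }
)

# Group every entry from both files by its exact timecode string.
$groups = @($englishSubs + $frenchSubs) | Group-Object -Property timecode

# Timecodes present in both files: join the text lines of the group.
$merged = @($groups | Where-Object { $_.Count -ge 2 } | ForEach-Object {
    [PSCustomObject]@{ timecode = $_.Name; text = ($_.Group.text -join "`r`n") }
})

# Timecodes present in only one file: these need a separate decision.
$unmatched = @($groups | Where-Object { $_.Count -eq 1 })

$merged    | ForEach-Object { "$($_.timecode)`r`n$($_.text)`r`n" }
$unmatched | ForEach-Object { "unmatched [$($_.Group[0].lang)]: $($_.Name)" }
```

Here the second pair differs by two milliseconds in its start time, so both of its entries end up in `$unmatched` instead of being merged.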
Here is some sample output, which you could pipe to Out-File or Add-Content:
1
00:00:00,000 --> 00:00:01,800
English Subtitles (HI)
[MP4] The Big Bang Theory S08E06 (720p) The Expedition Approximation HDTV [KoTuWa]
2
00:00:02,880 --> 00:00:04,146
I like your suit.
J'aime ton tailleur.
3
00:00:04,148 --> 00:00:06,699
Oh, thanks. Got a couple
new outfits for work.
Merci.
J'en ai acheté pour le boulot.
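To save output like the sample above instead of printing it, the lines can be piped to a file. This is only a sketch: the `$lines` content and the output file name are placeholders, and UTF-8 is an assumption made so that accented French characters survive. `Set-Content` is used here; `Out-File` accepts the same `-Encoding` parameter.

```powershell
# Placeholder lines standing in for the Write-Output results of the merge loop.
$lines = @(
    "1`r`n00:00:02,880 --> 00:00:04,146`r`nI like your suit.`r`nJ'aime ton tailleur.`r`n"
    "2`r`n00:00:04,148 --> 00:00:06,699`r`nOh, thanks.`r`nMerci.`r`n"
)

# Hypothetical output location; any writable path works.
$outFile = Join-Path ([System.IO.Path]::GetTempPath()) 'merged-subtitles.srt'

# UTF-8 is assumed here so that accented characters are preserved.
$lines | Set-Content -Path $outFile -Encoding UTF8

# Read the file back to check the result.
Get-Content -Path $outFile -Raw
```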
Disclaimer
I know some PowerShell. I know jack about the subtitle file format.
Answer 1: (score: 0)
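Sorry for my late reply. I tried to find a solution myself. It is incomplete, especially when the timecodes are not identical. Yours is better.
Here is my solution.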
function Convert-SubtitlesToObject{
    param(
        [parameter(Mandatory=$true)]
        [ValidateScript({Test-Path $_})]
        [String]
        $Path
    )
    # Same pattern as above, plus a t1 group that captures the start time on its own
    $regex = [regex]'(?m)(?<sequence>\d+)\r\n(?<timecode>(?<t1>\d{2}:\d{2}:\d{2},\d{3}) --> \d{2}:\d{2}:\d{2},\d{3})\r\n(?<text>[\s\S]*?\r\n\r\n)'
    $text = Get-Content -Path $Path -Raw
    # Read the named groups directly from each Match object; re-running -match
    # would overwrite the automatic $Matches variable
    $regex.Matches($text) | ForEach-Object{
        [PSCustomObject][ordered]@{
            sequence = $_.Groups['sequence'].Value
            timecode = $_.Groups['timecode'].Value
            text = $_.Groups['text'].Value
        }
    }
}
$englishSubs = Convert-SubtitlesToObject -Path 'G:\test_powershell_subtitle\The Big Bang Theory - 08x06 - english.srt'
$frenchSubs = Convert-SubtitlesToObject -Path 'G:\test_powershell_subtitle\The Big Bang Theory - 08x06 - french.srt'
# Compare-Object emits the entries that differ between the two files
$temp = Compare-Object $frenchSubs $englishSubs -Property sequence,timecode,text
# Group by timecode; Group[0] and Group[1] are the two entries sharing that timecode.
# If a timecode exists in only one file, Group[1] is $null.
$subtitles = $temp | Group-Object -Property timecode | ForEach-Object {
    [PSCustomObject][ordered]@{
        seq = $_.Group[1].sequence
        time = $_.Name
        string = $_.Group[0].text + $_.Group[1].text
    }
}
# Write each merged entry in subtitle layout: sequence, timecode, text
$subtitles | ForEach-Object {
    Write-Output "$($_.seq)`r`n$($_.time)`r`n$($_.string)`r`n"
}
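One possible way to handle timecodes that are not identical, offered as a sketch rather than a finished solution: parse each start time (what the `t1` group in the regex above captures) into a `TimeSpan`, and pair entries whose start times differ by no more than a tolerance. The helper name `ConvertTo-SrtTimeSpan`, the sample data, the `start` property, and the 500 ms tolerance are all assumptions made for illustration.

```powershell
# Hypothetical helper: convert an SRT time such as '00:00:02,880' into a TimeSpan.
function ConvertTo-SrtTimeSpan {
    param([Parameter(Mandatory = $true)][string]$Time)
    [TimeSpan]::ParseExact($Time, 'hh\:mm\:ss\,fff', [System.Globalization.CultureInfo]::InvariantCulture)
}

# Hypothetical parsed entries; 'start' holds the first time of each timecode.
$englishSubs = @(
    [PSCustomObject]@{ start = '00:00:02,880'; text = 'I like your suit.' }
    [PSCustomObject]@{ start = '00:00:04,148'; text = 'Oh, thanks.' }
)
$frenchSubs = @(
    [PSCustomObject]@{ start = '00:00:02,900'; text = "J'aime ton tailleur." }
    [PSCustomObject]@{ start = '00:00:04,100'; text = 'Merci.' }
)

$toleranceMs = 500   # assumed tolerance; tune it for the files at hand

$paired = foreach ($en in $englishSubs) {
    $enStart = ConvertTo-SrtTimeSpan -Time $en.start

    # Pick the French entry whose start time is closest to the English one.
    $closest = $frenchSubs |
        Sort-Object { [Math]::Abs(((ConvertTo-SrtTimeSpan -Time $_.start) - $enStart).TotalMilliseconds) } |
        Select-Object -First 1
    $gapMs = [Math]::Abs(((ConvertTo-SrtTimeSpan -Time $closest.start) - $enStart).TotalMilliseconds)

    # Accept the pairing only when the gap is within the tolerance.
    $french = $null
    if ($gapMs -le $toleranceMs) { $french = $closest.text }

    [PSCustomObject]@{ start = $en.start; english = $en.text; french = $french }
}

$paired
```

This compares every English entry with every French entry, which is fine for a single episode but would be slow on very large files. It also lets the same French entry pair with more than one English entry, so a real merge would need a rule for that case.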