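Using PowerShell (I'm relatively new to coding), I'm trying to take a large CSV file with 26 columns and manipulate the data when certain fields contain duplicate data... but keep all of the data when the field is not duplicated.
Sample data: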
Name,DOB,Address,PhoneNo,FaveSport,FaveTeam,FavePlayer,
Nick,1/1/01,123 4th,123-456-7890,,,
Nick,1/1/01,,,Hockey,Red Wings,Lidstrom
Calvin,2/2/02,456 7th,555-867-5309,Football,Lions,Megatron
Mickey,3/3/03,999 Yankee Way,111-222-3333,,,
Mickey,3/3/03,,,Baseball,Yankees,Mantle
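In the scenario above, I want to keep the first four columns from Nick's first row and the last three columns from the second, nearly duplicate row. It is always laid out the same way: the top row has the correct first 4 columns, and the second row (if there is a second row; sometimes there is only one, as with Calvin, in which case we keep the whole row) has the data we want in the last 3 columns.
So the data we want when we are done is: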
Name,DOB,Address,PhoneNo,FaveSport,FaveTeam,FavePlayer,
Nick,1/1/01,123 4th,123-456-7890,Hockey,Red Wings,Lidstrom
Calvin,2/2/02,456 7th,555-867-5309,Football,Lions,Megatron
Mickey,3/3/03,999 Yankee Way,111-222-3333,Baseball,Yankees,Mantle
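I have absolutely no idea how to compare the first x columns of one row with the first x columns of another row to check for a "duplicate", and then write the first x fields of the first row and the last x fields of the second row to a new document...
Any help is greatly appreciated. I'm trying to be a hero to my wife, who currently has to do this by manually copying and pasting over and over in a 5k+ row Excel document.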
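Answer 0 (score: 2)
You can use a Hashtable to store the first row for each name, and then, if another row with the same name shows up, copy over only the columns that actually have a value: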
$Data = @'
Name,DOB,Address,PhoneNo,FaveSport,FaveTeam,FavePlayer,
Nick,1/1/01,123 4th,123-456-7890,,,
Nick,1/1/01,,,Hockey,Red Wings,Lidstrom
Calvin,2/2/02,456 7th,555-867-5309,Football,Lions,Megatron
Mickey,3/3/03,999 Yankee Way,111-222-3333,,,
Mickey,3/3/03,,,Baseball,Yankees,Mantle
'@|ConvertFrom-Csv
# Set up a hashtable to keep track of distinct player names
$Players = @{}
foreach($Row in $Data) {
    if(-not $Players.ContainsKey($Row.Name))
    {
        # First row with that player name
        $Players[$Row.Name] = $Row
    }
    else
    {
        # We've already read the first row for this guy
        foreach($Property in $Row.psobject.Properties)
        {
            # Check each property for whether it has a value
            if($Property.Value)
            {
                # Overwrite previous property value
                $Players[$Row.Name]."$($Property.Name)" = $Property.Value
            }
        }
    }
}
# Print final results
$Players.Values |Format-Table
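Since the question mentions a real CSV file and a new output document, here is a minimal sketch of the same merge wired up to Import-Csv and Export-Csv. The file names and the temp-folder location are hypothetical, the sample header is written without its trailing comma (an assumption, to avoid an unnamed extra column), and an ordered dictionary is swapped in for the plain Hashtable so that people come out in the order they first appear.

```powershell
# Hypothetical paths; a temp folder is used so the sketch can run anywhere
$InPath  = Join-Path ([System.IO.Path]::GetTempPath()) 'players-in.csv'
$OutPath = Join-Path ([System.IO.Path]::GetTempPath()) 'players-out.csv'

# Write the sample data to disk to stand in for the real 26-column file
@'
Name,DOB,Address,PhoneNo,FaveSport,FaveTeam,FavePlayer
Nick,1/1/01,123 4th,123-456-7890,,,
Nick,1/1/01,,,Hockey,Red Wings,Lidstrom
Calvin,2/2/02,456 7th,555-867-5309,Football,Lions,Megatron
Mickey,3/3/03,999 Yankee Way,111-222-3333,,,
Mickey,3/3/03,,,Baseball,Yankees,Mantle
'@ | Set-Content -Path $InPath

# An ordered dictionary remembers insertion order (a plain hashtable does not)
$Players = [ordered]@{}

foreach ($Row in Import-Csv -Path $InPath) {
    if (-not $Players.Contains($Row.Name)) {
        # First row seen for this name: keep it as the base record
        $Players[$Row.Name] = $Row
    }
    else {
        # Later row for the same name: copy over only the non-empty fields
        foreach ($Property in $Row.psobject.Properties) {
            if ($Property.Value) {
                $Players[$Row.Name]."$($Property.Name)" = $Property.Value
            }
        }
    }
}

# Write the merged rows to the new document and show the result
$Players.Values | Export-Csv -Path $OutPath -NoTypeInformation
Get-Content -Path $OutPath
```

For the real file, point $InPath at the exported CSV and drop the Set-Content step; the loop itself does not depend on how many columns there are.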
Answer 1 (score: 0)
Since you only want the last columns from the second row, and building on @Mathias's great work, you could do this:
$Data = @'
Name,DOB,Address,PhoneNo,FaveSport,FaveTeam,FavePlayer,
Nick,1/1/01,123 4th,123-456-7890,,,
Nick,1/1/01,,,Hockey,Red Wings,Lidstrom
Calvin,2/2/02,456 7th,555-867-5309,Football,Lions,Megatron
Mickey,3/3/03,999 Yankee Way,111-222-3333,,,
Mickey,3/3/03,,,Baseball,Yankees,Mantle
'@|ConvertFrom-Csv
# Set up a hashtable to keep track of distinct player names
$Players = @{}
# Make a named list of the columns you want to keep from the second rows
$columns = @("FaveSport","FaveTeam","FavePlayer")
foreach($Row in $Data) {
    if(-not $Players.ContainsKey($Row.Name))
    {
        # First row with that player name
        $Players[$Row.Name] = $Row
    }
    else
    {
        # Check just the named columns that you want to keep the good values for
        foreach($item in $columns)
        {
            # Fill the column from this row only if the stored row has no value for it yet
            if (-not $Players[$Row.Name].$item) {
                $Players[$Row.Name].$item = $Row.$item
            }
        }
    }
}
# Print final results
$Players.Values |Format-Table
Basically, you just check for and pull in only the columns you want.
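Both answers treat Name alone as the "duplicate" test, while the question asks about comparing the first x columns. As a rough sketch of that variant, Group-Object can group rows on several columns at once. The choice of Name and DOB as key columns is an assumption made for illustration, the sample header is written without its trailing comma, and the groups may not come back in the original row order.

```powershell
$Data = @'
Name,DOB,Address,PhoneNo,FaveSport,FaveTeam,FavePlayer
Nick,1/1/01,123 4th,123-456-7890,,,
Nick,1/1/01,,,Hockey,Red Wings,Lidstrom
Calvin,2/2/02,456 7th,555-867-5309,Football,Lions,Megatron
Mickey,3/3/03,999 Yankee Way,111-222-3333,,,
Mickey,3/3/03,,,Baseball,Yankees,Mantle
'@ | ConvertFrom-Csv

# Assumed key: rows count as duplicates when all of these columns match
$KeyColumns = 'Name', 'DOB'

$Merged = foreach ($Group in ($Data | Group-Object -Property $KeyColumns)) {
    # The first row of each group is the base record
    $First = $Group.Group[0]

    # Fill any empty field in the base record from the later rows
    foreach ($Row in ($Group.Group | Select-Object -Skip 1)) {
        foreach ($Property in $Row.psobject.Properties) {
            if ($Property.Value -and -not $First.($Property.Name)) {
                $First.($Property.Name) = $Property.Value
            }
        }
    }

    # Emit one merged row per group
    $First
}

$Merged | Format-Table
```

Unlike the overwrite in Answer 0, this only fills blanks, so a value already present in the first row is never replaced by a later one.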