PowerShell: Sort / remove duplicates in a CSV file

Time: 2012-02-14 20:24:59

Tags: sorting powershell unique

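First off, I'm new to PowerShell, and I want to thank everyone on this site for all the answers they've posted! Thanks to this site I've accomplished a lot in a short time!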

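Here's the problem; I'll explain it as best I can. I have a CSV file used to create student accounts. Our student management system generates a record every time a student registers for, modifies, or withdraws from a course. If a student "tries out" several different programs, they end up with multiple records in the CSV file. So my goal is to sort the CSV file by UserID (the UserID never changes) and CurrentStatusDate (the date the record was created), using this command: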

Import-CSV "C:\students.csv" | sort UserID,CurrentStatusDate
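Import-Csv returns every field as a string, so sorting on CurrentStatusDate as written compares the M/d/yyyy text character by character and puts "11/1/2011" ahead of "9/26/2011". A minimal sketch of a date-aware version of the same sort, assuming every CurrentStatusDate uses the M/d/yyyy form seen in the sample records; a few of those rows are inlined with ConvertFrom-Csv (in place of C:\students.csv) so the snippet runs on its own:

```powershell
# A few rows from the sample records, inlined for illustration.
# In practice: $rows = Import-Csv "C:\students.csv"
$rows = @'
"UserID","Course","CurrentStatusDate"
"abowtle","General Studies","9/9/2011"
"abowtle","Business Administration","2/1/2011"
"acampbell6499","Retail Business Management","11/1/2011"
"acampbell6499","Complete the Accounting Cycle - Part II","9/26/2011"
'@ | ConvertFrom-Csv

# Sort by UserID, then by the parsed date rather than the date string.
# Assumes M/d/yyyy dates; PowerShell's [datetime] cast uses the invariant
# culture, which reads them as month/day/year.
$sorted = $rows | Sort-Object UserID, @{ Expression = { [datetime]$_.CurrentStatusDate } }
$sorted | Select-Object UserID, CurrentStatusDate
```

The calculated property (the hashtable with an Expression key) lets Sort-Object sort on a converted value without changing the rows themselves.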

Sample CSV records:

"UserID","AccountStatus","PersonID","PIN","FirstName","LastName","IDEXPIRY","Term","Role","Course","SectionName","locationDescription","Location","CurrentStatusDate"
"aboggs","Add","xxxxxxx","xxxxxxx","Ashley","Baggs","5/11/2013","xxxxxx","Student","Accounting Technology","xxxxxx","xxxxxx","xxxxxx","9/12/2011"
"aboutilier","Add","xxxxxxx","xxxxxxx","Amelia","Boutilier","5/3/2012","xxxxxx","Student","Adult Education","xxxxxx","xxxxxx","xxxxxx","11/15/2011"
"abowtle","Delete","xxxxxxx","xxxxxxx","Aleisha","Bowtle","7/31/2013","xxxxxx","Student","Business Administration","xxxxxx","xxxxxx","xxxxxx","2/1/2011"
"abowtle","Add","xxxxxxx","xxxxxxx","Aleisha","Bowtle","7/31/2012","xxxxxx","Student","General Studies","xxxxxx","xxxxxx","xxxxxx","9/9/2011"
"abradley","Delete","xxxxxxx","xxxxxxx","Anna","Bradley","10/25/2011","xxxxxx","Student","Adult Education","xxxxxx","xxxxxx","xxxxxx","11/17/2011"
"abridges","Delete","xxxxxxx","xxxxxxx","Ashley","Bridges","10/5/2011","xxxxxx","Student","Adult Education","xxxxxx","xxxxxx","xxxxxx","11/15/2011"
"abrown10165","Add","xxxxxxx","xxxxxxx","Adam","Brown","10/28/2011","xxxxxx","Student","Advanced Firefighting STCW VI/3","xxxxxx","xxxxxx","xxxxxx","10/24/2011"
"abrown10165","Add","xxxxxxx","xxxxxxx","Adam","Brown","12/16/2011","xxxxxx","Student","Simulated Electronic Navigation Level 1, Part B","xxxxxx","xxxxxx","xxxxxx","11/10/2011"
"abrown8081","Add","xxxxxxx","xxxxxxx","Alex","Brown","5/25/2013","xxxxxx","Student","Culinary Arts","xxxxxx","xxxxxx","xxxxxx","9/6/2011"
"abrown8950","Delete","xxxxxxx","xxxxxxx","Ashley","Brown","9/13/2012","xxxxxx","Student","Medical Support Services","xxxxxx","xxxxxx","xxxxxx","9/14/2011"
"acameron2637","Delete","xxxxxxx","xxxxxxx","Anne","Cameron","10/14/2011","xxxxxx","Student","Adult Education","xxxxxx","xxxxxx","xxxxxx","10/14/2011"
"acameron4368","Add","xxxxxxx","xxxxxxx","Amanda","Cameron","4/20/2013","xxxxxx","Student","Applied Degree in Culinary Operations","xxxxxx","xxxxxx","xxxxxx","10/12/2011"
"acampbell10266","Add","xxxxxxx","xxxxxxx","Amanda","Campbell","5/4/2012","xxxxxx","Student","Adult Education","xxxxxx","xxxxxx","xxxxxx","11/7/2011"
"acampbell6499","Delete","xxxxxxx","xxxxxxx","Aaron","Campbell","10/31/2012","xxxxxx","Student","Retail Business Management","xxxxxx","xxxxxx","xxxxxx","11/1/2011"
"acampbell6499","Add","xxxxxxx","xxxxxxx","Aaron","Campbell","12/13/2011","xxxxxx","Student","Complete the Accounting Cycle - Part II","xxxxxx","xxxxxx","xxxxxx","9/26/2011"

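This should group all records with the same UserID together and then sort them by creation date. Then I want to remove the duplicates and keep the most recently created record. I'm familiar with -Unique, but it doesn't work with the command above, because it only removes records that have both a duplicate UserID and a duplicate CurrentStatusDate.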
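The goal described above (one row per UserID, keeping the most recently created one) can also be sketched without sorting the whole file: walk the rows once and remember the newest row seen for each UserID in a hashtable. This is a swapped-in technique, not one of the answers in this thread; it assumes M/d/yyyy dates and uses a few sample rows inlined in place of C:\students.csv:

```powershell
# Sample rows inlined for illustration; in practice use Import-Csv "C:\students.csv"
$rows = @'
"UserID","AccountStatus","Course","CurrentStatusDate"
"abowtle","Delete","Business Administration","2/1/2011"
"abowtle","Add","General Studies","9/9/2011"
"abrown8081","Add","Culinary Arts","9/6/2011"
"acampbell6499","Delete","Retail Business Management","11/1/2011"
"acampbell6499","Add","Complete the Accounting Cycle - Part II","9/26/2011"
'@ | ConvertFrom-Csv

# Map of UserID -> newest row seen so far
$latest = @{}
foreach ($row in $rows) {
    $current = $latest[$row.UserID]   # $null if this UserID hasn't been seen yet
    # Assumes M/d/yyyy dates, parsed with the invariant culture by the [datetime] cast
    if ($null -eq $current -or [datetime]$row.CurrentStatusDate -gt [datetime]$current.CurrentStatusDate) {
        $latest[$row.UserID] = $row
    }
}

# One row per UserID, sorted by UserID for readability
$deduped = $latest.Values | Sort-Object UserID
$deduped | Select-Object UserID, AccountStatus, CurrentStatusDate
```

Because the file is read once and never fully sorted, this scales linearly with the number of rows.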

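I've been "Google-ing" and banging my head against this for 2 days... I'm starting to think there's no "simple" answer, but my programming skills are weak... I'm just looking for a "nudge" in the right direction.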

Thanks!

Chris

3 answers:

Answer 0 (score: 2)

As Andy said, this is a bit hard given that we don't have a sample of the CSV format. But I think you're looking for something like the following:

Import-CSV "C:\students.csv" | Group-Object userid | foreach-object { $_.group | sort-object currentstatusdate | select -last 1}

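This does what you described: we group by ID, then sort by CurrentStatusDate, then select the most recent record. I'm not sure how CurrentStatusDate is formatted, so I don't know whether a plain Sort-Object is good enough.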
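With the M/d/yyyy dates shown in the question, a plain sort on the string is not good enough ("11/1/2011" sorts before "9/26/2011" as text). A minimal sketch of the same group / sort / select pipeline with the date parsed inside the sort, assuming every CurrentStatusDate is M/d/yyyy; the inline rows stand in for C:\students.csv and the output path in the comment is hypothetical:

```powershell
# Sample rows inlined for illustration; in practice use Import-Csv "C:\students.csv"
$rows = @'
"UserID","Course","CurrentStatusDate"
"abrown10165","Advanced Firefighting STCW VI/3","10/24/2011"
"abrown10165","Simulated Electronic Navigation Level 1, Part B","11/10/2011"
"abrown8081","Culinary Arts","9/6/2011"
"acampbell6499","Retail Business Management","11/1/2011"
"acampbell6499","Complete the Accounting Cycle - Part II","9/26/2011"
'@ | ConvertFrom-Csv

$latest = $rows |
    Group-Object UserID |
    ForEach-Object {
        # $_ is one group here; inside the Sort-Object script block, $_ is one row.
        # Sort by the parsed date (assumed M/d/yyyy) and keep the newest row.
        $_.Group | Sort-Object { [datetime]$_.CurrentStatusDate } | Select-Object -Last 1
    }

# To write the result out (hypothetical path):
# $latest | Export-Csv "C:\students_latest.csv" -NoTypeInformation
$latest
```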

Answer 1 (score: 0)

How about:

  • Join the fields on a delimiter (http://www.johndcook.com/PowerShellCookbook.html#a19)
  • Use -Unique
  • Split them back apart
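A minimal sketch of the join / -Unique / split steps listed above, assuming a "|" delimiter that never appears in the data and a hypothetical field list. Note that it only collapses rows that are identical in every joined field; rows for the same UserID with different dates (as in the question) both survive, so on its own it does not keep just the newest record per UserID:

```powershell
# Inline rows for illustration; the first two are exact duplicates
$rows = @'
"UserID","Course","CurrentStatusDate"
"abowtle","General Studies","9/9/2011"
"abowtle","General Studies","9/9/2011"
"abowtle","Business Administration","2/1/2011"
'@ | ConvertFrom-Csv

# Hypothetical choice of fields to compare; assumes '|' never occurs in them
$fields = 'UserID', 'Course', 'CurrentStatusDate'

# 1. Join each row's fields into one string
$joined = $rows | ForEach-Object {
    $row = $_
    ($fields | ForEach-Object { $row.$_ }) -join '|'
}

# 2. Drop exact duplicate strings
$unique = $joined | Sort-Object -Unique

# 3. Split each string back into an object with the same property names
$restored = $unique | ForEach-Object {
    $parts = $_ -split '\|'   # -split takes a regex, so the pipe is escaped
    $obj = [ordered]@{}
    for ($i = 0; $i -lt $fields.Count; $i++) { $obj[$fields[$i]] = $parts[$i] }
    [pscustomobject]$obj
}
$restored
```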

Answer 2 (score: 0)

Untested:

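 $new_csv = @()
 $temp = $null   # make sure no leftover value from an earlier run is used
 Import-CSV "C:\students.csv" | sort UserID |
   foreach {
     if ($null -eq $temp) { $temp = $_ }
     if ($_.UserID -ne $temp.UserID) {
       # New UserID: save the newest record of the previous user and start over
       $new_csv += $temp
       $temp = $_
     }
     elseif ([datetime]$_.CurrentStatusDate -gt [datetime]$temp.CurrentStatusDate) {
       # Same UserID, newer record: keep this one instead
       $temp = $_
     }
   }
 # The last user never triggers a UserID change, so add it after the loop
 if ($null -ne $temp) { $new_csv += $temp }
 $new_csv | Export-Csv "c:\somedir\new_csv.csv" -NoTypeInformation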

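When the first record comes through, $temp is set to that record. As each new record comes through, if it has the same UserID, its timestamp is checked against the record in $temp, and if it's newer it replaces $temp. When the loop sees the UserID change, it appends $temp (which should now be the newest record for the previous user) to $new_csv, then sets $temp to the current record and starts over for the next UserID. Since it never sees a UserID change after the last account, you have to add that one after the loop finishes, and then export the CSV.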

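I'm not sure exactly what format that timestamp is in, but I'm assuming it will parse correctly as [datetime]. Since it comes from a .csv file it will be a string, though, and a string sort seems unlikely to put dates in chronological order, so I didn't bother sorting on it; the loop compares the values as [datetime] instead.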
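The point about string ordering can be checked directly with two dates in the sample's M/d/yyyy format (assuming that format throughout; a [datetime] cast in PowerShell uses the invariant culture, i.e. month/day/year):

```powershell
# Import-Csv yields strings, so this is a text comparison: '1' sorts before '9'
$asText = '11/15/2011' -gt '9/12/2011'                      # False
# Casting first compares the actual dates
$asDate = [datetime]'11/15/2011' -gt [datetime]'9/12/2011'  # True
"Text comparison: $asText; date comparison: $asDate"
```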