How do I sort within rows in F#?

Asked: 2013-02-28 12:53:59

Tags: f# sorting

I want to sort some tab-delimited data of the following form.

Marketing, Advertising, PR  Graduate, Trainees  Oil, Gas, Alternative Energy    
Marketing, Advertising, PR  Graduate, Trainees  Public Sector & Services    
Marketing, Advertising, PR  Graduate, Trainees  Recruitment Sales   
Marketing, Advertising, PR  Graduate, Trainees  Secretarial, PAs, Administration    
Marketing, Advertising, PR  Graduate, Trainees  Senior Appointments 
Marketing, Advertising, PR  Graduate, Trainees  Telecommunications  
Marketing, Advertising, PR  Graduate, Trainees  Transport, Logistics    
Other   Graduate, Trainees  Banking, Insurance, Finance 
Other   Graduate, Trainees  Customer Services   
Other   Graduate, Trainees  Education   
Other   Graduate, Trainees  Health, Nursing 
Other   Graduate, Trainees  Legal   
Other   Graduate, Trainees  Management Consultancy

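Single-word and multi-word phrases are mixed together. The words within a phrase are separated by commas, and the phrases themselves are separated by tabs.

I need to compare this with another data set in which the text cells are sorted alphabetically.

Obviously, that makes a direct comparison difficult (impossible).

Following ovastus's suggestion, I have the following code: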

open System;;
open System.IO;;
#load @"BigDataModule.fs";;
open BigDataModule;;

let sample = "TruncatedData.txt";;

let outputFile = "SortedOutput.csv";;


let sortWithinRow (row:string) =
    let columns = row.Split([|'\t'|])
    let sortedColumns = 
        Seq.append
            (columns |> Seq.take (columns.Length) |> Seq.sort)
            [ columns.[columns.Length - 1] ]            
    sortedColumns |> String.concat ",";;

sample |> readLines |>  Seq.map sortWithinRow |> saveTo (outputFile);;

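Here readLines and saveTo are functions from my own BigDataModule, used to read the file and save the output.

When I look at the output of this script, the sort unfortunately has not produced the desired result: the rows are still not sorted alphabetically.

If anyone can help me refine my script further, I would be very grateful.

I apologize for wasting your time by oversimplifying the input format when I first described the problem.

Edit 1: To clarify, I have saved the data as a csv file and will be doing this in F#.

Edit 2: I have stripped out all the irrelevant parts of the data set; I only need to sort within these rows. I have also added details of the code I have tried.

Edit 3:

This is the outline of the input data as I originally posted it, which was an oversimplification: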

Alpha   Bravo   Tango   Delta   15.00
Bravo   Delta   Tango       20.30
Delta   Alpha   Tango   6.17   
Charlie Tango   Foxtrot Alpha   19.13

3 Answers:

Answer 0 (score: 1)

I'm not sure I understand what you want, but if you want to produce this output:

 Alpha Bravo Delta Tango 15.00
 Bravo Delta Tango 20.30
 Alpha Delta Tango 6.17
 Alpha Charlie Foxtrot Tango 19.13

you can do this:

open System

let sample = """Alpha  Bravo Tango Delta    15.00
Bravo  Delta Tango          20.30
Delta  Alpha Tango          6.17
Charlie Tango Foxtrot Alpha 19.13""".Split [|'\n'|]

let sortWithinRow (row:string) =
    let columns = row.Split([|' '|], StringSplitOptions.RemoveEmptyEntries)
    let sortedColumns = 
        Seq.append
            (columns |> Seq.take (columns.Length - 1) |> Seq.sort)
            [ columns.[columns.Length - 1] ]            
    sortedColumns |> String.concat " "

sample |> Seq.map sortWithinRow |> String.concat "\n"
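
The example above splits on spaces, which would break up the multi-word phrases in the question's real data. A minimal sketch of the same idea for tab-separated phrases follows; it assumes every field in a row is text (no trailing numeric column), and the name sortPhrasesWithinRow and the sample row are only illustrative.

```fsharp
open System

// Sort the tab-separated phrases of one row.
// Commas and spaces inside a phrase are left untouched.
let sortPhrasesWithinRow (row: string) =
    row.Split([| '\t' |], StringSplitOptions.RemoveEmptyEntries)
    |> Array.map (fun phrase -> phrase.Trim())
    |> Array.filter (fun phrase -> phrase <> "")   // drop fields that were only whitespace
    |> Array.sort
    |> String.concat "\t"

// Illustrative row modelled on the question's data (assumed to use real tab characters).
let example = "Other\tGraduate, Trainees\tBanking, Insurance, Finance\t"
let sorted = sortPhrasesWithinRow example
// sorted = "Banking, Insurance, Finance\tGraduate, Trainees\tOther"
```

If the output is meant to be compared against another tab-delimited file, joining with "\t" rather than "," keeps the commas inside phrases from being confused with field separators.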

Answer 1 (score: 1)

How about the following?

sample |> 
  Seq.map (fun x -> x.Split('\t')) |> 
  Seq.map (Seq.map (fun x -> x.Trim())) |> 
  Seq.map (Seq.filter (fun x -> not (String.IsNullOrEmpty(x)))) |>
  Seq.map Seq.sort |> 
  Seq.map (String.concat "\t") |>
  String.concat "\n";;

I couldn't type \t characters into the pasted sample, so for a runnable example I had to switch the field delimiter to spaces:

open System

let sample2 = """Alpha  Bravo Tango Delta    15.00
Bravo  Delta Tango          20.30
Delta  Alpha Tango          6.17
Charlie Tango Foxtrot Alpha 19.13""".Split [|'\n'|]

sample2 |> 
  Seq.map (fun x -> x.Split([|"  "|], StringSplitOptions.None)) |> 
  Seq.map (Seq.map (fun x -> x.Trim())) |> 
  Seq.map (Seq.filter (fun x -> not (String.IsNullOrEmpty(x)))) |>
  Seq.map Seq.sort |> 
  Seq.map (String.concat "\t") |>
  String.concat "\n";;
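
One way to keep real tab delimiters in a runnable example is to write the rows as ordinary string literals with \t escapes instead of pasting them into a triple-quoted string. The sketch below is only an illustration of that; tabbedSample and sortedRows are made-up names, and the rows are adapted from the question's simplified data.

```fsharp
open System

// Hypothetical sample rows built with "\t" escapes rather than pasted tabs.
let tabbedSample =
    [ "Alpha\tBravo\tTango\tDelta\t15.00"
      "Bravo\tDelta\tTango\t\t20.30" ]

let sortedRows =
    tabbedSample
    |> Seq.map (fun (row: string) ->
        row.Split('\t')
        |> Seq.map (fun field -> field.Trim())
        |> Seq.filter (fun field -> not (String.IsNullOrEmpty field))  // skip empty cells
        |> Seq.sort
        |> String.concat "\t")
    |> Seq.toList
```

Note that this pipeline sorts every field, including the numeric one. F#'s default string comparison is ordinal, so values such as 15.00 end up ahead of the words rather than staying in the last column.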

Answer 2 (score: 0)

Try using F# Data:

[<Literal>]
let sample = """Text1,Text2,Text3,Text4,ValueField
Alpha,Bravo,Tango,Delta,15.00
Bravo,Delta,Tango,,20.30
Delta,Alpha,Tango,,6.17
Charlie,Tango,Foxtrot,Alpha,19.13"""

open FSharp.Data

let csv = CsvProvider<sample, Separator = ",">.Load("input.csv")

let sortedData = 
    csv.Data 
    |> Seq.sortBy (fun row -> row.Text1)
    |> Seq.map (fun row -> row.Columns |> String.concat ",")

System.IO.File.WriteAllLines("output.csv", sortedData)

If you want to sort by multiple fields, you can return them as a tuple from the sort key function:

|> Seq.sortBy (fun row -> row.Text1, row.Text3)
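
The tuple key compares Text1 first and falls back to Text3 only when the first values are equal. Here is a small self-contained sketch of that behaviour, using a plain record in place of the CSV provider's row type; the Row type and its sample values are assumptions made for illustration.

```fsharp
// Stand-in for the provider's typed row (hypothetical, for illustration only).
type Row = { Text1: string; Text3: string; Value: decimal }

let rows =
    [ { Text1 = "Delta"; Text3 = "Tango";   Value = 6.17M }
      { Text1 = "Alpha"; Text3 = "Tango";   Value = 15.00M }
      { Text1 = "Alpha"; Text3 = "Foxtrot"; Value = 19.13M } ]

// Tuples compare element by element, so Text3 only breaks ties on Text1.
let sortedRows = rows |> List.sortBy (fun row -> row.Text1, row.Text3)

let keys = sortedRows |> List.map (fun row -> row.Text1, row.Text3)
// keys = [("Alpha", "Foxtrot"); ("Alpha", "Tango"); ("Delta", "Tango")]
```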