I want to sort some tab-delimited data of the following form.
Marketing, Advertising, PR Graduate, Trainees Oil, Gas, Alternative Energy
Marketing, Advertising, PR Graduate, Trainees Public Sector & Services
Marketing, Advertising, PR Graduate, Trainees Recruitment Sales
Marketing, Advertising, PR Graduate, Trainees Secretarial, PAs, Administration
Marketing, Advertising, PR Graduate, Trainees Senior Appointments
Marketing, Advertising, PR Graduate, Trainees Telecommunications
Marketing, Advertising, PR Graduate, Trainees Transport, Logistics
Other Graduate, Trainees Banking, Insurance, Finance
Other Graduate, Trainees Customer Services
Other Graduate, Trainees Education
Other Graduate, Trainees Health, Nursing
Other Graduate, Trainees Legal
Other Graduate, Trainees Management Consultancy
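Single-word and multi-word phrases are mixed together. The words within a phrase are separated by commas. The phrases are separated by tabs.
I need to compare this with another data set in which the text cells of each row are sorted alphabetically.
Obviously, this makes a direct comparison difficult (impossible).
Following ovastus's suggestion, I have the following code: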
open System;;
open System.IO;;
#load @"BigDataModule.fs";;
open BigDataModule;;
let sample = "TruncatedData.txt";;
let outputFile = "SortedOutput.csv";;
let sortWithinRow (row:string) =
    let columns = row.Split([|'\t'|])
    let sortedColumns =
        Seq.append
            (columns |> Seq.take (columns.Length) |> Seq.sort)
            [ columns.[columns.Length - 1] ]
    sortedColumns |> String.concat ",";;
sample |> readLines |> Seq.map sortWithinRow |> saveTo (outputFile);;
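Here readLines and saveTo are functions from my own BigDataModule that read the file and save the output.
When I look at the output of this script, the sort unfortunately has not produced the desired result: the rows are still not sorted alphabetically.
If anyone can help me refine my script further, I would be very grateful.
I apologize for wasting people's time by oversimplifying the input format when I first described the problem.
Edit 1: Clarified that I have saved the data as a csv file and will be doing this in F#.
Edit 2: I have stripped out all the irrelevant parts of the data set; I only need to sort within these rows. I have also added details of the code I have tried.
Edit 3:
This is the original data frame I posted, which was an oversimplification: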
Alpha Bravo Tango Delta 15.00
Bravo Delta Tango 20.30
Delta Alpha Tango 6.17
Charlie Tango Foxtrot Alpha 19.13
Answer 0 (score: 1)
I'm not sure I understand what you want, but if you want to produce this output:
Alpha Bravo Delta Tango 15.00
Bravo Delta Tango 20.30
Alpha Delta Tango 6.17
Alpha Charlie Foxtrot Tango 19.13
you can do this:
open System
let sample = """Alpha Bravo Tango Delta 15.00
Bravo Delta Tango 20.30
Delta Alpha Tango 6.17
Charlie Tango Foxtrot Alpha 19.13""".Split [|'\n'|]
let sortWithinRow (row:string) =
    // Split on spaces, dropping the empty entries produced by repeated separators
    let columns = row.Split([|' '|], StringSplitOptions.RemoveEmptyEntries)
    let sortedColumns =
        // Sort every column except the last (the numeric value), then put it back at the end
        Seq.append
            (columns |> Seq.take (columns.Length - 1) |> Seq.sort)
            [ columns.[columns.Length - 1] ]
    sortedColumns |> String.concat " "
sample |> Seq.map sortWithinRow |> String.concat "\n"
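For the tab-separated phrases in the updated question there is no trailing numeric column, so nothing has to be set aside before sorting. A minimal sketch, assuming every field in a row should be sorted and that fields are separated by tabs; `sortPhrasesWithinRow` is a name made up for illustration:

```fsharp
open System

// Hypothetical helper: sort every tab-separated phrase in one row.
// Assumes there is no trailing value column that has to stay in place.
let sortPhrasesWithinRow (row: string) =
    row.Split([|'\t'|], StringSplitOptions.RemoveEmptyEntries)
    |> Array.map (fun phrase -> phrase.Trim())
    |> Array.sort
    |> String.concat "\t"

// One row from the question, written with explicit \t escapes.
let example = "Other\tGraduate, Trainees\tBanking, Insurance, Finance"
printfn "%s" (sortPhrasesWithinRow example)
```

In the question's version, `Seq.take (columns.Length)` keeps every column and `Seq.append` then adds the last column a second time, so each output row ends with a duplicated field. Joining with "," also makes the phrase boundaries ambiguous, because the phrases themselves contain commas; the sketch joins with tabs instead.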
Answer 1 (score: 1)
How about the following?
sample |>
    Seq.map (fun x -> x.Split('\t')) |>
    Seq.map (Seq.map (fun (x: string) -> x.Trim())) |>
    Seq.map (Seq.filter (fun x -> not (String.IsNullOrEmpty(x)))) |>
    Seq.map Seq.sort |>
    // String.concat takes a string separator, not a char
    Seq.map (String.concat "\t") |>
    String.concat "\n";;
I couldn't type \t into the pasted sample, so for an executable example I had to switch the field separator to spaces:
open System
let sample2 = """Alpha Bravo Tango Delta 15.00
Bravo Delta Tango 20.30
Delta Alpha Tango 6.17
Charlie Tango Foxtrot Alpha 19.13""".Split [|'\n'|]
sample2 |>
    Seq.map (fun x -> x.Split([|" "|], StringSplitOptions.None)) |>
    Seq.map (Seq.map (fun (x: string) -> x.Trim())) |>
    Seq.map (Seq.filter (fun x -> not (String.IsNullOrEmpty(x)))) |>
    Seq.map Seq.sort |>
    // String.concat takes a string separator, not a char
    Seq.map (String.concat "\t") |>
    String.concat "\n";;
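The tab character does not have to be pasted literally: in an ordinary (not verbatim, not triple-quoted) F# string literal, `\t` is the escape for a tab. A small sketch of the same pipeline on tab-separated input built that way; `sample3` and `sortFields` are illustrative names, not part of the answer above:

```fsharp
open System

// Illustrative input: "\t" inside a regular string literal is a real tab.
let sample3 =
    [ "Alpha\tBravo\tTango\tDelta\t15.00"
      "Bravo\tDelta\tTango\t\t20.30" ]

// Split on tabs, trim, drop empty fields, sort, and re-join with tabs.
let sortFields (line: string) =
    line.Split('\t')
    |> Seq.map (fun (field: string) -> field.Trim())
    |> Seq.filter (fun field -> not (String.IsNullOrEmpty field))
    |> Seq.sort
    |> String.concat "\t"

sample3 |> List.map sortFields |> List.iter (printfn "%s")
```

Because all fields are sorted together, the numeric value moves to the front of each row: digits sort before letters under F#'s ordinal string comparison. If the value has to stay last, it needs to be split off before sorting, as in Answer 0.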
Answer 2 (score: 0)
Try using F# Data:
[<Literal>]
let sample = """Text1,Text2,Text3,Text4,ValueField
Alpha,Bravo,Tango,Delta,15.00
Bravo,Delta,Tango,,20.30
Delta,Alpha,Tango,,6.17
Charlie,Tango,Foxtrot,Alpha,19.13"""
open FSharp.Data
let csv = CsvProvider<sample, Separator = ",">.Load("input.csv")
let sortedData =
    csv.Data
    |> Seq.sortBy (fun row -> row.Text1)
    |> Seq.map (fun row -> row.Columns |> String.concat ",")
System.IO.File.WriteAllLines("output.csv", sortedData)
If you want to sort by multiple fields, you can simply combine them into a tuple in the sort function:
    |> Seq.sortBy (fun row -> row.Text1, row.Text3)
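Tuple keys are compared element by element, so rows are ordered by the first field and ties are broken by the second. A self-contained sketch using plain records instead of the type provider; the `Row` type and its sample values are assumptions made up for illustration:

```fsharp
// Hypothetical stand-in for the provided CSV row type.
type Row = { Text1: string; Text3: string; Value: float }

// Made-up sample rows; two of them share the same Text1.
let rows =
    [ { Text1 = "Delta"; Text3 = "Tango";   Value = 6.17 }
      { Text1 = "Alpha"; Text3 = "Tango";   Value = 15.00 }
      { Text1 = "Alpha"; Text3 = "Foxtrot"; Value = 1.00 } ]

// Sort by Text1 first, then by Text3 within equal Text1 values.
let sorted = rows |> List.sortBy (fun row -> row.Text1, row.Text3)

sorted |> List.iter (fun r -> printfn "%s %s %.2f" r.Text1 r.Text3 r.Value)
```

Here the two "Alpha" rows come first, with "Foxtrot" ahead of "Tango", followed by the "Delta" row.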