Sorting a .csv from the command line

Date: 2015-06-22 02:20:48

Tags: shell csv sorting unix

I am trying to sort this data (population density of 2010 US census blocks) by its last column.

12001,2,1009,Alachua FL,29.65612,-82.327274,0.0005131,0.013289229,12,902.9869232

censusBlockDensities.csv (moved here from the comments)

17001,1,1010,Adams IL,39.960197,-91.373363,0.08861,00.037495258,23,613.41090336
17001,1,1020,Adams IL,39.955861,-91.354113,0.19038,0.493081936,2,4.05612100686
17001,1,1031,Adams IL,39.956978,-91.369,0.002268,0.005874093,0,0,22.8543955664
17001,1,1041,Adams IL,39.94333,-91.345319,0.000358,0.0009236128,0,0480.4506562
17001,1,1051,Adams IL,39.948201,-91.352052,0.213797,0.553731688,64,115.5794427

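1 answer:

Answer 0 (score: 4)

I am assuming a Unix shell (i.e. bash).

Read the manual page for the sort command: man sort

From the man page:

"The locale specified by the environment affects sort order. Set LC_ALL=C to get the traditional sort order that uses native byte values."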

export LC_ALL=C

sort -t , -k 10,10 -n censusBlockDensities.csv
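If you would rather not change the locale for the rest of the session, a variable assignment placed in front of a command applies to that command only. The sketch below uses a throwaway word list (made up for illustration, not part of the question's data) to show the byte-value ordering that LC_ALL=C gives: uppercase ASCII letters sort before lowercase ones.

```shell
# Throwaway sample file; the words are made up for illustration.
words=$(mktemp)
printf '%s\n' b A a B > "$words"

# LC_ALL=C applies to this one sort invocation only.
# Bytes are compared directly, so 'A' (65) and 'B' (66) come before 'a' (97) and 'b' (98).
c_order=$(LC_ALL=C sort "$words" | paste -s -d ' ' -)
echo "$c_order"   # A B a b

rm -f "$words"
```

The same prefix works for the command in this answer: LC_ALL=C sort -t , -k 10,10 -n censusBlockDensities.csv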

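Explanation of the flags:

-t ,: sets the comma as the field separator.

-k 10,10: sorts on the 10th field only (start,stop); fields are numbered from 1, not 0.

"KEYDEF is F[.C][OPTS][,F[.C][OPTS]] for start and stop position, where F is a field number and C a character position in the field; both are origin 1, and the stop position defaults to the line's end. If neither -t nor -b is in effect, characters in a field are counted from the beginning of the preceding whitespace. OPTS is one or more single-letter ordering options [bdfgiMhnRrV], which override global ordering options for that key. If no key is given, use the entire line as the key."

-n: performs a numeric sort instead of the default alphanumeric sort (alternatively, as noted in the comments below, append 'n' to the -k argument).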
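To see what -n changes, here is a small sketch using the four well-formed records from the question (the record with the extra field is left out). The temporary file and the ids helper are made up for this example; ids only prints the third field of each line so the resulting order is easy to read.

```shell
data=$(mktemp)
cat > "$data" <<'EOF'
17001,1,1010,Adams IL,39.960197,-91.373363,0.08861,00.037495258,23,613.41090336
17001,1,1020,Adams IL,39.955861,-91.354113,0.19038,0.493081936,2,4.05612100686
17001,1,1041,Adams IL,39.94333,-91.345319,0.000358,0.0009236128,0,0480.4506562
17001,1,1051,Adams IL,39.948201,-91.352052,0.213797,0.553731688,64,115.5794427
EOF

# Hypothetical helper: print field 3 of every input line on a single line.
ids() { cut -d , -f 3 | paste -s -d ' ' -; }

# The global -n flag and the per-key "n" suffix give the same numeric order.
numeric_global=$(LC_ALL=C sort -t , -k 10,10 -n "$data" | ids)
numeric_key=$(LC_ALL=C sort -t , -k 10,10n "$data" | ids)

# Without -n the key is compared byte by byte: "0480..." < "115..." < "4.05..." < "613...".
lexical=$(LC_ALL=C sort -t , -k 10,10 "$data" | ids)

echo "$numeric_global"   # 1020 1051 1041 1010
echo "$numeric_key"      # 1020 1051 1041 1010
echo "$lexical"          # 1041 1051 1020 1010

rm -f "$data"
```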

censusBlockDensities.csv:

17001,1,1010,Adams IL,39.960197,-91.373363,0.08861,00.037495258,23,613.41090336
17001,1,1020,Adams IL,39.955861,-91.354113,0.19038,0.493081936,2,4.05612100686
17001,1,1031,Adams IL,39.956978,-91.369,0.002268,0.005874093,0,0,22.8543955664
17001,1,1041,Adams IL,39.94333,-91.345319,0.000358,0.0009236128,0,0480.4506562
17001,1,1051,Adams IL,39.948201,-91.352052,0.213797,0.553731688,64,115.5794427

Output:

17001,1,1020,Adams IL,39.955861,-91.354113,0.19038,0.493081936,2,4.05612100686
17001,1,1031,Adams IL,39.956978,-91.369,0.002268,0.005874093,0,0,22.8543955664
17001,1,1051,Adams IL,39.948201,-91.352052,0.213797,0.553731688,64,115.5794427
17001,1,1041,Adams IL,39.94333,-91.345319,0.000358,0.0009236128,0,0480.4506562
17001,1,1010,Adams IL,39.960197,-91.373363,0.08861,00.037495258,23,613.41090336

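Edit: helpful comments pointed out that my answer was wrong. You also need the -n flag to perform a numeric sort (the default is alphanumeric). I have revised my answer to include it. You can verify that it works by adding the -r flag to sort in reverse order. I also added the stop field index to the -k 10 argument, as described in another post.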
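A quick way to run that check: with -r added, the largest density should land on the first line instead of the smallest. This is only a sketch; it reuses the four well-formed records from the question, and the temporary file is an assumption of the example.

```shell
data=$(mktemp)
cat > "$data" <<'EOF'
17001,1,1010,Adams IL,39.960197,-91.373363,0.08861,00.037495258,23,613.41090336
17001,1,1020,Adams IL,39.955861,-91.354113,0.19038,0.493081936,2,4.05612100686
17001,1,1041,Adams IL,39.94333,-91.345319,0.000358,0.0009236128,0,0480.4506562
17001,1,1051,Adams IL,39.948201,-91.352052,0.213797,0.553731688,64,115.5794427
EOF

# Ascending: the smallest density is on the first line.
lowest=$(LC_ALL=C sort -t , -k 10,10 -n "$data" | head -n 1 | cut -d , -f 10)

# Descending: -r reverses the comparison, so the largest density comes first.
highest=$(LC_ALL=C sort -t , -k 10,10 -n -r "$data" | head -n 1 | cut -d , -f 10)

echo "$lowest"    # 4.05612100686
echo "$highest"   # 613.41090336

rm -f "$data"
```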

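Also, you should check the input file to make sure every line has the same number of fields. A record with an extra comma has its columns shifted, so its 10th field is no longer the value you mean to sort on. The command below prints the number of commas on each line: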

awk '{print gsub(/,/,"")}' censusBlockDensities.csv

9
9
10 <-- the third record has an additional field
9
9
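Counting fields instead of commas points at the offending lines directly. A minimal sketch, assuming every well-formed record has exactly 10 comma-separated fields; it recreates the five records from the question in a temporary file (the file itself is an assumption of the example).

```shell
data=$(mktemp)
cat > "$data" <<'EOF'
17001,1,1010,Adams IL,39.960197,-91.373363,0.08861,00.037495258,23,613.41090336
17001,1,1020,Adams IL,39.955861,-91.354113,0.19038,0.493081936,2,4.05612100686
17001,1,1031,Adams IL,39.956978,-91.369,0.002268,0.005874093,0,0,22.8543955664
17001,1,1041,Adams IL,39.94333,-91.345319,0.000358,0.0009236128,0,0480.4506562
17001,1,1051,Adams IL,39.948201,-91.352052,0.213797,0.553731688,64,115.5794427
EOF

# -F , splits each line on commas; NF is the field count, NR the line number.
# Only lines whose field count differs from the expected 10 are reported.
bad=$(awk -F , 'NF != 10 { print NR ": " NF " fields" }' "$data")
echo "$bad"   # 3: 11 fields

rm -f "$data"
```

An empty result means every line has the expected number of fields.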