Question

I am trying to sort this file by a specific field, and I would like to do everything in awk:

"firstName": "gdrgo",   "xxxxx": "John", "xxxxx": "John", "xxxxx": "John", "xxxxx": "John", "xxxxx": "John",   "lastName": "222",dfg
"xxxxx": "John",    "firstName": "beto",   "xxxxx": "John", "xxxxx": "John", "xxxxx": "John",   "lastName": "111","xxxxx": "John",
"xxxxx": "John",    "firstName": "beto",   "xxxxx": "John", "xxxxx": "John", "xxxxx": "John",   "lastName": "111","xxxxx": "John",
"xxxxx": "John",   "xxxxx": "John",    "firstName": "beto2", "xxxxx": "John","lastName": "555", "xxxxx": "John","xxxxx": "John",
"xxxxx": "John",   "xxxxx": "John",    "firstName": "beto2", "xxxxx": "John","lastName": "444", "xxxxx": "John","xxxxx": "John",
"firstName": "gdrgo",   "xxxxx": "John", "xxxxx": "John", "xxxxx": "John", "xxxxx": "John", "xxxxx": "John",   "lastName": "222",dfg
"xxxxx": "John",   "xxxxx": "John",    "firstName": "beto2", "xxxxx": "John","lastName": "444", "xxxxx": "John","xxxxx": "John",

I am using this command:

awk -F'.*"firstName": "|",.*"lastName": "|",' '{b[$3]=$0} END{for(i in b){print i}}' sumacomando

Output:

But I expected:

That is, although the actual output happens to look sorted the way I need, duplicate values are unexpectedly lost.

Answer 1

Your choice of field separator is unconventional; it may be better to do it this way:

awk -F'[:,]' '{for(i=1;i<=NF;i++) 
                  if($i~"\"lastName\"") 
                      {gsub(/"/,"",$(i+1)); 
                       print $(i+1)}}' file | sort

If your awk has the asort function (as GNU awk does), you can do this instead:

awk -F'[:,]' '{for(i=1;i<=NF;i++) 
                 if($i~"\"lastName\"") 
                    {gsub(/"/,"",$(i+1)); 
                     a[++c]=$(i+1)}} 
          END {asort(a); 
               for(k=1;k in a;k++) print a[k]}' file

Answer 2

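The order in which awk enumerates the keys (indices) of its arrays, which are always associative arrays (dictionaries), is an implementation detail; no particular order is guaranteed. In your case the output just happens to come out sorted.
Keys are unique, so if $3 has the same value in more than one input line, the b[$3]=... assignments overwrite each other, and the last one wins.

You therefore:

must use a sequentially indexed array to store the 3rd field values ($3)
must then sort the resulting array by its values.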
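
To make the overwriting concrete, here is a minimal sketch using a made-up three-line sample (not the question's file), in which the second column stands in for $3. It should run in any POSIX awk:

```shell
# Hypothetical sample: two of the three rows share the value 111.
result=$(printf '%s\n' 'a 222' 'b 111' 'c 111' |
  awk '{ b[$2] = $0 }      # keyed by value: a later row overwrites an earlier one
       END {
         n = 0
         for (k in b) n++  # count the distinct keys that survived
         print n, b["111"]
       }')
echo "$result"
```

Three rows go in but only two keys come out, and the entry for 111 holds the last row that carried that value.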

Per the POSIX Awk specification, Awk has no built-in sorting functions, but GNU awk does, and its asort() function enables the following solution:

awk -F'.*"firstName": "|",.*"lastName": "|",' ' { b[++n]=$3 } END{ asort(b); for(i=1;i<=n;++i) print b[i] } ' sumacomando
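
If GNU awk is not available, the same idea can be sketched in portable awk by sorting the sequentially indexed array by hand. The following is only an illustration under some assumptions: the sample rows are abbreviated stand-ins for the question's input, the value is extracted with match() rather than the field separator above, and a simple insertion sort is used, which is fine for small inputs but slow for large ones.

```shell
# Abbreviated, illustrative rows shaped like the input in the question.
input='"firstName": "gdrgo", "lastName": "222",dfg
"xxxxx": "John", "firstName": "beto", "lastName": "111",
"xxxxx": "John", "firstName": "beto", "lastName": "111",
"firstName": "beto2", "lastName": "555",'

sorted=$(printf '%s\n' "$input" | awk '
  # Find the lastName key together with its quoted value.
  match($0, /"lastName": "[^"]*"/) {
    # The prefix up to and including the opening quote is 13 characters;
    # one more character is the closing quote.
    vals[++n] = substr($0, RSTART + 13, RLENGTH - 14)
  }
  END {
    # Insertion sort on vals[1..n]. substr() returns strings, so the
    # comparison is a string comparison; equal values keep their order.
    for (i = 2; i <= n; i++) {
      v = vals[i]
      for (j = i - 1; j >= 1 && vals[j] > v; j--) vals[j + 1] = vals[j]
      vals[j + 1] = v
    }
    for (i = 1; i <= n; i++) print vals[i]
  }')
printf '%s\n' "$sorted"
```

Because every value gets its own index, the duplicate 111 is printed twice.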

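Note that this does not store the associated full line ($0).

If you also want to store the associated full line while doing the sorting in (GNU) Awk, things get more complicated: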

awk -F'.*"firstName": "|",.*"lastName": "|",' '
  # Use a compound key to store the value of $3 plus a sequential index
  # to disambiguate, and store the input row ($0) as the value.
  { vals[$3,++n]=$0 }
  END{
    # Sort by compound key using the helper function defined below.
    asorti(vals, names, "cmp_func");
    # Output the first half of the compound key, i.e., the value of $3,
    # followed by the associated input row.
    for(i=1;i<=n;++i) print gensub(SUBSEP ".*$", "", 1, names[i]), vals[names[i]]
  }
  # Helper sort function that splits the compound key into its components
  # - $3 value and sequential index - and compares the $3 values alphabetically
  # and the indices numerically.
  function cmp_func(i1, v1, i2, v2) {
    split(i1, tokens1, SUBSEP)
    split(i2, tokens2, SUBSEP)
    if (tokens1[1] < tokens2[1]) return -1
    if (tokens1[1] > tokens2[1]) return 1
    i1 = int(tokens1[2])
    i2 = int(tokens2[2])
    if (i1 < i2) return -1
    if (i1 > i2) return 1
    return 0
  }
' sumacomando

Piping to sort, as an alternative solution, simplifies the problem considerably:

awk -F'.*"firstName": "|",.*"lastName": "|",' '{ print $3, $0 }' sumacomando | sort -k1,1

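Note, however, that the pure-Awk solution above preserves the input order among duplicate $3 values, whereas the sort-assisted solution does not.

Conversely, the pure-Awk solution has to hold all input in memory at once, whereas the sort utility is optimized to handle large input sets and uses temporary files on demand.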
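
If input order among duplicates matters but you still want to hand the sorting to sort, one possible approach is to have awk emit the line number (NR) as a secondary key. This is a sketch under assumptions: the rows and the "tag" field are invented for illustration, the key is extracted with match() instead of the field separator above, and the key is assumed to contain no spaces.

```shell
# Illustrative rows; rows 1 and 3 share the same lastName value.
input='"firstName": "beto2", "lastName": "444", "tag": "first"
"firstName": "gdrgo", "lastName": "222", "tag": "second"
"firstName": "beto2", "lastName": "444", "tag": "aaa-third"'

ordered=$(printf '%s\n' "$input" | awk '
  match($0, /"lastName": "[^"]*"/) {
    key = substr($0, RSTART + 13, RLENGTH - 14)
    print key, NR, $0    # decorate: sort key, line number, original row
  }' | sort -k1,1 -k2,2n | cut -d" " -f3-)   # sort, then strip the decoration
printf '%s\n' "$ordered"
```

The numeric second key breaks ties by original position, so the two 444 rows stay in input order, and sort can still spill to temporary files for large inputs.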

Answer 3

@victorhernandezzero: @try: I tried a different approach, which I hope helps you and everyone else as well. It uses only a single awk (no other commands).

import java.util.HashMap;
import java.util.Map;

class Employee{
    private int id;
    public Employee(int i) {
        // TODO Auto-generated constructor stub
        this.id = i;
    }
}

public class HashMapExample {

    public static void main(String[] args) {
        HashMap<Employee,Integer> map =new HashMap<Employee,Integer>();
        map.put(new Employee(101),10);
        map.put(new Employee(101),20);

        System.out.println(map);
        Employee emp;
        for(Map.Entry<Employee,Integer> entry : map.entrySet()){
            System.out.println(entry.getKey()+"  "+entry.getValue());
             emp = entry.getKey();
            System.out.println(emp.equals(emp));
            emp.hashCode();

        }
    }

}

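EDIT1: The solution above does not give you the duplicates you need (special thanks to mklement0 for letting me know); the following may help you as well.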

awk '/lastName/{while((getline)>0 && $0==""){} A[++n]=$0} END{num=asort(A);for(i=1;i<=num;i++){print A[i]}}' RS='[: ",]'   Input_file

My Awk command sorts, but unexpectedly omits duplicates
