I am trying to sort this file by a specific field, and I would like to do everything in awk:
"firstName": "gdrgo", "xxxxx": "John", "xxxxx": "John", "xxxxx": "John", "xxxxx": "John", "xxxxx": "John", "lastName": "222",dfg
"xxxxx": "John", "firstName": "beto", "xxxxx": "John", "xxxxx": "John", "xxxxx": "John", "lastName": "111","xxxxx": "John",
"xxxxx": "John", "firstName": "beto", "xxxxx": "John", "xxxxx": "John", "xxxxx": "John", "lastName": "111","xxxxx": "John",
"xxxxx": "John", "xxxxx": "John", "firstName": "beto2", "xxxxx": "John","lastName": "555", "xxxxx": "John","xxxxx": "John",
"xxxxx": "John", "xxxxx": "John", "firstName": "beto2", "xxxxx": "John","lastName": "444", "xxxxx": "John","xxxxx": "John",
"firstName": "gdrgo", "xxxxx": "John", "xxxxx": "John", "xxxxx": "John", "xxxxx": "John", "xxxxx": "John", "lastName": "222",dfg
"xxxxx": "John", "xxxxx": "John", "firstName": "beto2", "xxxxx": "John","lastName": "444", "xxxxx": "John","xxxxx": "John",
I am using this command:
awk -F'.*"firstName": "|",.*"lastName": "|",' '{b[$3]=$0} END{for(i in b){print i}}' sumacomando
Output:
111
222
444
555
But I expected:
111
111
222
222
444
444
555
In other words, although the actual output happens to be sorted as required, the duplicate values were unexpectedly lost.
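Answer 0 (score: 2)
The field separator you chose is unconventional; it may be simpler to split on colons and commas and look for the lastName key: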
awk -F'[:,]' '{for(i=1;i<=NF;i++)
if($i~"\"lastName\"")
{gsub(/"/,"",$(i+1));
print $(i+1)}}' file | sort
If your awk has the asort function (GNU awk does), you can do the sorting inside awk instead:
awk -F'[:,]' '{for(i=1;i<=NF;i++)
if($i~"\"lastName\"")
{gsub(/"/,"",$(i+1));
a[++c]=$(i+1)}}
END {asort(a);
for(k=1;k in a;k++) print a[k]}' file
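Answer 1 (score: 2)
Awk arrays are always associative arrays (dictionaries), and the order in which their keys/indices are enumerated is an implementation detail: no particular order is guaranteed. In your case the output merely happens to be sorted.
Keys are unique, so if $3 has the same value in more than one input line, the b[$3]=... assignments overwrite each other and the last one wins.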
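You therefore:
must use a sequentially indexed array to store the values of the third field ($3)
must sort the resulting array by its values afterwards.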
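To make the two points above concrete, here is a minimal sketch in plain POSIX awk. The sample values and the variable names (keys, unique_count, total_count) are made up for illustration; they stand in for the lastName values from the question.

```shell
# Hypothetical sample keys; in the question these would be the lastName values.
keys='222
111
111
555'

# Array indexed by the value itself: duplicate values collapse into a single key.
unique_count=$(printf '%s\n' "$keys" |
  awk '{ seen[$0] = $0 } END { c = 0; for (k in seen) c++; print c }')

# Sequentially indexed array: every input line gets its own slot.
total_count=$(printf '%s\n' "$keys" |
  awk '{ vals[++n] = $0 } END { print n }')

echo "distinct keys: $unique_count, stored values: $total_count"
```

The first count is lower than the second because the two 111 lines share one key, which is exactly why the duplicates disappeared from the original output.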
The POSIX Awk specification does not define a built-in sort function, but GNU awk provides asort(), which enables the following solution:
awk -F'.*"firstName": "|",.*"lastName": "|",' '
{ b[++n]=$3 } END{ asort(b); for(i=1;i<=n;++i) print b[i] }
' sumacomando
Note that this does not store the associated full lines ($0).
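If your awk lacks asort(), the values can still be sorted by hand. The following is only a sketch, assuming the values are already extracted one per line (the printf input is hypothetical sample data); it uses a simple insertion sort, which is fine for small inputs but slow for large ones.

```shell
# Hypothetical unsorted values standing in for the extracted $3 fields.
sorted=$(printf '222\n111\n111\n555\n444\n222\n444\n' | awk '
  # Store each value under a sequential index so duplicates are kept.
  { vals[++n] = $0 }
  END {
    # Insertion sort: shift larger elements right, then drop the value in place.
    for (i = 2; i <= n; i++) {
      v = vals[i]
      for (j = i - 1; j >= 1 && vals[j] > v; j--) vals[j + 1] = vals[j]
      vals[j + 1] = v
    }
    for (i = 1; i <= n; i++) print vals[i]
  }')
echo "$sorted"
```

Awk compares two input values numerically when both look like numbers and as strings otherwise, so the same loop also handles non-numeric keys.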
If you also want to keep the associated full line while sorting in (GNU) Awk, things get more complicated:
awk -F'.*"firstName": "|",.*"lastName": "|",' '
# Use a compound key to store the value of $3 plus a sequential index
# to disambiguate, and store the input row ($0) as the value.
{ vals[$3,++n]=$0 }
END{
# Sort by compound key using the helper function defined below.
asorti(vals, names, "cmp_func");
# Output the first half of the compound key, i.e., the value of $3,
# followed by the associated input row.
for(i=1;i<=n;++i) print gensub(SUBSEP ".*$", "", 1, names[i]), vals[names[i]]
}
# Helper sort function that splits the compound key into its components
# - $3 value and sequential index - and compares the $3 values alphabetically
# and the indices numerically.
function cmp_func(i1, v1, i2, v2) {
split(i1, tokens1, SUBSEP)
split(i2, tokens2, SUBSEP)
if (tokens1[1] < tokens2[1]) return -1
if (tokens1[1] > tokens2[1]) return 1
i1 = int(tokens1[2])
i2 = int(tokens2[2])
if (i1 < i2) return -1
if (i1 > i2) return 1
return 0
}
' sumacomando
Piping to sort as an alternative greatly simplifies the problem:
awk -F'.*"firstName": "|",.*"lastName": "|",' '{ print $3, $0 }' sumacomando | sort -k1,1
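Note, however, that the pure Awk solution above preserves the input order among lines with duplicate $3 values, whereas the sort-assisted solution does not.
On the other hand, the pure Awk solution has to hold all input in memory at once, whereas the sort utility is optimized for large input sets and uses temporary files as needed.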
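If you want both the sort utility and a preserved input order among equal keys, one option is to add the input line number as a secondary sort key and strip it afterwards. This is a sketch only: the two-column sample input is hypothetical, and it assumes the key contains no spaces. In the question's setting the key would be $3 as produced by the field separator above.

```shell
# Hypothetical input: a sort key followed by a payload.
input='222 first
111 second
222 third
111 fourth'

# Decorate each line with its key and line number (NR), sort by key and then
# numerically by line number, and finally cut the two helper columns off again.
stable=$(printf '%s\n' "$input" |
  awk '{ print $1, NR, $0 }' |
  sort -k1,1 -k2,2n |
  cut -d" " -f3-)
echo "$stable"
```

Because every line number is unique, the order among equal keys is fully determined by the input order, without relying on any non-standard sort options.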
Answer 2 (score: 1)
@victorhernandezzero: I tried a different approach, which I hope helps you and everyone else. It uses a single awk (no other commands).
EDIT1: My original solution did not output the duplicates you need (special thanks to mklement0 for letting me know); the following should help instead.
awk '/lastName/{getline;while(!$0){getline};A[++n]=$0} END{asort(A);for(i=1;i<=n;i++){print A[i]}}' RS='[: ",]' Input_file