How do I sort a CSV file according to a second file?

Time: 2018-07-06 17:22:47

Tags: python csv awk

I have two files:

F1

bar
foo
egg

F2

"egg","apple","green"
"egg","orange","red"
"egg","apple","green"
"bar","spam","orange"
"bar","orange","blue"
"bacon","red","orange"
"foo","apple","green"
"foo","blue","apple"
"spam","apple","yellow"
"spam","green","egg"

I want to sort F2 according to F1, with every row of F2 whose first element does not appear in F1 placed at the end. So that I get:

"bar","spam","orange"
"bar","orange","blue"
"foo","apple","green"
"foo","blue","apple"
"egg","apple","green"
"egg","orange","red"
"egg","apple","green"
"bacon","red","orange"
"spam","apple","yellow"
"spam","green","egg"

I would prefer a python3 solution, but I am also open to a solution in awk.

2 Answers:

Answer 0: (score: 1)

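Could you please try the following and let me know if it helps. It assumes you want to match the first field of each row in F2 against the first field of F1 (which, judging by the sample shown, has only one field per line).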

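awk -F'"' '
# First pass (F2): group rows by first quoted field and record first-seen key order
FNR==NR{
  if(!($2 in a)){ order[++n]=$2 }
  a[$2]=(a[$2]?a[$2] ORS:"")$0
  next
}
# Second pass (F1): print each matching group in F1 order and mark it as printed
($0 in a){
  print a[$0]
  c[$0]
}
# Finally print the groups that F1 never mentioned, in their F2 order
END{
  for(i=1;i<=n;i++){
    if(!(order[i] in c)){ print a[order[i]] }
  }
}' F2  F1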
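
Since the question prefers python3, the same group-then-emit idea can be sketched in Python. This is only an illustration, not part of the answer above: the function name `group_and_order` is hypothetical, and it assumes F1 holds one key per line and F2 is ordinary comma-separated, double-quoted CSV.

```python
import csv
import sys


def group_and_order(f1_lines, f2_lines):
    """Group F2 rows by first field; emit groups in F1 order, then the rest."""
    groups = {}  # first field -> rows; dicts keep insertion order (Python 3.7+)
    for row in csv.reader(f2_lines):
        if row:  # skip blank lines
            groups.setdefault(row[0], []).append(row)

    ordered = []
    printed = set()
    # Groups named in F1 come first, in the order F1 lists them
    for line in f1_lines:
        key = line.strip()
        if key in groups and key not in printed:
            ordered.extend(groups[key])
            printed.add(key)
    # Groups that F1 never mentions follow, in the order they first appear in F2
    for key, rows in groups.items():
        if key not in printed:
            ordered.extend(rows)
    return ordered


# Small in-memory demo using part of the sample data from the question
f1 = ["bar", "foo", "egg"]
f2 = ['"egg","apple","green"', '"bar","spam","orange"',
      '"bacon","red","orange"', '"foo","apple","green"']
writer = csv.writer(sys.stdout, quoting=csv.QUOTE_ALL, lineterminator="\n")
writer.writerows(group_and_order(f1, f2))
```

Both arguments can be any iterable of text lines, so open file objects for F1 and F2 should work in place of the lists used in the demo.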

Answer 1: (score: 1)

list1=['bar','foo','egg']

list2=[["egg","apple","green"],
    ["egg","orange","red"],
    ["egg","apple","green"],
    ["bar","spam","orange"],
    ["bar","orange","blue"],
    ["bacon","red","orange"],
    ["foo","apple","green"],
    ["foo","blue","apple"],
    ["spam","apple","yellow"],
    ["spam","green","egg"]]

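list_to_sort=[]
list_not_to_sort=[]
for element in list2:
    # Rows whose first element appears in list1 get sorted; the rest keep their order
    if element[0] in list1:
        list_to_sort.append(element)
    else:
        list_not_to_sort.append(element)
# sort() orders the matched rows lexicographically, not in the order of list1
list_to_sort.sort()
print(list_to_sort+list_not_to_sort)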

Output:

[['bar', 'orange', 'blue'],
 ['bar', 'spam', 'orange'],
 ['egg', 'apple', 'green'],
 ['egg', 'apple', 'green'],
 ['egg', 'orange', 'red'],
 ['foo', 'apple', 'green'],
 ['foo', 'blue', 'apple'],
 ['bacon', 'red', 'orange'],
 ['spam', 'apple', 'yellow'],
 ['spam', 'green', 'egg']]
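
The output above is in alphabetical order (bar, egg, foo), whereas the question asks for the order given in F1 (bar, foo, egg). A minimal sketch of one way to get that order is to sort with a key that ranks each row by the position of its first element in the reference list, relying on the fact that Python's `sorted()` is stable. The names `sort_by_reference` and `sort_csv_by_file` are hypothetical helpers introduced here for illustration, and the file layout (one key per line in F1, quoted CSV in F2) is assumed from the question.

```python
import csv
import sys


def sort_by_reference(rows, reference):
    """Order rows by where their first element appears in reference."""
    rank = {key: i for i, key in enumerate(reference)}
    last = len(reference)  # shared rank for keys missing from reference
    # sorted() is stable, so rows with equal rank keep their original order
    return sorted(rows, key=lambda row: rank.get(row[0], last))


def sort_csv_by_file(f1_path, f2_path, out=sys.stdout):
    """Hypothetical helper: read F1 and F2 from disk, write the reordered CSV."""
    with open(f1_path, newline="") as f1:
        reference = [line.strip() for line in f1 if line.strip()]
    with open(f2_path, newline="") as f2:
        rows = [row for row in csv.reader(f2) if row]
    writer = csv.writer(out, quoting=csv.QUOTE_ALL, lineterminator="\n")
    writer.writerows(sort_by_reference(rows, reference))
```

Calling `sort_by_reference(list2, list1)` with the lists from this answer should give the bar, foo, egg order requested in the question, with the bacon and spam rows left at the end in their original order. `sort_csv_by_file("F1", "F2")` would do the same starting from the two files.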