Say my input file, file1.tsv, has the following two columns:
type grocery
fruits orange
fruits apple
fruits kiwi
greens collard
greens spinach
The desired result is:
type grocery
fruits orange, apple, kiwi
greens collard, spinach
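I can read the repeated values in column 1 as dictionary keys, but I can't work out how to append the non-repeated column 2 values to each key, joined by commas. Is there a quick solution for this in Python?
Answer 0 (score: 2)
If the file is already grouped by column 1: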
awk 'p==$1{s=s ", " $2; next} {if(p)print s; p=$1; s=$0} END{print s}' file
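Since the question asks for Python, the same "merge adjacent rows" idea can be sketched with itertools.groupby. This is only an illustration, not part of the original answer: the function name merge_grouped and the sample list are made up here, and it assumes tab-separated rows with exactly two columns that are already grouped by column 1.

```python
from itertools import groupby


def merge_grouped(lines, sep="\t"):
    """Join column-2 values of consecutive rows that share a column-1 key.

    Assumes every non-blank line has exactly two sep-separated columns.
    """
    rows = (line.rstrip("\n").split(sep, 1) for line in lines if line.strip())
    # groupby only merges *adjacent* equal keys, like the awk one-liner,
    # so the input must already be grouped by column 1.
    return [
        (key, ", ".join(value for _, value in group))
        for key, group in groupby(rows, key=lambda row: row[0])
    ]


# Hypothetical sample mirroring file1.tsv from the question.
sample = [
    "type\tgrocery\n",
    "fruits\torange\n",
    "fruits\tapple\n",
    "fruits\tkiwi\n",
    "greens\tcollard\n",
    "greens\tspinach\n",
]

for key, values in merge_grouped(sample):
    print(key + "\t" + values)
```

To run it on a real file, pass the open file object instead of the list, for example merge_grouped(open('file1.tsv')).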
Answer 1 (score: 1)
You can simply store the values as lists:
types = ['type','fruits','greens']
values = [['grocery'],['orange','apple','kiwi'],['collard', 'spinach']]
my_dict = dict(zip(types, values))
>>> print(my_dict)
{'type': ['grocery'], 'fruits': ['orange', 'apple', 'kiwi'], 'greens': ['collard', 'spinach']}
That way, if you want to add anything to an existing type, you just do this:
my_dict['type'].append('dairy')
my_dict['fruits'].append('banana')
If you want to create a new type, just assign to a new key and Python will create the new key-value pair, like this:
my_dict['meats'] = ['beef', 'chicken', 'fish']
>>> len(my_dict['meats']) # number of items in 'meats'
3
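One caveat with a plain dict is that calling append on a key that does not exist yet raises KeyError, which is exactly the situation when reading rows one at a time. A minimal sketch of one way around that, using collections.defaultdict; the pairs list below is hypothetical sample data standing in for rows read from the file:

```python
from collections import defaultdict

# defaultdict(list) creates an empty list the first time a key is used,
# so append() works even for a type that has not been seen yet.
my_dict = defaultdict(list)

# Hypothetical (type, grocery) pairs, as they might come from the input file.
pairs = [
    ("fruits", "orange"),
    ("fruits", "apple"),
    ("fruits", "kiwi"),
    ("greens", "collard"),
    ("greens", "spinach"),
]

for grocery_type, item in pairs:
    my_dict[grocery_type].append(item)

# A brand-new key needs no special handling.
my_dict["meats"].append("beef")

for grocery_type, items in my_dict.items():
    print(grocery_type, ", ".join(items))
```

With a regular dict, my_dict.setdefault(grocery_type, []).append(item) has the same effect.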
Answer 2 (score: 1)
Your input:
$ cat f
type grocery
fruits orange
fruits apple
fruits kiwi
greens collard
greens spinach
Awk code:
awk 'NR==1{
print
next
}
{
A[$1]=A[$1]?A[$1]","$2:$2
}
END{
for(i in A)
print i,A[i]
}' f
Output (the key order produced by for (i in A) is not guaranteed):
type grocery
greens collard,spinach
fruits orange,apple,kiwi
--Edit--
If order matters, try this, which reads the same file twice:
awk 'FNR==NR{
A[$1]=A[$1]?A[$1]","$2:$2
next
}
($1 in A){
print $1,A[$1];
delete A[$1]
}' f f
Output:
type grocery
fruits orange,apple,kiwi
greens collard,spinach
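For comparison, a rough Python sketch of the same order-preserving merge. It relies on plain dicts keeping insertion order (guaranteed since Python 3.7), so one pass is enough. The function name merge_in_order and the sample list are illustrative assumptions, and fields are split on whitespace the way awk does by default.

```python
def merge_in_order(lines):
    """Merge column-2 values per column-1 key, keeping first-seen key order."""
    merged = {}  # plain dicts keep insertion order on Python 3.7+
    for line in lines:
        fields = line.split()  # whitespace splitting, like awk's default
        if len(fields) < 2:
            continue  # skip blank or one-column lines
        key, value = fields[0], fields[1]
        # Same idea as A[$1] = A[$1] ? A[$1] "," $2 : $2 in the awk version.
        merged[key] = merged[key] + "," + value if key in merged else value
    return [key + " " + value for key, value in merged.items()]


# Hypothetical sample mirroring the input file f shown above.
sample = [
    "type grocery",
    "fruits orange",
    "fruits apple",
    "fruits kiwi",
    "greens collard",
    "greens spinach",
]

for row in merge_in_order(sample):
    print(row)
```

Unlike the groupby-style approaches, this does not need the input to be grouped by column 1.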
Answer 3 (score: 0)
Using awk:
awk '{ arr[$1] = arr[$1] ? arr[$1] ", " $2 : $2 } \
END { for (var in arr) print var, " ", arr[var] }' file1.tsv
Answer 4 (score: 0)
Another Python solution:
    from collections import defaultdict
    from csv import DictReader

    d = defaultdict(list)  # maps each type to its list of groceries
    with open('file1.tsv') as f:
        x = DictReader(f, delimiter='\t')
        for l in x:
            d[l['type']].append(l['grocery'])
        # Print the header row using the column names read from the file.
        print(" ".join(x.fieldnames))
    for k in d:
        print(k, ",".join(d[k]))