The CSV file I am working with looks like this:
{http://www.omg.org/XMI}id,begin,end,Emotion
17266,772,781,anticipation
17402,772,781,disgust
17304,1345,1370,disgust
17424,1534,1543,surprise
17424,1534,1543,surprise
17424,1534,1543,surprise
17424,1534,1543,surprise
17472,1578,1602,anger
17525,1611,1617,fear
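I am trying to create a dictionary of lists, with the 'Emotion' entries as keys and the 'begin' values (the second column) of the matching rows as each key's values.
The desired output looks like this: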
{'anger': [1578,
2853,
3951,...],
'anticipation': [772, 4154, 4400...],
...}
So far I have managed to produce almost the desired output, but each value ends up as its own list inside the list for its key.
My current code:
import pickle
from pprint import pprint
import tkinter
from tkinter import filedialog
import csv
from itertools import groupby
root_tk = tkinter.Tk()
root_tk.wm_withdraw()
def extract_gold_emotions():
    """Returns mapping of GOLD emotions to their indices"""
    filename = filedialog.askopenfilename()
    l = list(csv.reader(open(filename)))
    f = lambda x: x[-1]
    gold_emo_offsets = {k:list(sorted(map(int, x[1:2])) for x in v)\
        for k,v in groupby(sorted(l[1:], key=f), f)}
    pickle.dump(gold_emo_offsets, open("empos.p", "wb"))
    return gold_emo_offsets
my_emotions = extract_gold_emotions()
Current output:
{'anger': [[1578], [2853], [3951], [4084], [4693], [6420], [8050]],
'anticipation': [[772], [4154], [4400], [7392]],....]]}
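Any hints on how to change the code so that it outputs the desired dictionary of lists?
Thanks!
Edit:
The dictionary values should be output as integers.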
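The nesting appears to come from `x[1:2]`: slicing a row returns a one-element list, and the inner expression builds one such list per row. A minimal sketch of a fix, assuming the same column layout, indexes the row with `x[1]` and collects the integers of each group into a single list. The inline `rows` sample and the name `group_begin_by_emotion` are illustrative stand-ins for the file dialog and are not part of the original code.

```python
from itertools import groupby

def group_begin_by_emotion(rows):
    """Map each emotion (last column) to a sorted list of int 'begin' values."""
    emotion = lambda row: row[-1]
    # groupby only merges adjacent rows, so sort by the same key first.
    ordered = sorted(rows, key=emotion)
    return {
        key: sorted(int(row[1]) for row in group)  # row[1], not row[1:2]
        for key, group in groupby(ordered, key=emotion)
    }

# Hypothetical sample standing in for list(csv.reader(...))[1:]
rows = [
    ["17266", "772", "781", "anticipation"],
    ["17402", "772", "781", "disgust"],
    ["17304", "1345", "1370", "disgust"],
    ["17472", "1578", "1602", "anger"],
]
result = group_begin_by_emotion(rows)
print(result)
```

Using a generator inside `sorted()` keeps one flat list per key, which matches the shape shown in the desired output.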
Answer 0 (score: 1)
Use collections.defaultdict and csv.DictReader.
For example:
import csv
import collections
d = collections.defaultdict(list)
with open(filename) as infile:
    reader = csv.DictReader(infile)
    for row in reader:
        d[row["Emotion"]].append(row["begin"])
print(d)
Output:
defaultdict(<type 'list'>, {'anger': ['1578'], 'surprise': ['1534', '1534', '1534', '1534'], 'fear': ['1611'], 'anticipation': ['772'], 'disgust': ['772', '1345']})
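Since the edit to the question asks for integers, a small variation converts each value with `int()` before appending. This is only a sketch: it reads an in-memory copy of part of the sample data through `io.StringIO` instead of the real file, and the names `sample` and `begins` are illustrative.

```python
import collections
import csv
import io

# Assumed stand-in for the real CSV file on disk.
sample = """{http://www.omg.org/XMI}id,begin,end,Emotion
17266,772,781,anticipation
17402,772,781,disgust
17304,1345,1370,disgust
17424,1534,1543,surprise
17472,1578,1602,anger
17525,1611,1617,fear
"""

begins = collections.defaultdict(list)
with io.StringIO(sample) as infile:
    for row in csv.DictReader(infile):
        # DictReader yields strings, so convert explicitly.
        begins[row["Emotion"]].append(int(row["begin"]))

begins = dict(begins)  # plain dict for cleaner printing
print(begins)
```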
Answer 1 (score: 1)
You can use collections.defaultdict to build the resulting dictionary:
from io import StringIO
import csv
from collections import defaultdict
text = '''id,begin,end,Emotion
17266,772,781,anticipation
17402,772,781,disgust
17304,1345,1370,disgust
17424,1534,1543,surprise
17424,1534,1543,surprise
17424,1534,1543,surprise
17424,1534,1543,surprise
17472,1578,1602,anger
17525,1611,1617,fear'''
data = defaultdict(list)
with StringIO(text) as file:
    for row in csv.DictReader(file):
        data[row['Emotion']].append(row['begin'])
print(data)
Answer 2 (score: 1)
Using only basic Python, without imports (*):
Write the file:
with open("data.csv","w") as w:
    w.write("""{http://www.omg.org/XMI}id,begin,end,Emotion
17266,772,781,anticipation
17402,772,781,disgust
17304,1345,1370,disgust
17424,1534,1543,surprise
17424,1534,1543,surprise
17424,1534,1543,surprise
17424,1534,1543,surprise
17472,1578,1602,anger
17525,1611,1617,fear
""")
Read and process the file:
d = {}
with open("data.csv","r") as r:
    next(r) # skip header
    for line in r:
        if line.strip(): # ignore empty lines (e.g. the last one)
            l = line.strip().split(",")
            begin = l[1] # the begin column
            emo = l[-1] # the emotion column
            k = d.setdefault(emo,[]) # get/create key + empty list if needed
            k.append(begin) # append to key as string
            # k.append(int(begin)) # append to key but convert to int first
print(d)
Output (appended as strings):
{'anger': ['1578'],
'surprise': ['1534', '1534', '1534', '1534'],
'fear': ['1611'],
'anticipation': ['772'],
'disgust': ['772', '1345']}
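(*): If the CSV contains escaped text or things like inline/escaped delimiters, you should not parse it yourself. Your data is plain, though, so you can parse it yourself.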
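To illustrate that caveat, here is a sketch with a hypothetical row whose last field contains a quoted comma (this row is not from the question's data). A plain `split(",")` breaks the field apart, while `csv.reader` honours the quoting.

```python
import csv
import io

# Hypothetical row: the emotion field itself contains a comma.
line = '17266,772,781,"anger, mild"\n'

naive = line.strip().split(",")                   # splits inside the quotes
parsed = next(csv.reader(io.StringIO(line)))      # respects the quotes

print(len(naive))   # 5 pieces
print(len(parsed))  # 4 fields
print(parsed[-1])   # anger, mild
```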