I need to write a dictionary to a CSV file, but I can't hold all of it in memory, so I have to write it iteratively:
def save_phons_2_csv(pandas_dataset, csv_name):
    if not os.path.isfile(csv_name):  # create the file if it doesn't exist
        with open(csv_name, 'w') as csv_file:
            pass
    for index_r, row in pandas_dataset.iterrows():  # get all phons frames
        for index, phon_dict in enumerate(row['phons']):
            if phon_dict['phon'] not in no_phons:
                dicc = get_phonema(row, index)
                label = dicc['label']
                rows = np.array(dicc["frames"])
                with open(csv_name, 'a+') as ofile:
                    ...  # append label and rows to csv
In the end, what I want is to store label and rows in the CSV file and be able to read them back.
My best attempt so far is:
with open(csv_name, 'a+') as ofile:
    wr = csv.writer(ofile)
    wr.writerow([label, rows])
But it writes only part of each array, with most of the frames replaced by an ellipsis, like this:
sh,"[array([ 0.0005188 , 0. , 0.00036621, ..., -0.00024414,
-0.00131226, -0.0015564 ], dtype=float32)]"
ix,"[array([-0.0015564 , -0.00131226, -0.00061035, ..., 0.0017395 ,
0.00012207, -0.00164795], dtype=float32)]"
It also inserts \n all over the place.
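The truncation and the stray line breaks most likely come from NumPy rather than from the csv module: csv.writer converts every non-string cell with str(), and with NumPy's default print options an array of more than 1000 elements is summarized with "..." and the text is wrapped at 75 characters. A minimal sketch that reproduces both effects (the array contents are made up for illustration and only stand in for real frames):

```python
import csv
import io

import numpy as np

# Hypothetical stand-in for one phoneme's frames: 2000 float32 samples.
frames = np.linspace(-1.0, 1.0, 2000, dtype=np.float32)

buffer = io.StringIO()
csv.writer(buffer).writerow(["sh", frames])  # the array cell is converted with str()
cell_text = buffer.getvalue()

# True: the array was summarized, so most samples never reach the file.
print("..." in cell_text)

# A shorter array is printed in full, but wrapped over several lines.
short_text = str(np.linspace(-1.0, 1.0, 50, dtype=np.float32))
print("\n" in short_text)
```

Writing each sample as its own cell, instead of passing the whole array as one cell, avoids both problems.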
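Edit: clarification:
label is a string, such as 'sh' or 'ix' or something similar
rows is an array, such as [0.0005188 0. 0.00036621 ..., -0.00024414 -0.00131226 -0.0015564]
I also have the maximum length over all frames, in case it is useful
If I run print(pandas_dataset.head())
this is what I get: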
Dialect Female ID Male Type \
0 DR1 True CJF0 False SA
1 DR1 True CJF0 False SA
2 DR1 True CJF0 False SI
3 DR1 True CJF0 False SI
4 DR1 True CJF0 False SI
path \
0 C:\Users\isaac\Desktop\TFM\Database\TIMIT\TRAI...
1 C:\Users\isaac\Desktop\TFM\Database\TIMIT\TRAI...
2 C:\Users\isaac\Desktop\TFM\Database\TIMIT\TRAI...
3 C:\Users\isaac\Desktop\TFM\Database\TIMIT\TRAI...
4 C:\Users\isaac\Desktop\TFM\Database\TIMIT\TRAI...
phons \
0 [{'end': 3050, 'start': 0, 'phon': 'h#'}, {'en...
1 [{'end': 2260, 'start': 0, 'phon': 'h#'}, {'en...
2 [{'end': 1513, 'start': 0, 'phon': 'h#'}, {'en...
3 [{'end': 2120, 'start': 0, 'phon': 'h#'}, {'en...
4 [{'end': 1507, 'start': 0, 'phon': 'h#'}, {'en...
words
0 [{'end': 5723, 'start': 3050, 'word': 'she'}, ...
1 [{'end': 4600, 'start': 2260, 'word': 'don't'}...
2 [{'end': 7436, 'start': 1513, 'word': 'even'},...
3 [{'end': 3533, 'start': 2120, 'word': 'or'}, {...
4 [{'end': 2154, 'start': 1507, 'word': 'a'}, {'...
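Answer 0 (score: 0)
I finally managed to save it to CSV, but I don't think this is a good solution, so I will leave the question without an accepted answer in case someone comes up with a better one.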
def save_phons_2_csv(pandas_dataset, csv_name):
    np.set_printoptions(threshold=np.inf, linewidth=np.inf)
    if not os.path.isfile(csv_name):  # create the file if it doesn't exist
        with open(csv_name, 'w') as csv_file:
            pass
    for index_r, row in pandas_dataset.iterrows():  # get all phons frames
        for index, phon_dict in enumerate(row['phons']):
            if phon_dict['phon'] not in no_phons:
                dicc = get_phonema(row, index)
                label = dicc['label']
                rows = dicc["frames"]
                with open(csv_name, 'a+') as ofile:
                    text = '%s; %s\n' % (label, rows)
                    ofile.write(text)
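Basically, what I did was change the way NumPy prints arrays, so that they are neither summarized nor wrapped.
This gives me a CSV like this: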
sh [ -3.35693359e-04 3.35693359e-04 -6.71386719e-04 9.46044922e-04...
iy [ 4.94384766e-03 -1.58691406e-03 7.93457031e-04 8.85009766e-04...
...
Each row has two cells, one for the label and one for all the frames. I think it would be better to have one cell per frame.
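For completeness, lines in this "label; [values]" layout can be read back by splitting off the label and parsing the bracketed text. The following is only a sketch under assumptions: it assumes rows is a one-dimensional array, uses made-up sample values, and formats the line with np.array2string instead of changing the global print options. Note that the printed text keeps at most 8 digits of precision by default, so the values may not round-trip exactly.

```python
import sys

import numpy as np

# Made-up frames standing in for dicc["frames"].
frames = np.array([-3.35693359e-04, 3.35693359e-04, 9.46044922e-04], dtype=np.float32)

# Format one line like the function above, without summarizing or wrapping.
array_text = np.array2string(frames, threshold=sys.maxsize, max_line_width=sys.maxsize)
line = "%s; %s\n" % ("sh", array_text)

# Parse it back: split off the label, drop the brackets, split on whitespace.
label, _, array_part = line.partition(";")
tokens = array_part.strip().strip("[]").split()
values = np.array([float(tok) for tok in tokens], dtype=np.float32)

print(label, values.shape)
```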
Answer 1 (score: 0)
You should be able to improve on this by using a CSV writer object:
import csv
import os

import numpy as np
import pandas as pd

def save_phons_2_csv(pandas_dataset, csv_name):
    np.set_printoptions(threshold=np.inf, linewidth=np.inf)
    if not os.path.isfile(csv_name):  # create the file if it doesn't exist
        with open(csv_name, 'w') as csv_file:
            pass
    # Open the file once; newline='' lets the csv module control line endings.
    with open(csv_name, 'a+', newline='') as ofile:
        csv_ofile = csv.writer(ofile)
        for index_r, row in pandas_dataset.iterrows():  # get all phons frames
            for index, phon_dict in enumerate(row['phons']):
                if phon_dict['phon'] not in no_phons:
                    dicc = get_phonema(row, index)
                    label = dicc['label']
                    rows = dicc["frames"]
                    # One cell for the label, then one cell per frame value.
                    csv_ofile.writerow([label] + list(rows))
writerow takes a list of elements and writes them as one row in the output file, with the correct delimiter between each element.
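Since the question also asks about reading the data back, here is a minimal sketch of a matching reader. The helper names write_phons and read_phons and the sample values are hypothetical, and the sketch assumes each frames value can be flattened into a one-dimensional sequence of numbers. Because rows can have different lengths, it uses csv.reader row by row rather than loading a rectangular table.

```python
import csv
import os
import tempfile

import numpy as np


def write_phons(csv_name, items):
    """Append one row per (label, frames) pair: the label, then one cell per sample."""
    with open(csv_name, "a", newline="") as ofile:
        writer = csv.writer(ofile)
        for label, frames in items:
            # tolist() yields plain Python floats, which are written in full.
            writer.writerow([label] + np.asarray(frames).ravel().tolist())


def read_phons(csv_name):
    """Yield (label, frames) pairs one row at a time, without loading the whole file."""
    with open(csv_name, newline="") as ifile:
        for record in csv.reader(ifile):
            if record:  # skip blank lines
                values = [float(cell) for cell in record[1:]]
                yield record[0], np.array(values, dtype=np.float32)


# Made-up sample data; the rows deliberately have different lengths.
samples = [
    ("sh", np.array([0.0005188, 0.0, 0.00036621], dtype=np.float32)),
    ("ix", np.array([-0.0015564, -0.00131226], dtype=np.float32)),
]

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "phons.csv")
    write_phons(path, samples)
    loaded = list(read_phons(path))

for label, frames in loaded:
    print(label, frames.shape)
```

Because read_phons is a generator, the file can be consumed one phoneme at a time, which keeps memory use bounded in the same way the writer does.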