Question

I need to write a dictionary to a CSV file, but I can't hold all of it in memory, so I have to write it iteratively:

def save_phons_2_csv(pandas_dataset, csv_name):
    if not os.path.isfile(csv_name):  # create the file if it doesn't exist
        with open(csv_name, 'w') as csv_file:
            pass
    for index_r, row in pandas_dataset.iterrows(): #get all phons frames
        for index, phon_dict in enumerate(row['phons']):
            if (phon_dict['phon'] not in no_phons):
                dicc = get_phonema(row, index)
                label = dicc['label']
                rows = np.array(dicc["frames"])

                with open(csv_name,'a+') as ofile:               
                    ... append label and rows to csv

In the end, what I want is to store label and rows in a CSV file and be able to read them back.

My best attempt so far is:

            with open(csv_name,'a+') as ofile:               
                wr = csv.writer(ofile)
                wr.writerow([label, rows])

But it writes the frames in abbreviated form, with most of the values elided, like this:

sh,"[array([ 0.0005188 ,  0.        ,  0.00036621, ..., -0.00024414,
       -0.00131226, -0.0015564 ], dtype=float32)]"

ix,"[array([-0.0015564 , -0.00131226, -0.00061035, ...,  0.0017395 ,
        0.00012207, -0.00164795], dtype=float32)]"

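It also inserts newlines (\n) in arbitrary places.

Edit: clarification:

label is a string such as 'sh' or 'ix' or similar

rows is an array such as [0.0005188 0. 0.00036621 ... -0.00024414 -0.00131226 -0.0015564]

I also have the maximum length of all the frames, in case it is useful

If I print(pandas_dataset.head()), this is what I get: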

 Dialect  Female    ID   Male Type  \
0     DR1    True  CJF0  False   SA   
1     DR1    True  CJF0  False   SA   
2     DR1    True  CJF0  False   SI   
3     DR1    True  CJF0  False   SI   
4     DR1    True  CJF0  False   SI   

                                                path  \
0  C:\Users\isaac\Desktop\TFM\Database\TIMIT\TRAI...   
1  C:\Users\isaac\Desktop\TFM\Database\TIMIT\TRAI...   
2  C:\Users\isaac\Desktop\TFM\Database\TIMIT\TRAI...   
3  C:\Users\isaac\Desktop\TFM\Database\TIMIT\TRAI...   
4  C:\Users\isaac\Desktop\TFM\Database\TIMIT\TRAI...   

                                               phons  \
0  [{'end': 3050, 'start': 0, 'phon': 'h#'}, {'en...   
1  [{'end': 2260, 'start': 0, 'phon': 'h#'}, {'en...   
2  [{'end': 1513, 'start': 0, 'phon': 'h#'}, {'en...   
3  [{'end': 2120, 'start': 0, 'phon': 'h#'}, {'en...   
4  [{'end': 1507, 'start': 0, 'phon': 'h#'}, {'en...   

                                               words  
0  [{'end': 5723, 'start': 3050, 'word': 'she'}, ...  
1  [{'end': 4600, 'start': 2260, 'word': 'don't'}...  
2  [{'end': 7436, 'start': 1513, 'word': 'even'},...  
3  [{'end': 3533, 'start': 2120, 'word': 'or'}, {...  
4  [{'end': 2154, 'start': 1507, 'word': 'a'}, {'...

Answer 1

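I finally managed to save it to CSV, but I don't think this is a very good solution, so I will leave the question without an accepted answer in case someone comes up with a better one.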

import os
import numpy as np

# no_phons and get_phonema are assumed to be defined elsewhere.
def save_phons_2_csv(pandas_dataset, csv_name):
    # Print arrays in full: no "..." truncation and no line wrapping.
    np.set_printoptions(threshold=np.inf, linewidth=np.inf)
    if not os.path.isfile(csv_name):  # create the file if it doesn't exist
        with open(csv_name, 'w') as csv_file:
            pass

    for index_r, row in pandas_dataset.iterrows():  # get all phons frames
        for index, phon_dict in enumerate(row['phons']):
            if phon_dict['phon'] not in no_phons:
                dicc = get_phonema(row, index)
                label = dicc['label']
                rows = dicc["frames"]
                with open(csv_name, 'a+') as ofile:
                    text = '%s; %s\n' % (label, rows)
                    ofile.write(text)

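Basically, what I did was change how NumPy prints arrays, so that they are written in full instead of being truncated and wrapped.

This gives me a CSV like this: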
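To illustrate the effect: by default NumPy abbreviates the printed form of any array with more than 1000 elements, replacing the middle with "...". The sketch below uses a made-up 2000-element array, and it swaps in np.printoptions (the context-manager counterpart of np.set_printoptions) and sys.maxsize in place of np.inf. Both substitutions are my own choices for the illustration, not part of the code in this answer.

```python
import sys
import numpy as np

# Hypothetical stand-in for one phoneme's frames.
frames = np.linspace(-1.0, 1.0, 2000, dtype=np.float32)

# Default options: arrays above 1000 elements are summarized with "...".
default_text = str(frames)

# Temporarily lift both limits; the settings revert on leaving the block.
with np.printoptions(threshold=sys.maxsize, linewidth=sys.maxsize):
    full_text = str(frames)

print("..." in default_text)  # the default text is abbreviated
print("..." in full_text)     # the full text is not
```

Using the context manager keeps the change local, whereas np.set_printoptions alters printing for the rest of the program.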

sh   [ -3.35693359e-04   3.35693359e-04  -6.71386719e-04   9.46044922e-04...
iy   [  4.94384766e-03  -1.58691406e-03   7.93457031e-04   8.85009766e-04...
...

Each row has two cells, one for the label and one for the frames. I think it would be better for each frame to have its own cell.
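For reading such a file back, here is a minimal sketch. It assumes every line has exactly the "label; [v1 v2 ...]" shape shown above and was written with truncation and wrapping disabled. The function name parse_phon_line and the sample line are hypothetical.

```python
def parse_phon_line(line):
    """Split one 'label; [v1 v2 ...]' line into (label, list of floats)."""
    label, _, body = line.partition(";")
    # Drop surrounding whitespace and the brackets NumPy prints around the array.
    values = [float(tok) for tok in body.strip().strip("[]").split()]
    return label.strip(), values


# Hypothetical sample line in the format produced above.
sample = "sh; [ -3.35693359e-04   3.35693359e-04  -6.71386719e-04]\n"
label, values = parse_phon_line(sample)
print(label, values)
```

This relies on the printed form of the array, which is fragile; a format with one value per cell avoids the bracket handling entirely.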

Answer 2

You should be able to use a CSV writer object to improve on this:

import csv
import os

import numpy as np
import pandas as pd


def save_phons_2_csv(pandas_dataset, csv_name):
    np.set_printoptions(threshold=np.inf, linewidth=np.inf)
    if not os.path.isfile(csv_name):  # create the file if it doesn't exist
        with open(csv_name, 'w') as csv_file:
            pass

    # Open the file once; newline='' lets the csv module control line endings.
    with open(csv_name, 'a+', newline='') as ofile:
        csv_ofile = csv.writer(ofile)

        for index_r, row in pandas_dataset.iterrows():  # get all phons frames
            for index, phon_dict in enumerate(row['phons']):
                if phon_dict['phon'] not in no_phons:
                    dicc = get_phonema(row, index)
                    label = dicc['label']
                    rows = dicc["frames"]
                    # One cell for the label, then one cell per frame value.
                    csv_ofile.writerow([label] + list(rows))

writerow takes a list of elements and writes it as one row of the output file, with the correct delimiter between each element.
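With one value per cell, the file can also be read back with csv.reader. The following is a minimal round-trip sketch, not the asker's pipeline: the helper names append_phon and read_phons, the sample data, and the temporary file are assumptions for illustration. Rows may have different lengths, so each one is parsed on its own.

```python
import csv
import os
import tempfile


def append_phon(csv_name, label, frames):
    """Append one row: the label followed by one cell per frame value."""
    with open(csv_name, "a", newline="") as ofile:
        csv.writer(ofile).writerow([label, *frames])


def read_phons(csv_name):
    """Yield (label, list of floats) one row at a time, without loading the whole file."""
    with open(csv_name, newline="") as ifile:
        for record in csv.reader(ifile):
            if record:  # skip blank lines
                yield record[0], [float(cell) for cell in record[1:]]


# Hypothetical sample data standing in for real phoneme frames.
samples = [("sh", [0.0005188, 0.0, 0.00036621]), ("ix", [-0.0015564, 0.0017395])]

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "phons.csv")
    for label, frames in samples:
        append_phon(path, label, frames)
    restored = list(read_phons(path))

print(restored)
```

Because read_phons is a generator, only one row is held in memory at a time. If NumPy arrays are needed, each list of values can be passed to np.asarray.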

Writing a dictionary containing rows to a CSV iteratively
