Question

I'm running Python 2.7.

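I have a defaultdict(int) from which I extract keys and values, then write them to an output file using string formatting. My code for writing to the file looks like this: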

output_line = '{}\t{}\t{}\t{}\t{}\n'.format(a, b, c, d, e)
output_file.write(output_line)

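a, b, c, etc. are values from this defaultdict(int), which we'll call old_dict. I write to the file in a for loop, once for each key in old_dict, and so far I'm happy with the output; it basically gives me a table with tab-separated columns (a tab-delimited file that I can open in Excel).

The problem I'm running into is that I built another dictionary from the first defaultdict(int), and I want to output that dictionary's key: value pairs in a column in between the existing ones. The kicker is that the key: value pairs should be printed vertically, not horizontally (this second dictionary can be large, and if I wrote it horizontally I'd have to scroll really, really far to see every key: value!).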

Example code:

old_dict = defaultdict(int)
# Lookup "same_key" in old_dict, get all associated nested matching key: values, and store in "new_dict"
new_dict = old_dict[same_key]
# Clean up the format a bit for writing to file.
nicer_format = ", ".join("{}: {}".format(k, v) for k, v in new_dict.items())

Now I change output_line to:

output_line = '{}\t{}\t{}\t{}\t{}\t{}\n'.format(a, b, c, nicer_format, d, e)

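It works, but I get a horizontal list (i.e. nicer_format is laid out horizontally). The output looks like this: (image: Undesired Output)

What I'd like to see is the content under column header 4 displayed vertically: (image: Desired output)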

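I tried applying string formatting to the nicer_format statement inside the join, based on what I read under the "Padding and aligning strings" section here. Something like: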
nicer_format = ", ".join("{}: {}{":\t>3"}".format(k, v) for k, v in new_dict.items())

because I want each new value to be separated by three tabs and a newline. However, this fails.

I also tried playing with pandas, using this code:

import pandas as pd
test_panda = pd.DataFrame.from_dict(new_dict, orient="index")

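I'm not sure what orient="index" is supposed to do (I've only just started messing around with pandas and haven't read any documentation on this parameter), but I get the following output: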

Output after using pandas

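It's close, because the output is now vertical, but it isn't under the right column! Is there a way to get the output under column header 4? Do I even need pandas? And what went wrong with my string padding/formatting attempt above?

Edit: I tried to create my MCV code from scratch, but I hit an error while trying to rebuild my dictionaries and I don't know how to fix it. I think this is because in my real code I build my dictionaries as defaultdict(int) by reading 2 files, and that works fine. I can attach those files if needed, but until then, here is the MCV code I built from scratch to illustrate in more detail.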

from __future__ import print_function
from collections import defaultdict
import pandas as pd

dict_one, dict_intermediate = defaultdict(lambda: defaultdict(int)), defaultdict(lambda: defaultdict(int))

# This is where my dictionaries get messed up. Normally I iterate through the file(s) and build them as defaultdict(int).
# But I don't know how to change that here, so I just manually wrote out here what the keys and values should be.
# The values are the (int) part; it's a set of that keeps track how many times each string appears.
# value_one and value_two are the final values after I finish reading the files and have the dictionaries completed.

key_one = "ACGACGGGCACT\tGAGCACCAGGAGCCGCGTGCCTGGCCCGAAGTACTGGGTCTCTTGAAAGCCCCCGCTATTGCTGCTGGCACAGAAGTACACAGCTGAGTCCCTGGGTTCT\tCASSNSGGFQETQYF\t8\t9"  # UMI with other extra info
value_one = "{'B670': 1, 'B180': 1, 'B240': 1, 'B360': 1, 'B880': 1, 'B210': 1, 'B230': 1, 'B500': 1, 'B480': 1}"  # Batch number: count
key_two = "ACGACGGGCACT"  # This is the UMI.
value_two = "{CTGGGGTGACCCCCCCAAGAACTGATCATAACGTACTCTGCGTTGATACCACTAAGGCTGGAAGATCGGAAGAGCACACGTCTGAACTCCAGTCACATCT: 1," \
            "CTGGGGTGACCCCCCCAAGAACTGATCATAACGTACTCTGCGTTGATACCACTGAGGCTGGAAGATCGGAAGAGCACACGTCTGAACTCCAGTCACATCT: 1," \
            "CTGGGGTGACCCCCCCAAGAACTGATCATAACGTACTCTGCGTTGATACCACTGAGGCTGGGAGATCGGAAGAGCACACGTCTGAACTCCAGTCACATCT: 1," \
            "CTGGGGTGACTCCCCCAAGAACTGATCATAACGAACTCTGCGTTGATACCACTGAGGCTGGAAGATCGGAAGAGCACACGTCTGAACTCCAGTCACATCT: 1}"  # Sequence: count

dict_one[key_one] += value_one
dict_intermediate[key_two] += value_two


def split_tabs(x):
    """
    Function to split tab-separated strings. It's used to break up the keys and values into their individual components.
    """
    return x.split('\t')


for k in dict_one:
    umi = split_tabs(k)[0]  # Extract the UMI from the key.
    overlap_reads = int(split_tabs(k)[4])  # Extract the reads from the key.
    dict_two = dict_intermediate[umi]  # Lookup the matching UMI in "dict_intermediate" & get all sequences + their counts in "dict_two".
    source_sequences = ", ".join("{}: {}".format(a, b) for a, b in dict_two.items())  # Output all sequences + their counts associated with that UMI (format as "sequence: count").
    panda_test = pd.DataFrame.from_dict(dict_one, orient="index")
    batch_set = ", ".join("{}: {}".format(a, b) for a, b in dict_one[key_one].items())
    total_counts = sum(dict_two.values())  # Sum of counts for all sequences for a single UMI.
    earliest_batch = min(dict_one[k].keys())  # The smallest batch (B) number.
    output_line = '{}\t{}\t{}\t{}\t{}\n'.format(k, panda_test, batch_set, total_counts, earliest_batch)

Answer 1

Your undesired output comes from:

def output(new_dict, a, b, c, d, e):
    nicer_format = ", ".join("{}: {}".format(k, v) for k, v in new_dict.items())
    return '{}\t{}\t{}\t{}\t{}\t{}\n'.format(a, b, c, nicer_format, d, e)

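To get the desired output, write the first key: value pair on the same row as the other columns, and each remaining pair on its own row with the other columns left empty: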

def output(new_dict, a, b, c, d, e):
    output_lines = ''
    first = True
    for k, v in new_dict.items():
        if first:
            output_lines += '{}\t{}\t{}\t{}: {}\t{}\t{}\n'.format(a, b, c, k, v, d, e)
            first = False
        else:
            output_lines += '\t\t\t{}: {}\t\t\n'.format(k, v)
    return output_lines

Now output_lines will contain multiple lines, just like your desired output.
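As for the padding attempt in the question: in "{}: {}{":\t>3"}" the inner double quotes end the string literal early, so Python rejects the line before format ever runs. Even written correctly, a format spec such as {:\t>3} only pads a value to a minimum width with the fill character; it cannot insert a line break between entries. Below is a minimal sketch of the same row-stacking idea, assuming the nested dictionary belongs in the fourth column. The name format_rows and the sample values are made up for illustration, and sorting the keys is an added assumption so the row order is repeatable (plain dicts are unordered in Python 2.7).

```python
from collections import defaultdict


def format_rows(new_dict, a, b, c, d, e):
    """Return tab-separated rows with new_dict's entries stacked in column 4."""
    # Sorting is an assumption made here for a stable row order.
    pairs = ['{}: {}'.format(k, v) for k, v in sorted(new_dict.items())]
    if not pairs:
        pairs = ['']  # still emit one row when the dictionary is empty
    # The first row carries the other columns; later rows leave them blank.
    rows = ['{}\t{}\t{}\t{}\t{}\t{}\n'.format(a, b, c, pairs[0], d, e)]
    rows.extend('\t\t\t{}\t\t\n'.format(p) for p in pairs[1:])
    return ''.join(rows)


# A format spec pads to a width; it does not add line breaks.
padded = '{:\t>3}'.format('x')  # 'x' right-aligned to width 3, filled with tabs

# Hypothetical sample data standing in for the real nested counts.
counts = defaultdict(int)
counts['seq_b'] += 2
counts['seq_a'] += 1
table = format_rows(counts, 'umi', 'x', 'y', 3, 'B180')
```

Writing table to the output file gives one row per key: value pair, with only the first row filled in for the other columns. Unlike the version above, this sketch also keeps a row for a key whose nested dictionary is empty; whether that is wanted depends on your data.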

Format dictionary key: value on new lines and new tabs
