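I'm running Python 2.7.
I have a defaultdict(int).
I extract keys and values from it and then write them to an output file using string formatting. My code for writing to the file looks like this: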
output_line = '{}\t{}\t{}\t{}\t{}\n'.format(a, b, c, d, e)
output_file.write(output_line)
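a, b, c, etc. are values from this defaultdict(int), which we'll call old_dict. I write to the file in a for loop, once for each key in old_dict, and so far I'm happy with the output; it basically gives me a table with each column separated by a tab (a tab-delimited file that I can open in Excel).
The problem I'm having is that I created another dictionary based on the first defaultdict(int), and I want to output that dictionary's key: value pairs in a column between the existing ones. The kicker is that I want the key: value pairs printed vertically rather than horizontally (because this second dictionary can be big, and if I write it horizontally I have to scroll really, really far to see each key: value pair!).
Example code: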
old_dict = defaultdict(int)
new_dict = old_dict[same_key] # Lookup "same_key" in old_dict, get all associated nested matching key: values, and store in "new_dict"
nicer_format = ", ".join("{}: {}".format(k, v) for k, v in new_dict.items()) # Clean up the format a bit for writing to file.
Now I change output_line to:
output_line = '{}\t{}\t{}\t{}\t{}\t{}\n'.format(a, b, c, nicer_format, d, e)
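It works, but I get a horizontal list (i.e. nicer_format is horizontal). The output looks like: (image: Undesired Output)
What I'd like to see is the content under column header 4 displayed vertically: (image: Desired output)
I tried applying string formatting to the join statement that builds the nicer_format variable, based on what I read under the "Padding and aligning strings" section here. Something like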
nicer_format = ", ".join("{}: {}{":\t>3"}".format(k, v) for k, v in new_dict.items())
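because I want to separate each new value with three tabs and a newline. However, this fails.
I also tried playing with pandas, using these lines of code: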
import pandas as pd
test_panda = pd.DataFrame.from_dict(new_dict, orient="index")
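I'm not sure what orient="index" is supposed to do (I've only just started messing around with pandas and haven't read any documentation on this parameter), but I get the following output:
It's close, because now the output is vertical, but it isn't under the right column! Is there a way to get the output under column header 4? Do I even need pandas? What went wrong with my string padding/formatting attempt above?
Edit: I tried to create my MCVE from scratch, but I ran into an error while trying to rebuild my dictionaries, and I don't know how to fix it. I think that's because in my real code I build my dictionaries as defaultdict(int) by reading two files, and that works fine. I can attach those files if needed, but until then, here is the MCVE I built from scratch to try to illustrate more detail.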
from __future__ import print_function
from collections import defaultdict
import pandas as pd
dict_one, dict_intermediate = defaultdict(lambda: defaultdict(int)), defaultdict(lambda: defaultdict(int))
# This is where my dictionaries get messed up. Normally I iterate through the file(s) and build them as defaultdict(int).
# But I don't know how to change that here, so I just manually wrote out here what the keys and values should be.
# The values are the (int) part; it's a set of that keeps track how many times each string appears.
# value_one and value_two are the final values after I finish reading the files and have the dictionaries completed.
key_one = "ACGACGGGCACT\tGAGCACCAGGAGCCGCGTGCCTGGCCCGAAGTACTGGGTCTCTTGAAAGCCCCCGCTATTGCTGCTGGCACAGAAGTACACAGCTGAGTCCCTGGGTTCT\tCASSNSGGFQETQYF\t8\t9" # UMI with other extra info
value_one = "{'B670': 1, 'B180': 1, 'B240': 1, 'B360': 1, 'B880': 1, 'B210': 1, 'B230': 1, 'B500': 1, 'B480': 1}" # Batch number: count
key_two = "ACGACGGGCACT" # This is the UMI.
value_two = "{CTGGGGTGACCCCCCCAAGAACTGATCATAACGTACTCTGCGTTGATACCACTAAGGCTGGAAGATCGGAAGAGCACACGTCTGAACTCCAGTCACATCT: 1," \
"CTGGGGTGACCCCCCCAAGAACTGATCATAACGTACTCTGCGTTGATACCACTGAGGCTGGAAGATCGGAAGAGCACACGTCTGAACTCCAGTCACATCT: 1," \
"CTGGGGTGACCCCCCCAAGAACTGATCATAACGTACTCTGCGTTGATACCACTGAGGCTGGGAGATCGGAAGAGCACACGTCTGAACTCCAGTCACATCT: 1," \
"CTGGGGTGACTCCCCCAAGAACTGATCATAACGAACTCTGCGTTGATACCACTGAGGCTGGAAGATCGGAAGAGCACACGTCTGAACTCCAGTCACATCT: 1}" # Sequence: count
dict_one[key_one] += value_one
dict_intermediate[key_two] += value_two
def split_tabs(x):
    """
    Function to split tab-separated strings. It's used to break up the keys and values
    into their individual components.
    """
    return x.split('\t')
for k in dict_one:
    umi = split_tabs(k)[0] # Extract the UMI from the key.
    overlap_reads = int(split_tabs(k)[4]) # Extract the reads from the key.
    dict_two = dict_intermediate[umi] # Lookup the matching UMI in "dict_intermediate" & get all sequences + their counts in "dict_two".
    source_sequences = ", ".join("{}: {}".format(a, b) for a, b in dict_two.items()) # Output all sequences + their counts associated with that UMI (format as "sequence: count").
    panda_test = pd.DataFrame.from_dict(dict_one, orient="index")
    batch_set = ", ".join("{}: {}".format(a, b) for a, b in dict_one[key_one].items())
    total_counts = sum(dict_two.values()) # Sum of counts for all sequences for a single UMI.
    earliest_batch = min(dict_one[k].keys()) # The smallest batch (B) number.
    output_line = '{}\t{}\t{}\t{}\t{}\n'.format(k, panda_test, batch_set, total_counts, earliest_batch)
Answer 0 (score: 0)
Your undesired output comes from
def output(new_dict, a, b, c, d, e):
    nicer_format = ", ".join("{}: {}".format(k, v) for k, v in new_dict.items())
    return '{}\t{}\t{}\t{}\t{}\t{}\n'.format(a, b, c, nicer_format, d, e)
To get the desired output,
def output(new_dict, a, b, c, d, e):
    output_lines = ''
    first = True
    for k, v in new_dict.items():
        if first:
            output_lines += '{}\t{}\t{}\t{}: {}\t{}\t{}\n'.format(a, b, c, k, v, d, e)
            first = False
        else:
            output_lines += '\t\t\t{}: {}\t\t\n'.format(k, v)
    return output_lines
Now output_lines will contain multiple lines, just like your desired output.
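As for the padding attempt in the question: besides the nested double quotes, which end the string literal early, a format spec such as {:\t>3} treats the tab as a fill character and 3 as a minimum width. It pads the value out to three characters; it does not insert three tabs. Below is a minimal sketch of the same row-per-pair idea, written so it can be run on its own. The name format_rows and the sample values are made up for illustration, and sorting the keys is my own assumption (plain dicts in Python 2.7 have no guaranteed order). It also emits a row when the nested dict is empty, whereas the loop version above returns an empty string in that case.

```python
from __future__ import print_function


def format_rows(new_dict, a, b, c, d, e):
    """Return tab-delimited rows with one 'key: value' pair per row in column 4.

    Hypothetical helper for illustration; not part of the original code.
    """
    # Sort so the row order is repeatable (an assumption, not a requirement).
    pairs = ['{}: {}'.format(k, v) for k, v in sorted(new_dict.items())]
    if not pairs:
        pairs = ['']  # still emit one row when the nested dict is empty
    # The first row carries every column.
    rows = ['{}\t{}\t{}\t{}\t{}\t{}'.format(a, b, c, pairs[0], d, e)]
    # The remaining rows leave columns 1-3 and 5-6 blank.
    rows.extend('\t\t\t{}\t\t'.format(pair) for pair in pairs[1:])
    return '\n'.join(rows) + '\n'


# Made-up sample values, for illustration only.
sample = format_rows({'B670': 1, 'B180': 2}, 'a', 'b', 'c', 'd', 'e')
print(sample, end='')

# In a format spec the tab is only a fill character and 3 is a minimum width,
# so a one-character value is padded on the left with two tabs.
padded = '{:\t>3}'.format('x')
```

The returned string can be passed to output_file.write() in place of output_line. No pandas is needed for this layout.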