Question

I have the following list of tags.

mytags = ["a", "b", "c", "d", "e", "f"]

I also have a file in which each line is a string formatted as a list.

['a-1',   'b-3',  'c-4',  'e-3']
['a-10', 'b-12', 'c-14', 'd-16']
['b-1',   'c-5', 'd-13',  'f-7']

I want to print the file as a tab-separated table, with the columns in the order of the tags in mytags.

#header
#a,   b,   c,   d,   e,  f
 a-1  b-3  c-4  NA   e-3 NA
 a-10 b-12 c-14 d-16 NA  NA
 NA   b-1  c-5  d-13 NA  f-7

I wrote some Python code, but the nested double loop gives an unwanted result.

print (mylist)

for lineList in file:
    for tag in mytags:
        if tag in lineList:
            print(lineList, end="\t")
        else:
            print("NA", end="\t")

How can I build a table from this data?

Answer 1

You should extract the tag from each item before comparing it against the tag list:

mytags = ["a", "b", "c", "d", "e", "f"]
rows = [
    ['a-1',   'b-3',  'c-4',  'e-3'],
    ['a-10', 'b-12', 'c-14', 'd-16'],
    ['b-1',   'c-5', 'd-13',  'f-7']
]
for row in rows:
    for tag in mytags:
        print(row.pop(0) if row and row[0].split('-')[0] == tag else 'NA', end='\t')
    print()

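Or, with a generator expression (row.pop(0) empties the rows as it goes, so run this on a fresh copy of rows rather than after the loop above):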

print('\n'.join('\t'.join(row.pop(0) if row and row[0].split('-')[0] == tag else 'NA' for tag in mytags) for row in rows))
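Both versions above consume each row with `pop(0)` and rely on the items already being in tag order. Below is a minimal sketch of a variant that avoids both assumptions and parses the lines with `ast.literal_eval`. The helper names `parse_rows` and `format_row`, and the in-memory stand-in for the file, are illustrative assumptions and not part of the original answer:

```python
import ast
import io

mytags = ["a", "b", "c", "d", "e", "f"]

def parse_rows(lines):
    """Turn each non-empty line such as "['a-1', 'b-3']" into a Python list."""
    return [ast.literal_eval(line.strip()) for line in lines if line.strip()]

def format_row(row, tags, missing="NA"):
    """Place each item under its tag; tags absent from the row become `missing`."""
    by_tag = {item.split("-")[0]: item for item in row}
    return "\t".join(by_tag.get(tag, missing) for tag in tags)

# Stand-in for an opened text file; a real file object would work the same way.
sample = io.StringIO(
    "['a-1',   'b-3',  'c-4',  'e-3']\n"
    "['a-10', 'b-12', 'c-14', 'd-16']\n"
    "['b-1',   'c-5', 'd-13',  'f-7']\n"
)

rows = parse_rows(sample)
table = [format_row(row, mytags) for row in rows]

print("\t".join(mytags))
print("\n".join(table))
```

Because the lookup goes through a dictionary, the rows are left unmodified and the items within a line may appear in any order.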

Answer 2

Since the strings will be stored in a file, here is my approach:

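import pandas as pd

# Read the file; splitting on '[' leaves each line's list contents in column 1
data = pd.read_csv('test.txt', header=None, sep='[')

master_df = pd.DataFrame(columns=['a', 'b', 'c', 'd', 'e', 'f'])

for i in range(len(data)):
    master_df.loc[i] = 'NA'  # start with a row of placeholders
    # Strip the closing bracket and the quotes around each item
    temp = data[1][i].replace(']', '').replace("'", '')
    for item in temp.split(','):
        item = item.strip()
        # The tag before the hyphen selects the column; .loc avoids chained assignment
        master_df.loc[i, item.split('-')[0]] = item

print(master_df)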

Output

      a       b       c      d      e      f
0   a-1     b-3     c-4     NA    e-3     NA
1  a-10    b-12    c-14   d-16     NA     NA
2    NA     b-1     c-5   d-13     NA    f-7

Answer 3

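You can use re (regular expressions) to do what you describe, which is an easy approach to follow. You only need the raw text of the file, with no special reader such as csv_reader, so just read the file with the open function and you are ready to start: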

import re

filetext = """['a-1',   'b-3',  'c-4',  'e-3']
['a-10', 'b-12', 'c-14', 'd-16']
['b-1',   'c-5', 'd-13',  'f-7']"""

#find all values
values = re.findall(r'\w+-\d+', filetext)
values.sort()

#find tags
tags = []
for i in values:
    if(tags.count(i.split('-')[0])==0):
        tags.append(i.split('-')[0])

#find max length
maxLength = max([len(list(filter(lambda a:a.split('-')[0]==i, values))) for i in tags])

#create a list with the results
result = [[] for i in tags]
ind=-1
for i in tags:
    ind+=1
    for j in values:
        if(j.split('-')[0]==i):
            result[ind].append(j)

#add 'NA' for non complete lists
for i in result:
    i.sort(key=lambda v:int(v.split('-')[1]))
    if(len(i)!=maxLength):
        for j in range(maxLength - len(i)):
            i.append('NA')

#print them as you liked
for i in tags:
    print(i, end='\t')

print()

for i in range(maxLength):
    for j in result:
        print(j[i], end='\t')
    print()

Result

a      b      c      d       e      f    
a-1    b-1    c-4    d-13    e-3    f-7    
a-10   b-3    c-5    d-16    NA     NA   
NA     b-12   c-14   NA      NA     NA
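The result above groups and sorts the values by tag, so the original rows are lost. If one output row per input line is wanted, as in the question, the same kind of pattern can be applied line by line with `re.finditer`. This is a minimal sketch under that assumption, and the function name `rows_from_text` is hypothetical:

```python
import re

tags = ["a", "b", "c", "d", "e", "f"]

filetext = """['a-1',   'b-3',  'c-4',  'e-3']
['a-10', 'b-12', 'c-14', 'd-16']
['b-1',   'c-5', 'd-13',  'f-7']"""

def rows_from_text(text, tags, missing="NA"):
    """Return one list per line, with each value placed in its tag's column."""
    table = []
    for line in text.splitlines():
        # group(1) is the tag, group(0) the whole value, e.g. 'a' and 'a-1'
        found = {m.group(1): m.group(0) for m in re.finditer(r"(\w+)-\d+", line)}
        table.append([found.get(tag, missing) for tag in tags])
    return table

result = rows_from_text(filetext, tags)

print("\t".join(tags))
for row in result:
    print("\t".join(row))
```

Here the tag list is given explicitly rather than derived from the data, so a tag that never occurs in the file still gets its own column of `NA` values.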

Answer 4

You can use setdefault here:

my_tags = ["a", "b", "c", "d", "e", "f"]
line_list = [
    ['a-1',   'b-3',  'c-4',  'e-3'],
    ['a-10', 'b-12', 'c-14', 'd-16'],
    ['b-1',   'c-5', 'd-13',  'f-7']
]

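for lst in line_list:
    # Map each tag (the part before the hyphen) to its full item
    d = {i.split('-')[0]: i for i in lst}
    for tag in my_tags:
        # setdefault returns the item, or 'NA' when the tag is missing
        print(d.setdefault(tag, 'NA'), end='\t')
    print()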


Output

a-1     b-3     c-4     NA      e-3     NA
a-10    b-12    c-14    d-16    NA      NA
NA      b-1     c-5     d-13    NA      f-7
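Note that `setdefault` also inserts the missing key into the dictionary. That is harmless here because `d` is rebuilt for every row, but `dict.get` returns the same fallback without modifying anything. As a sketch, the lookup can be combined with the standard `csv` module to write the header and the tab-separated rows in one place. The function name `write_table` and the in-memory buffer are assumptions for illustration:

```python
import csv
import io

my_tags = ["a", "b", "c", "d", "e", "f"]
line_list = [
    ['a-1',   'b-3',  'c-4',  'e-3'],
    ['a-10', 'b-12', 'c-14', 'd-16'],
    ['b-1',   'c-5', 'd-13',  'f-7'],
]

def write_table(rows, tags, out, missing="NA"):
    """Write a header line and one tab-separated line per row to `out`."""
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(tags)
    for row in rows:
        by_tag = {item.split("-")[0]: item for item in row}
        # get() leaves by_tag unchanged when a tag is missing
        writer.writerow([by_tag.get(tag, missing) for tag in tags])

buffer = io.StringIO()  # any writable text file object would work here
write_table(line_list, my_tags, buffer)
print(buffer.getvalue(), end="")
```

Passing `sys.stdout` or an opened file instead of the buffer sends the same table to the console or to disk.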
