I have the following list of tags.
mytags = ["a", "b", "c", "d", "e", "f"]
I also have a file in which each line is a string formatted as a list.
['a-1', 'b-3', 'c-4', 'e-3']
['a-10', 'b-12', 'c-14', 'd-16']
['b-1', 'c-5', 'd-13', 'f-7']
I want to print the file as a tab-separated table, with the columns in the order of the tags in mytags.
#header
#a, b, c, d, e, f
a-1 b-3 c-4 NA e-3 NA
a-10 b-12 c-14 d-16 NA NA
NA b-1 c-5 d-13 NA f-7
I wrote some Python code, but the nested double loop gives an unwanted result.
print(mytags)
for lineList in file:
    for tag in mytags:
        if tag in lineList:
            print(lineList, end="\t")
        else:
            print("NA", end="\t")
How can I build the table from this data?
Answer 0 (score: 3)
You should extract the tag from each item before comparing it with the tag list:
mytags = ["a", "b", "c", "d", "e", "f"]
rows = [
    ['a-1', 'b-3', 'c-4', 'e-3'],
    ['a-10', 'b-12', 'c-14', 'd-16'],
    ['b-1', 'c-5', 'd-13', 'f-7']
]
for row in rows:
    for tag in mytags:
        print(row.pop(0) if row and row[0].split('-')[0] == tag else 'NA', end='\t')
    print()
Or, with a generator expression:
print('\n'.join('\t'.join(row.pop(0) if row and row[0].split('-')[0] == tag else 'NA' for tag in mytags) for row in rows))
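To make the point about extracting the tag concrete, here is a minimal sketch. It first shows why the membership test in the question never matches, then aligns a row without consuming it. The helper name align_row is hypothetical and not part of the answer above; like the pop(0) version, it assumes the items in each row appear in the same relative order as the tags.

```python
mytags = ["a", "b", "c", "d", "e", "f"]
row = ['a-1', 'b-3', 'c-4', 'e-3']

# List membership compares whole elements, so a bare tag never matches.
print("a" in row)                   # False
print("a" == row[0].split("-")[0])  # True


def align_row(row, tags):
    """Return one cell per tag without modifying row (hypothetical helper).

    Assumes the items in row follow the same relative order as tags.
    """
    cells = []
    pos = 0  # index of the next item in row that has not been placed yet
    for tag in tags:
        if pos < len(row) and row[pos].split("-")[0] == tag:
            cells.append(row[pos])
            pos += 1
        else:
            cells.append("NA")
    return cells


print("\t".join(align_row(row, mytags)))
```

Because row is left untouched, the same data can be printed more than once, whereas the pop(0) variants empty each row as they go.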
Answer 1 (score: 1)
Since the strings will be stored in a file, here is my approach:
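import pandas as pd
# read the file; splitting on '[' leaves the list contents in column 1
data = pd.read_csv('test.txt', header=None, sep='[')
master_df = pd.DataFrame(columns=['a', 'b', 'c', 'd', 'e', 'f'])
for i in range(len(data)):
    master_df.loc[i] = 'NA'  # start each row filled with 'NA'
    temp = data[1][i].replace(']', '')
    temp = temp.replace("'", '')
    for char in temp.split(','):
        # assign through .loc; chained indexing may not write to master_df
        master_df.loc[i, char.split('-')[0].strip()] = char.strip()
print(master_df)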
Output
a b c d e f
0 a-1 b-3 c-4 NA e-3 NA
1 a-10 b-12 c-14 d-16 NA NA
2 NA b-1 c-5 d-13 NA f-7
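If pandas is not available, the lines can also be parsed with the standard library. The sketch below is an alternative to the string replacement above, not the same method: it assumes every line of the file is a valid Python list literal, as in the question, and uses ast.literal_eval to turn it into a real list. The helper name read_rows is hypothetical, and io.StringIO stands in for an opened file.

```python
import ast
import io


def read_rows(lines):
    """Parse each non-empty line as a Python list literal (hypothetical helper)."""
    rows = []
    for line in lines:
        line = line.strip()
        if line:
            rows.append(ast.literal_eval(line))
    return rows


# io.StringIO stands in for open('test.txt'); the content mirrors the question.
sample = io.StringIO(
    "['a-1', 'b-3', 'c-4', 'e-3']\n"
    "['a-10', 'b-12', 'c-14', 'd-16']\n"
    "['b-1', 'c-5', 'd-13', 'f-7']\n"
)
rows = read_rows(sample)
print(rows)
```

The resulting rows list has the same shape as the hard-coded lists used in the other answers, so it can be fed to any of them.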
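Answer 2 (score: 1)
Here is an easy-to-follow approach that uses re (regular expressions) to do what you describe. You only need the raw text of the file, without any special reader such as csv_reader, so just read the file with the open function and you are ready to start: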
import re
filetext = """['a-1', 'b-3', 'c-4', 'e-3']
['a-10', 'b-12', 'c-14', 'd-16']
['b-1', 'c-5', 'd-13', 'f-7']"""
# find all values
values = re.findall(r'\w+-\d+', filetext)
values.sort()
# find tags
tags = []
for i in values:
    if tags.count(i.split('-')[0]) == 0:
        tags.append(i.split('-')[0])
# find max length
maxLength = max([len(list(filter(lambda a: a.split('-')[0] == i, values))) for i in tags])
# create a list with the results
result = [[] for i in tags]
ind = -1
for i in tags:
    ind += 1
    for j in values:
        if j.split('-')[0] == i:
            result[ind].append(j)
# add 'NA' for incomplete lists
for i in result:
    i.sort(key=lambda v: int(v.split('-')[1]))
    if len(i) != maxLength:
        for j in range(maxLength - len(i)):
            i.append('NA')
# print them as you liked
for i in tags:
    print(i, end='\t')
print()
for i in range(maxLength):
    for j in result:
        print(j[i], end='\t')
    print()
Result
a b c d e f
a-1 b-1 c-4 d-13 e-3 f-7
a-10 b-3 c-5 d-16 NA NA
NA b-12 c-14 NA NA NA
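Note that this result groups the values by column and sorts them, so a printed row no longer corresponds to one line of the input, which differs from the table requested in the question. A minimal sketch of a regex variant that keeps one output row per input line might look like the following. It assumes the tag list is known in advance, as in the question, and that a tag is the word characters before the '-'.

```python
import re

filetext = """['a-1', 'b-3', 'c-4', 'e-3']
['a-10', 'b-12', 'c-14', 'd-16']
['b-1', 'c-5', 'd-13', 'f-7']"""
tags = ["a", "b", "c", "d", "e", "f"]

table = []
for line in filetext.splitlines():
    # Map each tag on this line to its full value, e.g. {'a': 'a-1', 'b': 'b-3'}.
    found = {m.group(1): m.group(0) for m in re.finditer(r"(\w+)-\d+", line)}
    # Emit one cell per tag, in tag order, with 'NA' for missing tags.
    table.append([found.get(tag, "NA") for tag in tags])

print("\t".join(tags))
for cells in table:
    print("\t".join(cells))
```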
Answer 3 (score: 1)
You can use setdefault here:
my_tags = ["a", "b", "c", "d", "e", "f"]
line_list = [
    ['a-1', 'b-3', 'c-4', 'e-3'],
    ['a-10', 'b-12', 'c-14', 'd-16'],
    ['b-1', 'c-5', 'd-13', 'f-7']
]
for lst in line_list:
    # key each item by the text before the '-', so multi-character tags also work
    d = {i.split('-')[0]: i for i in lst}
    for i in my_tags:
        print(d.setdefault(i, 'NA'), end='\t')
    print()
a-1 b-3 c-4 NA e-3 NA
a-10 b-12 c-14 d-16 NA NA
NA b-1 c-5 d-13 NA f-7
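If the table should end up in a file with the header row from the question, the same dictionary lookup can be combined with the standard csv module. This is only a sketch under a few assumptions: io.StringIO stands in for a real output file, tags are taken to be the text before the first '-', and dict.get is swapped in for setdefault because it returns the default without inserting it into the dictionary.

```python
import csv
import io

my_tags = ["a", "b", "c", "d", "e", "f"]
line_list = [
    ['a-1', 'b-3', 'c-4', 'e-3'],
    ['a-10', 'b-12', 'c-14', 'd-16'],
    ['b-1', 'c-5', 'd-13', 'f-7'],
]

buffer = io.StringIO()  # stands in for open('table.tsv', 'w', newline='')
writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
writer.writerow(my_tags)  # header row
for lst in line_list:
    d = {item.split("-")[0]: item for item in lst}
    # dict.get leaves d unchanged when the tag is missing
    writer.writerow([d.get(tag, "NA") for tag in my_tags])

print(buffer.getvalue(), end="")
```

Letting csv.writer join the cells avoids the trailing tab that print(..., end='\t') leaves at the end of every line.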