How do I change a file's structure into a tabular format?

Date: 2015-12-10 00:37:40

Tags: python csv formatting format tabular

I have a file containing the following data:

Input:

Query= A1 bird
Hit= B1 owl
Score= 1.0 4.0 2.5
Hit= B2 bluejay
Score= 10.0 6.0 7.0
Query= A2 shark
Hit= C1 catshark
Score= 10.0 7.0 2.0
Query= A3 cat
Hit= D1 dog
Score= 7.0 2.0 1.0

I want to write a program that restructures this data so that it comes out in tabular (.csv) format, like this:

Output:

Query = A1 bird, Hit= B1 owl, Score= 1.0 4.0 2.5 #The first query, hit, score 
Query = A1 bird, Hit= B2 bluejay, Score= 10.0 6.0 7.0 #The second hit and score associated with the first query
Query = A2 shark, Hit= C1 catshark, Score= 10.0 7.0 2.0 #The second query, hit, score
Query = A3 cat, Hit= D1 dog, Score= 7.0 2.0 1.0 #The third query, hit, score

I tried the following solution suggested by Takis:

import csv

with open('g.txt', 'r') as f, open('result.csv', 'w') as csvfile:
    fieldnames = ['Query', 'Hit', 'Score']
    csvwriter = csv.DictWriter(csvfile, quoting=csv.QUOTE_ALL,
                               fieldnames=fieldnames)
    csvwriter.writeheader()
    data = {}
    for line in f:
        key, value = line.split('=')
        data[key.strip()] = value.strip()
        # Write a row as soon as three keys have been collected
        if len(data.keys()) == 3:
            csvwriter.writerow(data)
            data = {}
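A minimal sketch of why this attempt mis-pairs rows: the loop below replays the same dict-based logic on the sample input (held in an `io.StringIO` instead of `g.txt`) and collects the rows instead of writing them. Because `data` is reset after every third key, the second Hit/Score pair under `A1 bird` is only flushed when the next `Query=` line arrives, so it ends up under `A2 shark`, and the last Hit/Score pair is never written at all. The helper name `rows_from_dict_approach` is hypothetical.

```python
import io

# Sample input copied from the question; io.StringIO stands in for g.txt.
SAMPLE = """Query= A1 bird
Hit= B1 owl
Score= 1.0 4.0 2.5
Hit= B2 bluejay
Score= 10.0 6.0 7.0
Query= A2 shark
Hit= C1 catshark
Score= 10.0 7.0 2.0
Query= A3 cat
Hit= D1 dog
Score= 7.0 2.0 1.0
"""


def rows_from_dict_approach(f):
    """Replay the question's loop, collecting rows instead of writing them."""
    rows = []
    data = {}
    for line in f:
        key, value = line.split('=')
        data[key.strip()] = value.strip()
        if len(data) == 3:
            rows.append((data['Query'], data['Hit'], data['Score']))
            data = {}  # the pending Hit/Score pair of the next row starts here
    return rows  # anything still left in `data` is silently dropped


for row in rows_from_dict_approach(io.StringIO(SAMPLE)):
    print(row)
# ('A1 bird', 'B1 owl', '1.0 4.0 2.5')
# ('A2 shark', 'B2 bluejay', '10.0 6.0 7.0')
# ('A3 cat', 'C1 catshark', '10.0 7.0 2.0')
```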

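Question: how can I make the program recognize the hits and scores associated with each query, so that I can print them on one line? If a query has more than one hit and score under it, the query should be printed again together with its second hit and second score, exactly as in the following output: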

"A1 bird","B1 owl","1.0 4.0 2.5" #1st Query, its 1st Hit, its 1st Score
"A1 bird","B2 bluejay","10.0 6.0 7.0" #1st Query, its 2nd Hit, its 2nd Score
"A2 shark","C1 catshark","10.0 7.0 2.0" #2nd Query, 1st and only Hit, 1st and only Score
"A3 cat","D1 dog","7.0 2.0 1.0" #3rd Query, 1st and only Hit, 1st and only Score
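One way to get the requested output is to remember the most recent `Query=` value and emit a row every time a `Score=` line closes a Hit/Score pair. The sketch below assumes, as in the sample, that every `Hit=` line is followed by its `Score=` line; `parse_records` and `write_csv` are hypothetical names, and `io.StringIO` stands in for the real input and output files.

```python
import csv
import io

# Sample input copied from the question.
SAMPLE = """Query= A1 bird
Hit= B1 owl
Score= 1.0 4.0 2.5
Hit= B2 bluejay
Score= 10.0 6.0 7.0
Query= A2 shark
Hit= C1 catshark
Score= 10.0 7.0 2.0
Query= A3 cat
Hit= D1 dog
Score= 7.0 2.0 1.0
"""


def parse_records(lines):
    """Yield (query, hit, score) tuples, repeating the current query for every hit.

    Assumes each Hit= line is followed by its Score= line.
    """
    query = hit = None
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        key, _, value = line.partition('=')  # split on the first '=' only
        key, value = key.strip(), value.strip()
        if key == 'Query':
            query = value
        elif key == 'Hit':
            hit = value
        elif key == 'Score':
            # A Score closes one record: pair it with the latest Query and Hit.
            yield query, hit, value


def write_csv(lines, out):
    # QUOTE_ALL quotes every field, matching the desired output.
    writer = csv.writer(out, quoting=csv.QUOTE_ALL, lineterminator='\n')
    writer.writerows(parse_records(lines))


buf = io.StringIO()
write_csv(io.StringIO(SAMPLE), buf)
print(buf.getvalue(), end='')
# "A1 bird","B1 owl","1.0 4.0 2.5"
# "A1 bird","B2 bluejay","10.0 6.0 7.0"
# "A2 shark","C1 catshark","10.0 7.0 2.0"
# "A3 cat","D1 dog","7.0 2.0 1.0"
```

To run it on real files, something like `with open('g.txt') as f, open('result.csv', 'w', newline='') as out: write_csv(f, out)` should work; opening the output with `newline=''` is what the `csv` module documentation recommends for file objects handed to a writer.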

Any ideas?

2 answers:

Answer 0 (score: 1)

Change the last line

print line.rstrip("\n\r"), #print of the first score

to

print line.rstrip("\n\r") #print of the first score

(that is, remove the trailing comma).
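The trailing comma is Python 2 `print` statement syntax: it suppresses the newline so that the next `print` continues on the same line, with a space between the items. In Python 3 the closest equivalent is the `end` argument of `print()`. A small illustration, capturing the output with `contextlib.redirect_stdout` so it can be inspected:

```python
import io
from contextlib import redirect_stdout

buf = io.StringIO()
with redirect_stdout(buf):
    print("Query= A1 bird", end=" ")  # like Python 2's `print x,`: no newline
    print("Hit= B1 owl", end=" ")
    print("Score= 1.0 4.0 2.5")       # default end="\n" finishes the line

print(repr(buf.getvalue()))
# 'Query= A1 bird Hit= B1 owl Score= 1.0 4.0 2.5\n'
```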

If you also want the previous query repeated in front of an extra hit, you need to add a couple of variables:

# Python 2 syntax (print statement)
query = None
prev_query = None

with open('g.txt') as f:
    for line in f:
        if line.startswith("Query="):
            query = line.rstrip("\n\r")
            print query,  # print the query line, no newline
        elif line.startswith("Hit="):
            if not query:
                print prev_query,  # extra hit: repeat the previous query
            print line.rstrip("\n\r"),  # print the hit line, no newline
        elif line.startswith("Score="):
            print line.rstrip("\n\r")  # print the score and end the row
            if query:
                prev_query = query  # keep it for a 3rd, 4th, ... hit too
            query = None
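A sketch of the same idea in Python 3, collecting the rows in a list instead of printing them. As an assumption beyond the answer, it only updates `prev_query` when a `Query=` line was actually seen, so a query with three or more hits keeps being repeated; `join_records` is a hypothetical name and `io.StringIO` stands in for the input file.

```python
import io


def join_records(f):
    """Return one space-joined line per Score, repeating the query for extra hits."""
    out = []
    parts = []
    query = None
    prev_query = None
    for line in f:
        line = line.rstrip("\n\r")
        if line.startswith("Query="):
            query = line
            parts.append(query)
        elif line.startswith("Hit="):
            if not query:
                parts.append(prev_query)  # extra hit: repeat the previous query
            parts.append(line)
        elif line.startswith("Score="):
            parts.append(line)
            out.append(" ".join(parts))  # a Score line ends the row
            parts = []
            if query:
                prev_query = query  # only overwrite when a new query was seen
            query = None
    return out


sample = io.StringIO(
    "Query= A1 bird\nHit= B1 owl\nScore= 1.0 4.0 2.5\n"
    "Hit= B2 bluejay\nScore= 10.0 6.0 7.0\n"
)
for row in join_records(sample):
    print(row)
# Query= A1 bird Hit= B1 owl Score= 1.0 4.0 2.5
# Query= A1 bird Hit= B2 bluejay Score= 10.0 6.0 7.0
```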

Answer 1 (score: 1)

I would use the DictWriter class from the csv module to write the parsed data to CSV. There's no error handling: the program assumes that the three needed data items appear for each query, although they do not need to be given in the same order for each query.

import csv

with open('g.txt', 'r') as f, open('result.csv', 'w') as csvfile:
    fieldnames = ['Query', 'Hit', 'Score']
    csvwriter = csv.DictWriter(csvfile, quoting=csv.QUOTE_ALL, 
                               fieldnames=fieldnames)
    csvwriter.writeheader()
    data = {}
    for line in f:
        key, value = line.split('=')
        data[key.strip()] = value.strip()
        if len(data.keys()) == 3:
            csvwriter.writerow(data)
            data = {}
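If you want to keep this `DictWriter` approach but also handle a query with several hits, a small change is to keep the `Query` value when resetting `data`, so that the next Hit/Score pair completes a row under the same query. A sketch of that variant, assuming a Hit and its Score appear together after their Query as in the sample; it also splits only on the first `=` and skips blank lines. `write_rows` is a hypothetical name, and `lineterminator='\n'` is set only so the captured output is easy to compare.

```python
import csv
import io

# Sample input copied from the question; io.StringIO stands in for g.txt.
SAMPLE = """Query= A1 bird
Hit= B1 owl
Score= 1.0 4.0 2.5
Hit= B2 bluejay
Score= 10.0 6.0 7.0
Query= A2 shark
Hit= C1 catshark
Score= 10.0 7.0 2.0
Query= A3 cat
Hit= D1 dog
Score= 7.0 2.0 1.0
"""


def write_rows(f, csvfile):
    fieldnames = ['Query', 'Hit', 'Score']
    csvwriter = csv.DictWriter(csvfile, quoting=csv.QUOTE_ALL,
                               fieldnames=fieldnames, lineterminator='\n')
    csvwriter.writeheader()
    data = {}
    for line in f:
        if not line.strip():
            continue  # skip blank lines instead of failing on split
        key, value = line.split('=', 1)
        data[key.strip()] = value.strip()
        if len(data) == 3:
            csvwriter.writerow(data)
            # Keep the query so a following Hit/Score pair is written under it;
            # a new Query= line simply overwrites it.
            data = {'Query': data['Query']}


buf = io.StringIO()
write_rows(io.StringIO(SAMPLE), buf)
print(buf.getvalue(), end='')
# "Query","Hit","Score"
# "A1 bird","B1 owl","1.0 4.0 2.5"
# "A1 bird","B2 bluejay","10.0 6.0 7.0"
# "A2 shark","C1 catshark","10.0 7.0 2.0"
# "A3 cat","D1 dog","7.0 2.0 1.0"
```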