How can I treat two columns in a txt file as key/value pairs and write them to a txt file?

Asked: 2018-10-09 18:11:43

Tags: python python-3.x dictionary

I have a tab-delimited text file that looks like this:

id    name  age  sex  Basis    Salary
2345  john  23   M    Monthly  6000
2345  john  23   M    Yearly   72000
4356  mary  26   F    Perday   225
4356  mary  26   F    Monthly  7000

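Using id as the key, I need to turn the Basis and Salary column values into columns in the result file, as shown below.

Note: if there is no value for Perday, Monthly, or Yearly, that column should be filled in as ' '.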

id    Name  age  sex  PerDay  Monthly  Yearly
2345  john  23   M    ' '     6000     72000
4356  mary  26   F    225     7000     ' '

How can we do this in a Pythonic way?

3 answers:

Answer 0 (score: 0)

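mypath = '/path/to/file.csv'

with open(mypath) as fh:
    lines = fh.readlines()

# The first line is the header; the remaining lines are the records.
header, body = lines[0], lines[1:]

records = {}

for record in body:
    if not record.strip():
        continue  # skip blank lines, e.g. a trailing empty line

    # Strip the newline so it does not end up inside the salary value.
    emp_id, name, age, sex, basis, salary = record.rstrip('\n').split('\t')
    cached = records.get(emp_id)

    if cached:
        # This id was seen before: only add the salary for this basis.
        cached[basis] = salary

    else:
        # First time this id is seen: store every field, with ' ' for the other bases.
        records[emp_id] = {'id': emp_id, "name": name, "age": age, "sex": sex, basis: salary,
                           **{base: ' ' for base in {'Yearly', 'Monthly', 'Perday'} - {basis}}}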

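Brief explanation:

mypath is the path to your .csv file.

I strip off the header and then get all the records as a list of strings. Next, we iterate over that list.

Each line is split on \t (the tab character) and unpacked into its original fields.

We look up the record by its id. If that id has already been processed, we only need to add a basis entry with the corresponding salary. If not, we add a record containing all the fields, using the provided basis and salary and unpacking ' ' for each of the remaining bases, as requested.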
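The steps above stop once the records dict is built; nothing is written out yet. A minimal sketch of the full round trip follows, assuming the Basis spellings Perday, Monthly and Yearly from the question's input. The helper names pivot_rows and format_records are hypothetical, and the sample lines stand in for the contents of the file:

```python
BASES = ("Perday", "Monthly", "Yearly")  # spellings as they appear in the question's input
COLUMNS = ("id", "name", "age", "sex") + BASES


def pivot_rows(lines):
    """Group tab-separated lines by id: one dict per id with a slot for every basis."""
    records = {}
    for line in lines[1:]:  # skip the header line
        line = line.rstrip("\n")
        if not line:
            continue  # ignore blank lines
        emp_id, name, age, sex, basis, salary = line.split("\t")
        # Create the record on first sight, pre-filled with ' ' for every basis.
        record = records.setdefault(
            emp_id,
            {"id": emp_id, "name": name, "age": age, "sex": sex,
             **{base: " " for base in BASES}},
        )
        record[basis] = salary
    return records


def format_records(records):
    """Render the grouped records as tab-separated text, header first."""
    out = ["\t".join(COLUMNS)]
    for record in records.values():
        out.append("\t".join(record[col] for col in COLUMNS))
    return "\n".join(out) + "\n"


sample = [
    "id\tname\tage\tsex\tBasis\tSalary\n",
    "2345\tjohn\t23\tM\tMonthly\t6000\n",
    "2345\tjohn\t23\tM\tYearly\t72000\n",
    "4356\tmary\t26\tF\tPerday\t225\n",
    "4356\tmary\t26\tF\tMonthly\t7000\n",
]

records = pivot_rows(sample)
output = format_records(records)
print(output)
```

To write a real file, the string returned by format_records can be passed to the write method of a file opened in 'w' mode. A Basis value outside BASES is stored in the record but never printed, so an unexpected spelling is dropped silently; whether that should raise an error instead depends on the data.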

Answer 1 (score: 0)

import re

# Basis values as spelled in the input, in output column order.
bases = ['Perday', 'Monthly', 'Yearly']
pattern = re.compile(r"(\d+)\s+([A-Za-z]+)\s+(\d+)\s+([MmFf])\s+([A-Za-z]+)\s+(\d+)")
rows = {}  # id -> [id, name, age, sex, perday, monthly, yearly]

# read each line of the input file
with open('filePath', 'r') as input_file:
    for line in input_file.readlines()[1:]:  # excluding the first (header) line
        m = pattern.search(line)
        # >>> m.groups()
        # ('2345', 'john', '23', 'M', 'Monthly', '6000')
        if m and m.group(5) in bases:
            # first row for this id: fill the three salary columns with ' '
            row = rows.setdefault(m.group(1), list(m.groups()[:4]) + [' '] * 3)
            # place the salary in the column chosen by m.group(5); leave the others as ' '
            row[4 + bases.index(m.group(5))] = m.group(6)

with open('outfile.txt', 'w') as output_file:
    output_file.write('id\tName\tage\tsex\tPerDay\tMonthly\tYearly\n')
    for row in rows.values():
        output_file.write('\t'.join(row) + '\n')

Answer 2 (score: 0)

I think an approach like this works best. It does assume that the ID numbers are unique, though.

import csv

id_column = 0
melt_column = 4
value_column = 5
in_file = "file.csv"
out_file = "out.csv"

new_headers = ['id','Name','age','sex','PerDay','Monthly','Yearly']
# Lowercased copy for lookups: the input spells "Perday", the output header "PerDay".
header_keys = [h.lower() for h in new_headers]
header = None
data = dict()


with open(in_file, newline="") as csvfile:
    for row in csv.reader(csvfile, delimiter="\t"):
        if header is None:
            header = row
            continue
        else:
            melt_idx = header_keys.index(row[melt_column].lower())
            if row[id_column] not in data:
                # Missing salaries default to ' ', as the question asks.
                data[row[id_column]] = row[id_column:melt_column] + [" ", " ", " "]
            data[row[id_column]][melt_idx] = row[value_column]

with open(out_file, mode="w", newline="") as csvfile:
    writer = csv.writer(csvfile, delimiter="\t")
    writer.writerow(new_headers)
    for k, val in data.items():
        writer.writerow(val)
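
The answer above assumes that the ID numbers are unique, i.e. that every row sharing an id describes the same person. A minimal sketch for checking that assumption before pivoting is shown here; find_conflicting_ids is a hypothetical helper, and the last sample row is made up purely to trigger a conflict:

```python
import csv
import io


def find_conflicting_ids(rows, id_column=0, melt_column=4):
    """Return the ids whose identity columns (everything before Basis) differ between rows."""
    seen = {}
    conflicts = set()
    for row in rows:
        key = row[id_column]
        identity = row[id_column:melt_column]
        # setdefault stores the first identity seen for this id and returns it afterwards.
        if seen.setdefault(key, identity) != identity:
            conflicts.add(key)
    return conflicts


sample = (
    "id\tname\tage\tsex\tBasis\tSalary\n"
    "2345\tjohn\t23\tM\tMonthly\t6000\n"
    "2345\tjohn\t23\tM\tYearly\t72000\n"
    "4356\tmary\t26\tF\tPerday\t225\n"
    "4356\tanna\t31\tF\tMonthly\t7000\n"  # made-up conflicting row, not from the question
)

reader = csv.reader(io.StringIO(sample), delimiter="\t")
next(reader)  # skip the header row
conflicts = find_conflicting_ids(reader)
print(conflicts)  # {'4356'}
```

If the returned set is empty, grouping by id alone is safe for that file; otherwise the name, age and sex kept in the output are simply those of the first row seen for each id.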