Question

I have a tab-delimited text file that looks like this:

id    name  age  sex  Basis    Salary
2345  john  23   M    Monthly  6000
2345  john  23   M    Yearly   72000
4356  mary  26   F    Perday   225
4356  mary  26   F    Monthly  7000

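Using id as the key, I need to turn the Basis and Salary column values into columns in the result file, as shown below.

Note: if there is no value for Perday, Monthly, or Yearly, that column should be set to ' '.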

id    Name  age  sex  PerDay  Monthly  Yearly
2345  john  23   M    ' '     6000     72000
4356  mary  26   F    225     7000     ' '

How can we do this in a Pythonic way?

Answer 1

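mypath = '/path/to/file.csv'

with open(mypath) as fh:
    lines = fh.readlines()

# Split off the header row; slicing with lines[0:] would keep it in the body.
header, body = lines[0], lines[1:]

BASES = ('Perday', 'Monthly', 'Yearly')
records = {}

for record in body:
    if not record.strip():
        continue  # skip blank lines
    # Strip the trailing newline so it does not end up inside salary.
    id, name, age, sex, basis, salary = record.rstrip('\n').split('\t')
    cached = records.get(id)

    if cached:
        # id already seen: just record the salary for this basis
        cached[basis] = salary

    else:
        # first time we see this id: default every basis to ' ', then fill in the one we have
        records[id] = {'id': id, 'name': name, 'age': age, 'sex': sex,
                       **{base: ' ' for base in BASES}, basis: salary}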

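Brief explanation:

mypath is the path to your .csv file.

I strip off the header and then get all the records as a list of strings. Next, we iterate over that list.

Split each line on \t (the tab character) and unpack it into the original fields.

Look up the id. If it has already been processed, we only need to add a basis entry with the associated salary. If it has not, we add a record containing all the fields, using dict unpacking to fill in ' ' for the bases that were not provided, as required.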
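
Answer 1 stops once the records dict is built. The steps above, extended to produce the output rows, can be sketched as follows. This is a minimal sketch, not the answerer's code: the function name pivot_salaries, the sample list, and the output header spelling are assumptions made for illustration, and it works on a list of lines in memory rather than a file.

```python
# Hypothetical helper; only the column layout comes from the question.
BASES = ("Perday", "Monthly", "Yearly")


def pivot_salaries(lines):
    """Collapse tab-separated (id, name, age, sex, basis, salary) lines into one row per id."""
    records = {}
    for line in lines[1:]:  # skip the header row
        if not line.strip():
            continue  # ignore blank lines
        id_, name, age, sex, basis, salary = line.rstrip("\n").split("\t")
        if id_ not in records:
            # first sighting of this id: every basis starts out as ' '
            records[id_] = {"id": id_, "name": name, "age": age, "sex": sex,
                            **{base: " " for base in BASES}}
        records[id_][basis] = salary
    header = ["id", "Name", "age", "sex", "PerDay", "Monthly", "Yearly"]
    keys = ["id", "name", "age", "sex", *BASES]
    return ["\t".join(header)] + ["\t".join(rec[k] for k in keys) for rec in records.values()]


# Sample data copied from the question
sample = [
    "id\tname\tage\tsex\tBasis\tSalary\n",
    "2345\tjohn\t23\tM\tMonthly\t6000\n",
    "2345\tjohn\t23\tM\tYearly\t72000\n",
    "4356\tmary\t26\tF\tPerday\t225\n",
    "4356\tmary\t26\tF\tMonthly\t7000\n",
]
for out_line in pivot_salaries(sample):
    print(out_line)
```

To use it on a real file, pass fh.readlines() as the argument and write each returned line, followed by a newline, to the result file.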

Answer 2

import re

# one data row: id, name, age, sex, basis, salary, separated by whitespace
row_re = re.compile(r"(\d+)\s+([A-Za-z]+)\s+(\d+)\s+([MmFf])\s+([A-Za-z]+)\s+(\d+)")
columns = {'perday': 4, 'monthly': 5, 'yearly': 6}  # basis -> position in the output row
rows = {}  # id -> [id, name, age, sex, PerDay, Monthly, Yearly]

# read each line of the input file
with open('filePath', 'r') as input_file:
    for line in input_file.readlines()[1:]:  # excluding the first line
        m = row_re.search(line)
        # >>> m.groups()
        # ('2345', 'john', '23', 'M', 'Monthly', '6000')
        if m and m.group(5).lower() in columns:
            id_, name, age, sex, basis, salary = m.groups()
            # based on m.group(5), fill in one column and leave the others as " "
            row = rows.setdefault(id_, [id_, name, age, sex, " ", " ", " "])
            row[columns[basis.lower()]] = salary

with open('outfile.txt', 'w') as output_file:
    output_file.write('id\tName\tage\tsex\tPerDay\tMonthly\tYearly\n')
    for row in rows.values():
        output_file.write('\t'.join(row) + '\n')

Answer 3

I think something like this would work best. It does assume that the ID numbers are unique, though.

import csv

id_column = 0
melt_column = 4
value_column = 5
in_file = "file.csv"
out_file = "out.csv"

new_headers = ['id','Name','age','sex','PerDay','Monthly','Yearly']
# Look up Basis values case-insensitively: the data says 'Perday', the header 'PerDay'.
header_index = {name.lower(): idx for idx, name in enumerate(new_headers)}
header = None
data = dict()


with open(in_file, newline="") as csvfile:
    for row in csv.reader(csvfile, delimiter="\t"):
        if header is None:
            header = row
            continue
        else:
            melt_idx = header_index[row[melt_column].lower()]
            if row[id_column] not in data:
                # the question asks for ' ' wherever a basis has no value
                data[row[id_column]] = row[id_column:melt_column] + [" ", " ", " "]
            data[row[id_column]][melt_idx] = row[value_column]

with open(out_file, mode="w", newline="") as csvfile:
    writer = csv.writer(csvfile, delimiter="\t")
    writer.writerow(new_headers)
    for k, val in data.items():
        writer.writerow(val)
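
The approach above relies on each id identifying a single person. One way to check that assumption before pivoting is sketched below. conflicting_ids is a hypothetical helper, not part of the answer, and the last sample row is made up to show a clash.

```python
def conflicting_ids(rows, id_column=0, melt_column=4):
    """Return ids whose fixed fields (everything before the Basis column) differ between rows."""
    seen = {}
    conflicts = set()
    for row in rows:
        key = row[id_column]
        fixed = tuple(row[id_column:melt_column])
        # setdefault stores the first version seen and returns it on later calls
        if seen.setdefault(key, fixed) != fixed:
            conflicts.add(key)
    return sorted(conflicts)


rows = [
    ["2345", "john", "23", "M", "Monthly", "6000"],
    ["2345", "john", "23", "M", "Yearly", "72000"],
    ["4356", "mary", "26", "F", "Perday", "225"],
    ["4356", "anna", "31", "F", "Monthly", "7000"],  # made-up clash for illustration
]
print(conflicting_ids(rows))  # prints ['4356']
```

An empty list means the name, age, and sex values are consistent for every id, so keeping only the first row's values loses nothing.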
