Question

I'm trying to parse a text file containing data in columns and records. My file:

Name     Surname    Age    Sex      Grade
Chris      M.        14     M       4
Adam       A.        17     M
Jack       O.               M       8

The text file contains some empty fields, as shown above. I want to display the name and grade:

import csv

with open('launchlog.txt', 'r') as in_file:
    stripped = (line.strip() for line in in_file)
    lines = (line.split() for line in stripped if line)
    with open('log.txt', 'w') as out_file:
        writer = csv.writer(out_file)
        writer.writerow(('Name', 'Surname', 'Age', 'Sex', 'Grade'))
        writer.writerows(lines)

log.txt:

Chris,M.,14,M,4
Adam,A.,17,M
Jack,O.,M,8

How can I insert the string "None" wherever the data is empty? For example:

Chris,M.,14,M,4
Adam,A.,17,M,None
Jack,O.,None,M,8

What is the best way to do this in Python?

Answer 1

Use pandas:

import pandas
data=pandas.read_fwf("file.txt")

To get your dictionary:

data.set_index("Name")["Grade"].to_dict()
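If the goal is the CSV from the question rather than a dictionary, the same read_fwf call can be followed by fillna and to_csv. This is only a minimal sketch, assuming the columns line up well enough for read_fwf to infer their boundaries. The helper name fwf_to_csv_text and the in-memory sample are made up for illustration, and dtype=str is my own choice to keep the ages from being read as floats:

```python
import io

import pandas as pd

# Sample data copied from the question; a real script would pass a file path.
SAMPLE = (
    "Name     Surname    Age    Sex      Grade\n"
    "Chris      M.        14     M       4\n"
    "Adam       A.        17     M\n"
    "Jack       O.               M       8\n"
)


def fwf_to_csv_text(source):
    """Read a fixed-width table and return CSV text with 'None' in the gaps.

    Hypothetical helper, not part of pandas.
    """
    # dtype=str keeps "14" as text instead of 14.0 when a column has gaps.
    frame = pd.read_fwf(source, dtype=str)
    # Missing fields come back as NaN; replace them with the literal string.
    return frame.fillna("None").to_csv(index=False)


csv_text = fwf_to_csv_text(io.StringIO(SAMPLE))
print(csv_text)
```

Passing a path to to_csv instead of calling it without one would write the file directly.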

Answer 2

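Something like this in Pure Python™ seems to do what you need, at least with the sample data file in your question.

In a nutshell, it first determines where each field name in the column header line starts and ends. Then, for each remaining line of the file, it does the same thing to get a second list, which is used to work out which column each data item in the line sits underneath (and the item is then put in the proper position in the row that will be written to the output file).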

import csv

def find_words(line):
    """ Return a list of (start, stop) tuples with the indices of the
        first and last characters of each "word" in the given string.
        Any sequence of consecutive non-space characters is considered
        as comprising a word.
    """
    line_len = len(line)
    indices = []
    i = 0
    while i < line_len:
        start, count = i, 0
        while line[i] != ' ':
            count += 1
            i += 1
            if i >= line_len:
                break
        indices.append((start, start+count-1))

        while i < line_len and line[i] == ' ':  # advance to start of next word
            i += 1

    return indices


# convert text file with missing fields to csv
with open('name_grades.txt', 'rt') as in_file, open('log.csv', 'wt', newline='') as out_file:
    writer = csv.writer(out_file)
    header = next(in_file)  # read first line
    fields = header.split()
    writer.writerow(fields)

    # determine the indices of where each field starts and stops based on header line
    field_positions = find_words(header)

    for line in in_file:
        line = line.rstrip('\r\n')  # remove trailing newline
        row = ['None' for _ in range(len(fields))]
        value_positions = find_words(line)
        for (vstart, vstop) in value_positions:
            # determine what field the value is underneath
            for i, (hstart, hstop) in enumerate(field_positions):
                if vstart <= hstop and hstart <= vstop:  # overlap?
                    row[i] = line[vstart:vstop+1]
                    break  # stop looking
        writer.writerow(row)

The result is written to log.csv, with 'None' in every field that has no value under its header.

Answer 3

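I would use baloo's answer over mine - but if you just want to get a feel for where your code went wrong, the solution below mostly works (there is a formatting issue with the Grade field, but I'm sure you can get past it). Add some print statements to your code and mine and you should be able to spot the differences.

import csv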

<Old Code removed in favor of new code below>

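Edit: I see your difficulty now. Please try the code below; I'm out of time for today, so you'll have to fill in the writer part where the print statement is, but this fulfills your request to replace empty fields with None.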

import csv

with open('Test.txt', 'r') as in_file:
    with open('log.csv', 'w') as out_file:
        writer = csv.writer(out_file)
        lines = [line for line in in_file]
        name_and_grade = dict()
        for line in lines[1:]:
            parts = line[0:10], line[11:19], line[20:24], line[25:31], line[32:]
            new_line = list()
            for part in parts:
                val = part.replace('\n','')
                val = val.strip()
                val = val if val != '' else 'None'
                new_line.append(val)
            print(new_line)
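
For anyone who wants the writer part filled in, here is one possible way to finish it. It is a sketch rather than the answerer's own code: the slice offsets are the ones used above and only fit the sample file, and the names SLICES, split_fixed_width and convert are hypothetical. It runs against an in-memory copy of the question's data so it can be tried without any files:

```python
import csv
import io

# Column boundaries taken from the slices in the answer above; they fit the
# sample file only and would need adjusting for any other layout.
SLICES = [(0, 10), (11, 19), (20, 24), (25, 31), (32, None)]

# Sample data copied from the question.
SAMPLE = (
    "Name     Surname    Age    Sex      Grade\n"
    "Chris      M.        14     M       4\n"
    "Adam       A.        17     M\n"
    "Jack       O.               M       8\n"
)


def split_fixed_width(line):
    """Cut one line at the fixed offsets; blank fields become 'None'."""
    fields = (line[start:stop].strip() for start, stop in SLICES)
    return [field if field else "None" for field in fields]


def convert(in_file, out_file):
    """Copy the header, then write one CSV row per non-blank data line."""
    writer = csv.writer(out_file)
    writer.writerow(next(in_file).split())
    for line in in_file:
        if line.strip():  # skip blank lines
            writer.writerow(split_fixed_width(line))


out = io.StringIO()
convert(io.StringIO(SAMPLE), out)
print(out.getvalue())
```

With real files, open the output with newline='' as the csv module documentation recommends, and pass the two file objects to convert.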

Answer 4

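Without pandas:

Edited per your comment: I have hard-coded this solution based on your data. It will not work for rows that have no Surname column.
I write out only Name and Grade, since those are the only two columns you need.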

with open("inFIle.txt") as f, open("out.txt", 'w') as o:
    for line in f:
        fields = line.strip("\n").split(",")
        try:
            grade = int(fields[-1])
            if fields[-2][-1] != '.':
                o.write(fields[0] + "," + str(grade) + "\n")
        except ValueError:
            print(fields)
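
If the input already has a header row and 'None' placeholders (an assumption; the hard-coded version above does not rely on either), the two columns can be picked by name instead of by position. A rough sketch using csv.DictReader, where CSV_TEXT and name_and_grade are made-up names for illustration:

```python
import csv
import io

# Assumed input: the None-filled CSV the question asks for, header included.
CSV_TEXT = (
    "Name,Surname,Age,Sex,Grade\n"
    "Chris,M.,14,M,4\n"
    "Adam,A.,17,M,None\n"
    "Jack,O.,None,M,8\n"
)


def name_and_grade(csv_file):
    """Return (Name, Grade) pairs, looking the columns up by header name."""
    reader = csv.DictReader(csv_file)
    return [(row["Name"], row["Grade"]) for row in reader]


pairs = name_and_grade(io.StringIO(CSV_TEXT))
for name, grade in pairs:
    print(name + "," + grade)
```

Because the lookup is by column name, a missing Age or Grade no longer shifts the other fields.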

How do I convert this text file to CSV?

4 answers: