I am trying to parse a text file whose data is laid out in columns and records. My file:
Name Surname Age Sex Grade
Chris M. 14 M 4
Adam A. 17 M
Jack O. M 8
The text file contains some empty fields, as shown above. I want to display the name and grade:
import csv

with open('launchlog.txt', 'r') as in_file:
    stripped = (line.strip() for line in in_file)
    lines = (line.split() for line in stripped if line)
    with open('log.txt', 'w') as out_file:
        writer = csv.writer(out_file)
        writer.writerow(('Name', 'Surname', 'Age', 'Sex', 'Grade'))
        writer.writerows(lines)
log.txt:
Chris,M.,14,M,4
Adam,A.,17,M
Jack,O.,M,8
How can I insert the string "None" wherever the data is empty? For example:
Chris,M.,14,M,4
Adam,A.,17,M,None
Jack,O.,None,M,8
What would be the best way to do this in Python?
Answer 0 (score: 1)
Use pandas:
import pandas
data=pandas.read_fwf("file.txt")
To get your dictionary:
data.set_index("Name")["Grade"].to_dict()
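To also get the literal string "None" for the blanks, something along these lines might work. It is only a sketch: the fixed-width layout in `LAYOUT` is an assumption about how the original file is aligned (the question's sample lost its spacing), and `dtype=str` is used so that ages and grades stay as text instead of becoming floats with NaN.

```python
import io
import pandas

# Assumed column widths; the real file's alignment is not shown in the question.
LAYOUT = "{:<10}{:<9}{:<5}{:<6}{}"
sample = "\n".join([
    LAYOUT.format("Name", "Surname", "Age", "Sex", "Grade"),
    LAYOUT.format("Chris", "M.", "14", "M", "4"),
    LAYOUT.format("Adam", "A.", "17", "M", ""),
    LAYOUT.format("Jack", "O.", "", "M", "8"),
]) + "\n"

# Read every column as text so missing cells are the only NaN values.
data = pandas.read_fwf(io.StringIO(sample), dtype=str)

# Replace the missing cells with the string "None".
data = data.fillna("None")

grades = data.set_index("Name")["Grade"].to_dict()
print(grades)

# The same frame can be written out as CSV text (or to a file path).
csv_text = data.to_csv(index=False)
print(csv_text)
```

With a real file you would pass its path to `read_fwf` instead of the `StringIO` object; if the column inference guesses wrong, the `colspecs` or `widths` parameters let you spell the layout out.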
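Answer 1 (score: 1)
Here's something in Pure Python™ that seems to do what you want, at least on the sample data file in your question.
In a nutshell, it first determines where each field name in the column header line starts and ends. Then, for each remaining line of the file, it does the same thing to get a second list of positions, which it uses to work out which column each data item in the line sits underneath. Each item is then put in the proper position in the row that will be written to the output file.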
import csv

def find_words(line):
    """ Return a list of (start, stop) tuples with the indices of the
        first and last characters of each "word" in the given string.
        Any sequence of consecutive non-space characters is considered
        as comprising a word.
    """
    line_len = len(line)
    indices = []
    i = 0
    while i < line_len:
        start, count = i, 0
        while line[i] != ' ':
            count += 1
            i += 1
            if i >= line_len:
                break
        indices.append((start, start+count-1))

        while i < line_len and line[i] == ' ':  # advance to start of next word
            i += 1

    return indices

# convert text file with missing fields to csv
with open('name_grades.txt', 'rt') as in_file, open('log.csv', 'wt', newline='') as out_file:
    writer = csv.writer(out_file)
    header = next(in_file)  # read first line
    fields = header.split()
    writer.writerow(fields)

    # determine the indices of where each field starts and stops based on header line
    field_positions = find_words(header)

    for line in in_file:
        line = line.rstrip('\r\n')  # remove trailing newline
        row = ['None' for _ in range(len(fields))]
        value_positions = find_words(line)
        for (vstart, vstop) in value_positions:
            # determine what field the value is underneath
            for i, (hstart, hstop) in enumerate(field_positions):
                if vstart <= hstop and hstart <= vstop:  # overlap?
                    row[i] = line[vstart:vstop+1]
                    break  # stop looking
        writer.writerow(row)

The result is written to the file log.csv.
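For anyone who wants to try the overlap idea on a single line without touching any files, here is a smaller sketch of the same technique. It uses `re.finditer` to find the word positions instead of the hand-written loop; the names `word_spans` and `align_to_header` and the column widths in `LAYOUT` are made up for illustration, not taken from the question.

```python
import re

def word_spans(text):
    """Return (start, end) pairs, end exclusive, for each run of non-whitespace."""
    return [m.span() for m in re.finditer(r"\S+", text)]

def align_to_header(header, line, fill="None"):
    """Place each value of `line` under the header field it overlaps."""
    header_spans = word_spans(header)
    row = [fill] * len(header_spans)
    for vstart, vend in word_spans(line):
        for i, (hstart, hend) in enumerate(header_spans):
            if vstart < hend and hstart < vend:  # half-open ranges overlap
                row[i] = line[vstart:vend]
                break  # a value belongs to at most one field
    return row

# Assumed fixed-width layout, used only to build sample lines.
LAYOUT = "{:<10}{:<9}{:<5}{:<6}{}"
header = LAYOUT.format("Name", "Surname", "Age", "Sex", "Grade")
adam = LAYOUT.format("Adam", "A.", "17", "M", "")
jack = LAYOUT.format("Jack", "O.", "", "M", "8")

print(align_to_header(header, adam))  # ['Adam', 'A.', '17', 'M', 'None']
print(align_to_header(header, jack))  # ['Jack', 'O.', 'None', 'M', '8']
```

As with the full version, this only works when every value sits at least partly underneath its column heading.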
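Answer 2 (score: 0)
I would use baloo's answer over mine, but if you just want to get a feel for where your code went wrong, the solution below mostly works (there is a formatting issue with the Grade field, but I'm sure you can get past that). Add some print statements to your code and to mine and you should be able to spot the differences.
import csv
<Old Code removed in favor of new code below>
EDIT: I see your difficulty now. Please try the code below; I'm out of time for today, so you will have to fill in the writer portion where the print statement is, but this fulfills your request to replace empty fields with None.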
import csv

with open('Test.txt', 'r') as in_file:
    with open('log.csv', 'w') as out_file:
        writer = csv.writer(out_file)
        lines = [line for line in in_file]
        name_and_grade = dict()
        for line in lines[1:]:
            # slice each fixed-width column out of the line
            parts = line[0:10], line[11:19], line[20:24], line[25:31], line[32:]
            new_line = list()
            for part in parts:
                val = part.replace('\n', '')  # drop the newline (was '/n', which never matched)
                val = val.strip()
                val = val if val != '' else 'None'
                new_line.append(val)
            print(new_line)
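One possible way to fill in the writer portion is sketched below. The slice offsets are the ones from the code above and are an assumption about how the file is laid out; `COLUMNS`, `fixed_width_rows` and `LAYOUT` are hypothetical names, and the sample lines are built in memory so the snippet runs without any input file.

```python
import csv
import io

# Column slices copied from the answer above; adjust them to the real file.
COLUMNS = [slice(0, 10), slice(11, 19), slice(20, 24), slice(25, 31), slice(32, None)]

def fixed_width_rows(lines, columns=COLUMNS, fill="None"):
    """Yield one list of fields per non-blank line, using `fill` for empty fields."""
    for line in lines:
        line = line.rstrip("\r\n")
        if not line.strip():
            continue  # skip blank lines
        yield [line[col].strip() or fill for col in columns]

# Sample lines padded to match the assumed offsets.
LAYOUT = "{:<11}{:<9}{:<5}{:<7}{}"
sample = [
    LAYOUT.format("Name", "Surname", "Age", "Sex", "Grade") + "\n",
    LAYOUT.format("Chris", "M.", "14", "M", "4") + "\n",
    LAYOUT.format("Adam", "A.", "17", "M", "") + "\n",
    LAYOUT.format("Jack", "O.", "", "M", "8") + "\n",
]

out = io.StringIO()  # stands in for open('log.csv', 'w', newline='')
writer = csv.writer(out)
writer.writerows(fixed_width_rows(sample))
print(out.getvalue())
```

Here the header line goes through the same slicing as the data, so it ends up as the first CSV row.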
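Answer 3 (score: 0)
Without using pandas:
Edited based on your comment: I hard-coded this solution around your data. It will not work for rows that have no Surname column.
I am writing out only Name and Grade, since those are the two columns you need.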
o = open("out.txt", 'w')
with open("inFIle.txt") as f:
    for lines in f:
        lines = lines.strip("\n").split(",")
        try:
            grade = int(lines[-1])
            if (lines[-2][-1]) != '.':
                o.write(lines[0]+","+ str(grade)+"\n")
        except ValueError:
            print(lines)
o.close()
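If the file has already been converted to the "None"-filled CSV form shown in the question, pulling out just Name and Grade could look roughly like this. It is a sketch that assumes a header row is present; `name_to_grade` is a hypothetical helper, and the CSV text is inlined so the example is self-contained.

```python
import csv
import io

# The desired output from the question, with a header row added (an assumption).
csv_text = (
    "Name,Surname,Age,Sex,Grade\n"
    "Chris,M.,14,M,4\n"
    "Adam,A.,17,M,None\n"
    "Jack,O.,None,M,8\n"
)

def name_to_grade(csv_file):
    """Map each Name to its Grade, keeping 'None' for missing grades."""
    reader = csv.DictReader(csv_file)
    return {row["Name"]: row["Grade"] for row in reader}

grades = name_to_grade(io.StringIO(csv_text))
print(grades)  # {'Chris': '4', 'Adam': 'None', 'Jack': '8'}
```

Because the columns are looked up by name, this does not depend on how many fields each row happens to have filled in.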