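I have a set of text files, and I would like to append the second column of each one, in turn, to a new text file. The files are tab-delimited and look like this: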
name dave
age 35
job teacher
income 30000
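I have already generated a file that holds the first column of one of these files in the second-column position, hoping this would simplify the problem: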
0 name
0 age
0 job
0 income
I have a large number of these files and would like to combine them all into one tab-delimited text file, for example:
name dave mike sue
age 35 28 40
job teacher postman solicitor
income 30000 20000 40000
I have a text file named all_libs.txt that contains just the names of all the files. So far I have written:
#make a sorted list of the file names
with open('all_libs.txt', 'r') as lib:
    people = list([line.rstrip() for line in lib])
people_s = sorted(people)
i=0
while i< len(people_s):
    with open(people_s[i]) as inf:
        for line in inf:
            parts = line.split() #split line into parts
            if len(parts) > 1: #if more than 1 discrete unit in parts
                with open("all_data.txt", 'a') as out_file: #append column2 to all_data
                    out_file.write((parts[1])+"\n")
    i=i+1 #go to the next file in the list
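As each new file is opened, I would like its data to be added as a new column rather than as new rows. I would really appreciate any help. I realize that something like SQL would probably make this easy, but I have never used it and do not have time to climb the SQL learning curve. Many thanks.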
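Answer 0 (score: 2)
This is a very impractical way to store data: each record is spread across all the lines, so it is hard to reconstruct a record when reading the file and (as you have seen) when adding records.
You should use a standard format such as csv or (even better in this case) json.
For example, you could save them as CSV like this: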
name,age,job,income
dave,35,teacher,30000
mike,28,postman,20000
sue,40,solicitor,40000
To read this file:
>>> import csv
>>> with open("C:/Users/Tim/Desktop/people.csv", newline="") as infile:
...     reader = csv.DictReader(infile)
...     people = list(reader)
Now you have a list of people:
>>> people
[{'income': '30000', 'age': '35', 'name': 'dave', 'job': 'teacher'},
{'income': '20000', 'age': '28', 'name': 'mike', 'job': 'postman'},
{'income': '40000', 'age': '40', 'name': 'sue', 'job': 'solicitor'}]
which you can access easily:
>>> for item in people:
...     print("{0[name]} is a {0[job]}, earning {0[income]} per year".format(item))
...
dave is a teacher, earning 30000 per year
mike is a postman, earning 20000 per year
sue is a solicitor, earning 40000 per year
Adding new records is now just a matter of appending them to the end of the file:
>>> with open("C:/Users/Tim/Desktop/people.csv", "a", newline="") as outfile:
...     writer = csv.DictWriter(outfile,
...                             fieldnames=["name","age","job","income"])
...     writer.writerow({"name": "paul", "job": "musician", "income": 123456,
...                      "age": 70})
Result:
name,age,job,income
dave,35,teacher,30000
mike,28,postman,20000
sue,40,solicitor,40000
paul,70,musician,123456
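The append above relies on people.csv already existing with its header row. As a minimal sketch of creating such a file from scratch, the following writes the header with `DictWriter.writeheader()` before the rows. The helper name `write_people` is hypothetical, and an in-memory `io.StringIO` buffer stands in for the real file so the example runs anywhere.

```python
import csv
import io

FIELDNAMES = ["name", "age", "job", "income"]

def write_people(outfile, records):
    """Write a header row followed by one CSV row per record dict."""
    writer = csv.DictWriter(outfile, fieldnames=FIELDNAMES)
    writer.writeheader()       # emits "name,age,job,income"
    writer.writerows(records)  # one line per dict, in FIELDNAMES order

# In-memory stand-in for a file opened with open(path, "w", newline="").
buffer = io.StringIO()
write_people(buffer, [{"name": "dave", "age": 35, "job": "teacher", "income": 30000}])
csv_text = buffer.getvalue()
```

With a real file, passing a handle opened in "w" mode with newline="" in place of the buffer should give the same content on disk; later appends can then skip the header.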
Or you can save it as JSON:
>>> import json
>>> with open("C:/Users/Tim/Desktop/people.json", "w") as outfile:
...     json.dump(people, outfile, indent=1)
Result:
[
 {
  "income": "30000",
  "age": "35",
  "name": "dave",
  "job": "teacher"
 },
 {
  "income": "20000",
  "age": "28",
  "name": "mike",
  "job": "postman"
 },
 {
  "income": "40000",
  "age": "40",
  "name": "sue",
  "job": "solicitor"
 }
]
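To get from the per-person files in the question to one record per person, each file can be parsed into a dict and the dicts written out as CSV rows. This is only a sketch under stated assumptions: `parse_person` is a hypothetical helper, the two strings in `sample_files` are stand-ins for the contents of the real files, and every file is assumed to contain the same four keys.

```python
import csv
import io

def parse_person(text):
    """Turn 'key<TAB>value' lines into a dict; blank or one-word lines are skipped."""
    record = {}
    for line in text.splitlines():
        parts = line.split(None, 1)  # split once, on the first run of whitespace
        if len(parts) == 2:
            record[parts[0]] = parts[1].strip()
    return record

# Stand-ins for the contents of two of the question's files (assumed data).
sample_files = [
    "name\tdave\nage\t35\njob\tteacher\nincome\t30000\n",
    "name\tmike\nage\t28\njob\tpostman\nincome\t20000\n",
]
people = [parse_person(text) for text in sample_files]

# Write one CSV row per person; a StringIO buffer replaces the real output file.
out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=["name", "age", "job", "income"])
writer.writeheader()
writer.writerows(people)
csv_text = out.getvalue()
```

With real data, each entry of `sample_files` would instead be read from one of the files named in all_libs.txt, and `out` would be a file opened with newline="".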
Answer 1 (score: 0)
file_1 = """
name dave1
age 351
job teacher1
income 300001"""
file_2 = """
name dave2
age 352
job teacher2
income 300002"""
file_3 = """
name dave3
age 353
job teacher3
income 300003"""
template = """
0 name
0 age
0 job
0 income"""
Assume the contents above were read from files.
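def concat():
    _dict = {}
    # create one empty list per field name listed in the template
    for cols in template.splitlines():
        if cols:
            _, col_name = cols.split()
            _dict[col_name] = []
    # append each file's second column to the matching field
    for each_file in [file_1, file_2, file_3]:
        data = each_file.splitlines()
        for line in data:
            if line:
                words = line.split()
                _dict[words[0]].append(words[1])
    # dicts keep insertion order on Python 3.7+, so rows follow the template
    _text = ""
    for key in _dict:
        _text += '\t'.join([key] + _dict[key]) + '\n'
    return _text
print(concat())
Output (on Python 3.7+; older versions may print the rows in a different order)
name dave1 dave2 dave3
age 351 352 353
job teacher1 teacher2 teacher3
income 300001 300002 300003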
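The same idea can be applied directly to the files listed in all_libs.txt, without a template file, by collecting every value first and writing the output once at the end. The following is a minimal sketch, not a definitive implementation: `merge_columns` and `merge_files` are hypothetical names, values are assumed to be single words (as in the question's `line.split()`), and every file is assumed to contain the same keys, since a missing key would shift that row's columns.

```python
def merge_columns(texts):
    """Merge 'key value' texts into one table: one row per key, one column per text."""
    rows = {}  # key -> list of values; insertion order is kept on Python 3.7+
    for text in texts:
        for line in text.splitlines():
            parts = line.split()
            if len(parts) > 1:
                rows.setdefault(parts[0], []).append(parts[1])
    return "".join("\t".join([key] + values) + "\n" for key, values in rows.items())

def merge_files(list_path, out_path):
    """Read the file names in list_path, merge their contents, write out_path once."""
    with open(list_path) as lib:
        names = sorted(line.strip() for line in lib if line.strip())
    texts = []
    for name in names:
        with open(name) as inf:
            texts.append(inf.read())
    with open(out_path, "w") as out_file:
        out_file.write(merge_columns(texts))

# Small in-memory check with assumed data; real use would call
# merge_files("all_libs.txt", "all_data.txt").
table = merge_columns(["name\tdave\nage\t35\n", "name\tmike\nage\t28\n"])
```

Building the whole table in memory avoids reopening all_data.txt in append mode for every line, which is what made the original loop add rows instead of columns.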