Question

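I have a set of text files, and I would like to add the second column from each of them, one after another, to a new text file. The files are tab-separated and have the following format: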

name dave
age 35
job teacher
income 30000

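In the hope of simplifying the problem, I have already generated a file that holds the first column of one of these files in its second-column position: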

0 name
0 age 
0 job
0 income

I have a large number of these files and would like to combine them all into one tab-separated text file, like this:

name dave mike sue
age 35 28 40
job teacher postman solicitor
income 30000 20000 40000

I have a text file named all_libs.txt that contains just the names of all the files. So far, I have written:

# make a sorted list of the file names
with open('all_libs.txt', 'r') as lib:
    people = [line.rstrip() for line in lib]
    people_s = sorted(people)

i = 0

while i < len(people_s):
    with open(people_s[i]) as inf:
        for line in inf:
            parts = line.split()  # split line into parts
            if len(parts) > 1:    # if more than 1 discrete unit in parts
                with open("all_data.txt", 'a') as out_file:  # append column 2 to all_data
                    out_file.write(parts[1] + "\n")

    i = i + 1  # go to the next file in the list

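As each new file is opened, I would like to add it as a new column rather than just appending it as new rows. I would really appreciate any help. I realize that something like SQL would probably make this easy, but I have never used it and don't have time to work through the SQL learning curve. Many thanks.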

Answer 1

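This is a very impractical way to store data: each record is spread across all the lines, so it is hard to reconstruct a record when reading the file and (as you have seen) hard to add one.

You should use a standard format such as CSV or (even better in this case) JSON.

For example, you could save the data as CSV like this: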

name,age,job,income
dave,35,teacher,30000
mike,28,postman,20000
sue,40,solicitor,40000
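
If the data already exists as one tab-separated file per person, as in the question, a file in this layout could be produced along the following lines. This is only a sketch: the helper names `read_record` and `write_people_csv` are made up for illustration, and it assumes every input line has the form field, tab, value.

```python
import csv


def read_record(path):
    """Read one tab-separated 'field<TAB>value' file into a dict."""
    record = {}
    with open(path, newline="") as infile:
        for row in csv.reader(infile, delimiter="\t"):
            if len(row) >= 2:  # skip blank or malformed lines
                record[row[0]] = row[1]
    return record


def write_people_csv(paths, out_path, fieldnames):
    """Write a header row, then one CSV row per input file."""
    with open(out_path, "w", newline="") as outfile:
        writer = csv.DictWriter(outfile, fieldnames=fieldnames)
        writer.writeheader()
        for path in paths:
            writer.writerow(read_record(path))
```

Note that `csv.DictWriter` leaves a cell empty when a file lacks one of the fields, and by default raises `ValueError` when a file contains a field that is not listed in `fieldnames`.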

To read this file:

>>> import csv
>>> with open("C:/Users/Tim/Desktop/people.csv", newline="") as infile:
...     reader = csv.DictReader(infile)
...     people = list(reader)

Now you have a list of people:

>>> people
[{'income': '30000', 'age': '35', 'name': 'dave', 'job': 'teacher'}, 
 {'income': '20000', 'age': '28', 'name': 'mike', 'job': 'postman'}, 
 {'income': '40000', 'age': '40', 'name': 'sue', 'job': 'solicitor'}]

which you can access easily:

>>> for item in people:
...     print("{0[name]} is a {0[job]}, earning {0[income]} per year".format(item))
...
dave is a teacher, earning 30000 per year
mike is a postman, earning 20000 per year
sue is a solicitor, earning 40000 per year

Adding a new record is now just a matter of appending it to the end of the file:

>>> with open("C:/Users/Tim/Desktop/people.csv", "a", newline="") as outfile:
...    writer = csv.DictWriter(outfile,
...                            fieldnames=["name","age","job","income"])
...    writer.writerow({"name": "paul", "job": "musician", "income": 123456,
...                     "age": 70})

Result:

name,age,job,income
dave,35,teacher,30000
mike,28,postman,20000
sue,40,solicitor,40000
paul,70,musician,123456

Or you could save it as JSON:

>>> import json
>>> with open("C:/Users/Tim/Desktop/people.json", "w") as outfile:
...     json.dump(people, outfile, indent=1)

Result:

[
 {
  "income": "30000", 
  "age": "35", 
  "name": "dave", 
  "job": "teacher"
 }, 
 {
  "income": "20000", 
  "age": "28", 
  "name": "mike", 
  "job": "postman"
 }, 
 {
  "income": "40000", 
  "age": "40", 
  "name": "sue", 
  "job": "solicitor"
 }
]
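
Unlike the CSV file, a JSON array cannot simply be appended to in place; the usual approach is to load the list, add the record, and write the whole list back. A minimal sketch of that round trip follows; the function names `load_people` and `add_person` are hypothetical.

```python
import json


def load_people(path):
    """Load the list of person dicts from a JSON file."""
    with open(path) as infile:
        return json.load(infile)


def add_person(path, person):
    """Add one record by rewriting the whole JSON array."""
    people = load_people(path)
    people.append(person)
    with open(path, "w") as outfile:
        json.dump(people, outfile, indent=1)
```

Values read from the CSV file are strings, so they are stored as JSON strings as well; convert numeric fields such as age or income with `int()` before dumping if real numbers are wanted.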

Answer 2

file_1 = """
name dave1
age 351
job teacher1
income 300001"""

file_2 = """
name dave2
age 352
job teacher2
income 300002"""

file_3 = """
name dave3
age 353
job teacher3
income 300003"""

template = """
0 name
0 age
0 job
0 income"""

Assume the contents above have been read from files.

_dict = {}


def concat():
    for cols in template.splitlines():
        if cols:
            _, col_name = cols.split()
            _dict[col_name] = []

    for each_file in [file_1, file_2, file_3]:
        data = each_file.splitlines()
        for line in data:
            if line:
                words = line.split()
                _dict[words[0]].append(words[1])

    _text = ""

    for key in _dict:
        _text += '\t'.join([key, '\t'.join(_dict[key]), '\n'])

    return _text

print(concat())

Output (on Python 3.7+, where dicts preserve insertion order)

name    dave1   dave2   dave3   
age 351 352 353 
job teacher1    teacher2    teacher3    
income  300001  300002  300003
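
The snippet above works on strings held in memory. A sketch of the same idea applied to real files, using the all_libs.txt list from the question, might look like this. The function names `second_columns` and `merge_files` are made up for illustration, and the code assumes that every file lists the same fields in the same order.

```python
import csv


def second_columns(paths):
    """Return (labels, columns): labels come from the first file,
    and columns holds one list of second-column values per file."""
    labels, columns = [], []
    for path in paths:
        with open(path, newline="") as infile:
            rows = [row for row in csv.reader(infile, delimiter="\t")
                    if len(row) >= 2]  # skip blank or malformed lines
        if not columns:  # the first file supplies the row labels
            labels = [row[0] for row in rows]
        columns.append([row[1] for row in rows])
    return labels, columns


def merge_files(list_path, out_path):
    """Merge the files named in list_path into one tab-separated file."""
    with open(list_path) as lib:
        paths = sorted(line.strip() for line in lib if line.strip())
    labels, columns = second_columns(paths)
    with open(out_path, "w", newline="") as outfile:
        writer = csv.writer(outfile, delimiter="\t", lineterminator="\n")
        # zip(*columns) turns the per-file columns into per-field rows
        for label, values in zip(labels, zip(*columns)):
            writer.writerow([label, *values])
```

Calling `merge_files("all_libs.txt", "all_data.txt")` would then write one row per field and one column per file, in sorted file-name order. Because `zip` stops at the shortest input, a file with fewer lines than the others silently truncates the output, so it may be worth checking that all columns have the same length first.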
