Question

I am trying to convert data from a .txt file to a .csv file using Python. My .txt file is currently formatted like this:

www.thing.com
Thing
2010
linkedin.com/company/thing
www.hello.com
Hello
1999
linkedin.com/company/hello
...

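I would like a program that reads the 4 lines about "Thing" and puts them on one row. It would then read the 4 lines about "Hello" and put them on the next row, with each item in the same column as the corresponding item for "Thing".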

"www.thing.com,Thing,2010,linkedin.com/company/thing"
"www.hello.com,Hello,1999,linkedin.com/company/hello"
...

Here is what I have so far (not much):

import csv

text_file = open("document.txt", "r")

with open('output.csv', 'wb') as mycsv:
    filewriter = csv.writer(mycsv)

    mycsv.writerow(["company", "name", "date", "linkedin"])

    for line in text_file:
        URL = line
        line = next(text_file)
        name = line
        line = next(text_file)
        date = line
        line = next(text_file)
        LinkedIn = line
        line = next(text_file)
        mycsv.writerow(URL, name, date, LinkedIn)

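Most of what I have found so far deals with .txt files that keep each record on a single line, whereas my .txt spreads each record over multiple lines.

How would I go about solving this?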

Answer 1

Here is another way to solve your problem:

def group_data(table, n=4):
    # Group your table's data by n elements
    yield from [table[k: k + n] for k in range(0, len(table), n)]


def write_csv(file_name, data):
    with open(file_name, 'a') as f:
        # Loop over your grouped data
        for elm in data:
            # Write the grouped elements into a file
            f.write(','.join(k for k in elm) + '\n')



a = '''www.thing.com
Thing
2010
linkedin.com/company/thing
www.hello.com
Hello
1999
linkedin.com/company/hello'''

data = [elm for elm in a.split('\n')]
grouped = group_data(data)
write_csv('csv_file.csv', grouped)

Output:

www.thing.com,Thing,2010,linkedin.com/company/thing
www.hello.com,Hello,1999,linkedin.com/company/hello

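Note: if your data cannot be grouped by a fixed number of lines, you should consider a different algorithm to get the desired output, or check whether there is a repeating pattern you can use for grouping. Otherwise, the code above works with your current text sample.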
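
As a rough illustration of the pattern-based grouping mentioned in the note, the sketch below starts a new row whenever a line looks like the first field of a record. It assumes, purely for illustration, that every record begins with a line starting with `www.`; the helper name `group_by_marker` and that prefix are hypothetical and would need adapting to the real data.

```python
def group_by_marker(lines, is_start=lambda s: s.startswith("www.")):
    """Group lines into records, opening a new record at each marker line."""
    records = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if is_start(line) or not records:
            # A marker line (or the very first line) opens a new record
            records.append([line])
        else:
            # Any other line belongs to the record currently being built
            records[-1].append(line)
    return records


# Hypothetical sample in which the second record has no year line
sample = [
    "www.thing.com", "Thing", "2010", "linkedin.com/company/thing",
    "www.hello.com", "Hello", "linkedin.com/company/hello",
]
rows = group_by_marker(sample)
```

Records grouped this way may have different lengths, so the resulting CSV rows would be ragged unless missing fields are padded.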

Answer 2

You can group the input file into chunks of 4 lines by zipping the input file iterator with itself 4 times:

from itertools import repeat
csv.writer(mycsv).writerows([[i.rstrip() for i in r] for r in zip(*repeat(text_file, 4))])
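
For completeness, here is a self-contained sketch of the same zip idea. The function name `txt_to_csv` and its parameters are assumptions for illustration; the header row is taken from the question. Note that `zip` silently drops a trailing group that has fewer lines than the header has columns.

```python
import csv
from itertools import repeat


def txt_to_csv(txt_path, csv_path,
               header=("company", "name", "date", "linkedin")):
    """Write every len(header) consecutive lines of txt_path as one CSV row."""
    n = len(header)
    with open(txt_path, "r") as text_file, \
            open(csv_path, "w", newline="") as mycsv:
        writer = csv.writer(mycsv)
        writer.writerow(header)
        # repeat() hands zip the same file iterator n times, so each tuple
        # holds n consecutive lines; an incomplete final group is dropped.
        for group in zip(*repeat(text_file, n)):
            writer.writerow([field.rstrip() for field in group])
```

With the file names from the question, it could be called as `txt_to_csv("document.txt", "output.csv")`.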

Converting a .txt file with multi-line records to .csv
