Question

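First, I need help with the theory. (If anyone has already run into this problem and has sample code, it would be much appreciated.)

Imagine you have a product, for example soap. Its description contains many tags (stored in a text file).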

line 1 productName:SOAP1, productCategory:Bath, productSubCategory: Soap, bla, bla, bla
line 2 productName:SOAP2, productCategory:Bath, productSubCategory: Soap, bla, bla, bla
line 3 productName:SOAP3, productCategory:Bath, productSubCategory: Soap, bla, bla, bla

Every column contains a ":", and I need to convert these tags to CSV format using Python code:

productName    productCategory    productSubCategory
  SOAP1             Bath                 Soap
  SOAP2             Bath                 Soap
  SOAP3             Bath                 Soap

I'm not sure what the best approach is.

Answer 1

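import re
import csv

columns = ['productName', 'productCategory', 'productSubCategory']

with open('data.txt') as infile:
  # newline='' lets the csv module control the line endings it writes
  with open('result.csv', 'w', newline='') as outfile:
    writer = csv.DictWriter(outfile, columns)
    writer.writeheader()
    for line in infile:
      row = {}
      for column in columns:
        # Lazily capture everything after "<column>:" up to ", " or end of line
        pattern = re.escape(column) + r':(.+?)(, |$)'
        match = re.search(pattern, line)
        # Leave the cell empty when a tag is missing instead of raising
        row[column] = match.group(1) if match else ''
      writer.writerow(row)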


If you are not familiar with regular expressions, now is a good time to search for and read about them.

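This solution assumes that each item has the form <tag>:<value>, followed by either (1) a comma and a space (", ") or (2) the end of the line (denoted by $). If a value contains ", ", the result will be incorrect. Any whitespace after ":" will be included in the value.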
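If the whitespace around a value is unwanted, the pattern itself can absorb it. The following is only a minimal sketch of that variation, not the answer's original code; the helper name `parse_line` and the sample line are assumptions made for illustration. It still breaks if a value contains a comma.

```python
import re

COLUMNS = ['productName', 'productCategory', 'productSubCategory']


def parse_line(line, columns=COLUMNS):
    """Return {column: value} for one tagged line (hypothetical helper)."""
    row = {}
    for column in columns:
        # \s* after the colon swallows leading spaces; the lazy group stops
        # at the next comma or at the end of the line.
        pattern = re.escape(column) + r':\s*(.*?)\s*(?:,|$)'
        match = re.search(pattern, line)
        # A missing tag becomes an empty string instead of raising.
        row[column] = match.group(1) if match else ''
    return row


sample = 'productName:SOAP1, productCategory:Bath, productSubCategory: Soap, bla, bla, bla'
print(parse_line(sample))
```

Each dictionary returned this way can be passed straight to `csv.DictWriter.writerow`, as in the loop above.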

Answer 2

This lets you have dynamic headers.

import pandas as pd

# Assumes every comma-separated field has the form tag:value;
# skipinitialspace=True drops the space that follows each comma
df = pd.read_csv(r'yourfile.txt', header=None, skipinitialspace=True)
print(df)
#                0                     1                         2
#productName:SOAP1, productCategory:Bath, productSubCategory: Soap
#productName:SOAP2, productCategory:Bath, productSubCategory: Soap
#productName:SOAP3, productCategory:Bath, productSubCategory: Soap

# Header names are the tags found in the first row
headerlist = [x.split(':')[0].strip() for x in df.loc[0, :]]

# Keep only the value part of every cell
for y in df.columns:
    df[y] = df[y].str.split(':').str[1].str.strip()
df.columns = headerlist

print(df)
#  productName  productCategory  productSubCategory
#0       SOAP1             Bath                Soap
#1       SOAP2             Bath                Soap
#2       SOAP3             Bath                Soap

Answer 3

Interestingly, you can use the csv module both to read the input and to write the output file.

import csv

inp_filename = 'tagged.txt'
out_filename = 'csv_from_tagged.csv'

with open(inp_filename, 'r', newline='') as inp:
    line = next(inp)
    # Field names come from the tagged items on the first line
    fieldnames = [elem.split(':')[0].strip()
                  for elem in line.split(',') if ':' in elem]

    inp.seek(0)  # Rewind

    with open(out_filename, 'w', newline='') as outp:
        csv_writer = csv.DictWriter(outp, fieldnames)
        csv_writer.writeheader()

        for row in csv.reader(inp):
            # Skip untagged items such as "bla"; split on the first colon only
            pairs = (elem.split(':', 1) for elem in row if ':' in elem)
            as_dict = {tag.strip(): value.strip() for tag, value in pairs}
            csv_writer.writerow(as_dict)

print('done')
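The code above takes the field names from the first line only. If later lines might carry tags that the first line lacks, one possible variation is to parse every line first and use the union of all tags as the header. This is a sketch under that assumption; the function names `parse_tags` and `tagged_to_csv` are hypothetical, and `io.StringIO` objects stand in for the real files.

```python
import csv
import io


def parse_tags(row):
    """Map tag -> value for the items in `row` that contain a colon."""
    pairs = (item.split(':', 1) for item in row if ':' in item)
    return {tag.strip(): value.strip() for tag, value in pairs}


def tagged_to_csv(inp, outp):
    """Convert tagged lines read from `inp` into CSV written to `outp`."""
    rows = [parse_tags(row) for row in csv.reader(inp) if row]
    # Union of all tags, kept in order of first appearance.
    fieldnames = list(dict.fromkeys(tag for row in rows for tag in row))
    # restval fills the cells of tags that a given line does not have.
    writer = csv.DictWriter(outp, fieldnames, restval='')
    writer.writeheader()
    writer.writerows(rows)


# In-memory stand-ins for the input and output files.
source = io.StringIO(
    'productName:SOAP1, productCategory:Bath, bla\n'
    'productName:SOAP2, productSubCategory: Soap\n'
)
result = io.StringIO()
tagged_to_csv(source, result)
print(result.getvalue())
```

With real files, both would be opened with `newline=''`, as in the answer's code. The trade-off is that all rows are held in memory before anything is written.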

Answer 4

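Maybe you could use a JSONField? Storing the data in it should be easier. If not, take a look at the code below.

In step 1, I build a list of tags, which I assume can be dynamic, and put it in the first row. The code then writes the data to the CSV file. Hope it helps :)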

import csv

from django.http import HttpResponse

text_area_value = 'productName:SOAP1, productCategory:Bath, productSubCategory: Soap, bla, bla, bla\nproductName:SOAP2, productCategory:Bath, productSubCategory: Soap, bla, bla, bla\nproductName:SOAP3, productCategory:Bath, productSubCategory: Soap, bla, bla, bla'


def export_csv(request):  # view name is illustrative, not from the original
    response = HttpResponse(content_type='text/csv')
    response['Content-Disposition'] = 'attachment; filename="data.csv"'
    writer = csv.writer(response)

    tagsList = []
    for i, line in enumerate(text_area_value.split('\n')):
        dataList = []
        # Keep only the items that look like tag:value
        for tag in line.split(','):
            if ':' in tag:
                name, value = tag.split(':', 1)
                if i == 0:
                    # The first line also supplies the list of tags
                    tagsList.append(name.strip())
                dataList.append(value.strip())
        if i == 0:
            writer.writerow(tagsList)  # header row with the tags
        writer.writerow(dataList)  # CSV data

    return response
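As for the JSONField idea mentioned at the start of this answer, here is a rough sketch of turning one tagged line into a JSON string, which is the kind of value such a field could hold. It uses only the standard `json` module and does not touch Django; the helper name `line_to_json` is an assumption made for illustration.

```python
import json


def line_to_json(line):
    """Serialize the tag:value items of one line as a JSON object string."""
    record = {}
    for item in line.split(','):
        # Items without a colon (such as "bla") are ignored.
        if ':' in item:
            tag, value = item.split(':', 1)
            record[tag.strip()] = value.strip()
    return json.dumps(record)


line = 'productName:SOAP1, productCategory:Bath, productSubCategory: Soap, bla, bla, bla'
print(line_to_json(line))
```

Once the data is stored as key/value pairs like this, the CSV header is just the set of keys, so no text splitting is needed at export time.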

Constructing a CSV file using TAGS
