Question

I have a list containing data points and 'identifiers' that looks like this:

['identifier', 1, 2, 3, 4, 'identifier', 10, 11, 12, 13, 'identifier', ...]

I want to write this list to a CSV file and start a new column for each identifier, e.g.:

for data in list:
    if data == 'identifier':
        ==> create a new column in the CSV file and print the subsequent data points

I look forward to hearing your suggestions.

Cheers,

-Sebastian

Answer 1

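This solution does not write the data to a CSV file, but with the csv library that is a simple extra step. What it does is restructure the data you provided into a list of lists, where each sublist is one row of data.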

l = ['identifier', 1, 2, 3, 'identifier', 10, 11, 12, 13, 'identifier', 4, 3, 2, 1, 10]

def split_list(l, on):
    """Split a list on an identifier and return a list of lists, split at
    each identifier, without including the identifier itself."""
    splits = []
    cache = []
    for v in l:
        # Check if this is an identifier
        if v == on:
            # Add the cache to splits unless it is empty
            if cache:
                splits.append(cache)
                # Empty the cache
                cache = []
        else:
            cache.append(v)
    # Add the last cache to splits if it is not empty
    if cache:
        splits.append(cache)
    return splits

def reshape_list(l, default=None):
    """Take a list of lists, treating each inner list as a column of values,
    and reshape it into a list of rows. If the columns are not all the same
    length, `default` (None unless given) fills the empty spots."""
    result = []
    # Get the length of the longest list
    maxlen = max(map(len, l))
    for i in range(maxlen):
        # Create each row
        row = []
        # Extract the values from the columns
        for column in l:
            if i < len(column):
                row.append(column[i])
            else:
                row.append(default)
        result.append(row)
    return result


print(l)
t = split_list(l, 'identifier')
print(t)
r = reshape_list(t)
print(r)
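The remaining step, writing the rows to a file, could look roughly like the sketch below. It hard-codes `rows` as the value `reshape_list(split_list(l, 'identifier'))` returns for the example list above, so the block runs on its own; `write_rows` is a hypothetical helper name, not part of the answer.

```python
import csv
import io

# Value of r = reshape_list(split_list(l, 'identifier')) for the example list:
# each inner list is one CSV row; None marks a column that has run out of values.
rows = [[1, 10, 4], [2, 11, 3], [3, 12, 2], [None, 13, 1], [None, None, 10]]

def write_rows(rows, f):
    """Hypothetical helper: write a list of rows to an open text file as CSV."""
    writer = csv.writer(f)
    # csv.writer writes None as an empty field, so short columns stay blank.
    writer.writerows(rows)

# Write to an in-memory buffer to inspect the result...
buf = io.StringIO()
write_rows(rows, buf)
print(buf.getvalue())

# ...or to a real file; the csv docs recommend opening it with newline=''.
with open('result.csv', 'w', newline='') as csvfile:
    write_rows(rows, csvfile)
```

Because csv.writer already turns None into an empty field, the `default=None` padding from reshape_list needs no extra handling here.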

Answer 2

Generating demo data:


import random

random.seed(20180119) # remove to get random data between runs
id = 'identifier'

def genData():
    data = []
    for n in range(10+random.randint(1,10)):
        data.append(id)
        data.extend(random.choices(range(1,20),k=random.randint(3,12)))
    print(data)
    return data

Output (formatted):

['identifier', 18, 6, 19, 10, 12, 18, 17, 12, 
 'identifier', 10, 17, 17, 10, 15, 12, 16, 18, 19, 18, 14, 9, 
 'identifier', 6, 10, 1, 14, 4, 
 'identifier', 3, 7, 7, 4, 8, 2, 16, 8, 1, 8, 16, 6, 
 'identifier', 6, 17, 8, 8, 13, 15, 7, 9, 4, 10, 15, 
 'identifier', 17, 8, 3, 8, 2, 19, 16, 2, 5, 6, 
 'identifier', 18, 6, 18, 19, 7, 8, 14, 7, 7, 19, 
 'identifier', 13, 7, 4, 13, 
 'identifier', 15, 8, 17, 8, 1, 12, 16, 7, 5, 19, 14, 9, 
 'identifier', 18, 16, 10, 7, 16, 18, 19, 6, 15, 8, 13, 15, 
 'identifier', 15, 2, 18, 13, 7, 
 'identifier', 17, 19, 15, 4, 18, 7, 13, 17, 8, 9, 
 'identifier', 9, 17, 18, 8, 17, 17, 17, 
 'identifier', 3, 16, 15, 13, 9, 
 'identifier', 15, 12, 2, 16, 2, 5, 16, 18]

Writing the data:

def partitionData(idToUse, dataToUse):
    lastId = None
    for (i, n) in enumerate(dataToUse):      # identify subslices of data
        # "is None", not "not lastId": index 0 is a valid position for the first id
        if n == idToUse and lastId is None:  # find first id, data before is discarded
            lastId = i
            continue

        if n == idToUse:                     # found the next id
            yield dataToUse[lastId:i]        # yield sublist including idToUse
            lastId = i

    if dataToUse and dataToUse[-1] != idToUse:  # yield rest of data
        yield dataToUse[lastId:]

Writing the partitions column by column to result.csv:

data = genData()
partitioned = partitionData(id, data)

import itertools
import csv
with open('result.csv', 'w', newline='') as csvfile:
    writer = csv.writer(csvfile, delimiter=";")
    # like zip, but fills up shorter ones with None till longest index
    writer.writerows(itertools.zip_longest(*partitioned, fillvalue=None))

Links:
- itertools.zip_longest
- csv.writer
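To see what the zip_longest call does with the partitions, here is a small sketch with hypothetical, hand-written columns shaped like partitionData's output (each slice starts with its 'identifier' marker, and the slices differ in length). It writes to an in-memory buffer instead of result.csv.

```python
import csv
import io
import itertools

# Hypothetical partitions, shaped like what partitionData yields:
# each slice starts with its 'identifier' marker, lengths differ.
partitioned = [
    ['identifier', 1, 2, 3],
    ['identifier', 10, 11, 12, 13],
    ['identifier', 6, 10],
]

# zip_longest(*columns) walks all columns in parallel and pads the shorter
# ones with fillvalue, so every tuple it yields is one CSV row.
rows = list(itertools.zip_longest(*partitioned, fillvalue=None))

buf = io.StringIO()
csv.writer(buf, delimiter=";").writerows(rows)  # None is written as an empty field
print(buf.getvalue())
```

Since each partition keeps its marker as the first element, the first CSV row ends up being a header row of identifiers.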

Answer 3

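You can do something like the following, assuming l is your list and every identifier is followed by exactly four data points (reshape(-1, 5) cuts the list into fixed blocks of five items):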

import pandas as pd
import numpy as np
pd.DataFrame(np.array(l).reshape(-1, 5)).set_index(0).T.to_csv('my_file.csv', index=False)
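What the one-liner does step by step can be sketched in plain Python, without pandas or NumPy. The sample list and the GROUP constant are assumptions for illustration; the point is that the approach only works when the list splits into equal blocks of one identifier plus four values.

```python
import csv
import io

GROUP = 5  # one 'identifier' plus exactly four data points, as reshape(-1, 5) assumes

# Assumed sample list in the question's format
l = ['identifier', 1, 2, 3, 4, 'identifier', 10, 11, 12, 13, 'identifier', 21, 22, 23, 24]

# Equivalent of np.array(l).reshape(-1, 5): consecutive blocks of five items.
if len(l) % GROUP:
    raise ValueError("length is not a multiple of 5; reshape(-1, 5) would fail too")
blocks = [l[i:i + GROUP] for i in range(0, len(l), GROUP)]

# set_index(0) moves the first item of each block out of the data;
# .T turns each block into a column, with the identifiers as the header.
header = [b[0] for b in blocks]
rows = list(zip(*(b[1:] for b in blocks)))

buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(header)
writer.writerows(rows)
print(buf.getvalue())
```

One difference worth knowing: np.array on a mixed list converts every value to a string, which does not change the CSV text here but matters if the DataFrame is used for anything else.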

Answer 4

If the dataset is not too large, you can first prepare the data in memory and then serialize it to a CSV file.

import csv

dataset = ['identifier', 1, 2, 3, 4, 'identifier', 10, 11, 12, 13, 'identifier', 21, 22, 23, 24]
columns = []
col = []
for datapoint in dataset:
    if datapoint == 'identifier':
        if col:
            columns.append(col)
            col = []
    else:
        col.append(datapoint)
# Add the last column unless the list ended with an identifier
if col:
    columns.append(col)

rows_count = max((len(c) for c in columns), default=0)

with open('result.csv', 'w', newline='') as csvfile:
    writer = csv.writer(csvfile, delimiter=";")

    for x in range(rows_count):
        data = []
        for col in columns:
            if len(col) > x:
                data.append(col[x])
            else:
                data.append("")
        writer.writerow(data)
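A more compact sketch of the same two steps is possible with itertools: groupby for splitting the list at each identifier and zip_longest for assembling the rows. The helper names to_columns and write_columns are hypothetical, and the output goes to an in-memory buffer for inspection.

```python
import csv
import io
from itertools import groupby, zip_longest

def to_columns(dataset, marker='identifier'):
    """Hypothetical helper: split dataset into columns at every marker (marker dropped)."""
    columns = []
    # groupby collapses consecutive items with the same key, so each run of
    # non-marker values becomes one group and runs of markers are skipped.
    for is_marker, group in groupby(dataset, key=lambda v: v == marker):
        if not is_marker:
            columns.append(list(group))
    return columns

def write_columns(columns, f, delimiter=";"):
    """Hypothetical helper: write columns side by side, padding short ones with ''."""
    writer = csv.writer(f, delimiter=delimiter)
    writer.writerows(zip_longest(*columns, fillvalue=""))

dataset = ['identifier', 1, 2, 3, 4, 'identifier', 10, 11, 12, 13, 'identifier', 21, 22, 23, 24]
columns = to_columns(dataset)

buf = io.StringIO()
write_columns(columns, buf)
print(buf.getvalue())
```

Like the loop above, this keeps any values that appear before the first identifier as their own column, and empty groups between consecutive identifiers never become columns.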

Write a list to a CSV file and start a new column when a condition is met