Storing data in namedtuples with empty fields to add other content later

Time: 2014-02-18 02:00:38

Tags: python list append field namedtuple

['Date,Open,High,Low,Close,Volume,Adj Close', 
 '2014-02-12,1189.00,1190.00,1181.38,1186.69,1724500,1186.69', 
 '2014-02-11,1180.17,1191.87,1172.21,1190.18,2050800,1190.18', 
 '2014-02-10,1171.80,1182.40,1169.02,1172.93,1945200,1172.93', 
 '2014-02-07,1167.63,1177.90,1160.56,1177.44,2636200,1177.44', 
 '2014-02-06,1151.13,1160.16,1147.55,1159.96,1946600,1159.96', 
 '2014-02-05,1143.38,1150.77,1128.02,1143.20,2394500,1143.20', 
 '2014-02-04,1137.99,1155.00,1137.01,1138.16,2811900,1138.16', 
 '2014-02-03,1179.20,1181.72,1132.01,1133.43,4569100,1133.43']

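I need to create a namedtuple for each line in this list of lines. Basically, the fields will be the words in the first line, 'Date,Open,High,Low,Close,Volume,Adj Close'. I will then do some calculations and need to add 2 more fields at the end of each namedtuple. Any help on how to do this?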

3 answers:

Answer 0 (score: 2)

from collections import namedtuple

data = ['Date,Open,High,Low,Close,Volume,Adj Close', 
        '2014-02-12,1189.00,1190.00,1181.38,1186.69,1724500,1186.69', 
        '2014-02-11,1180.17,1191.87,1172.21,1190.18,2050800,1190.18', 
        '2014-02-10,1171.80,1182.40,1169.02,1172.93,1945200,1172.93', 
        '2014-02-07,1167.63,1177.90,1160.56,1177.44,2636200,1177.44', 
        '2014-02-06,1151.13,1160.16,1147.55,1159.96,1946600,1159.96', 
        '2014-02-05,1143.38,1150.77,1128.02,1143.20,2394500,1143.20', 
        '2014-02-04,1137.99,1155.00,1137.01,1138.16,2811900,1138.16', 
        '2014-02-03,1179.20,1181.72,1132.01,1133.43,4569100,1133.43']


def convert_to_named_tuples(data):
    # get the names for the named tuple  
    field_names = data[0].split(",")
    # these are your two extra custom fields
    field_names.append("extra1")
    field_names.append("extra2")

    # field names can't have spaces in them (they have to be valid Python identifiers,
    # and "Adj Close" isn't)
    field_names = [field_name.replace(" ", "_") for field_name in field_names]

    # you can do this as many times as you like.. 
    # personally I'd do it manually once at the start and just check you're getting 
    # the field names you expect here...  
    ShareData = namedtuple("ShareData", field_names)

    # unpack the data into the named tuples
    share_data_list = []
    for row in data[1:]:
        fields = row.split(",")
        fields += [None, None]

        share_data = ShareData(*fields)
        share_data_list.append(share_data)

    return share_data_list

# check it works..
share_data_list = convert_to_named_tuples(data)

for share_data in share_data_list:
    print(share_data)

Actually, I think this is better, because it converts the fields to the correct types. On the downside, it won't accept arbitrary data...

from collections import namedtuple
from datetime import datetime 

# data is the same list as in the first example

field_names = ["Date","Open","High","Low","Close","Volume", "AdjClose", "Extra1", "Extra2"] 
ShareData = namedtuple("ShareData", field_names)

def convert_to_named_tuples(data):
    share_data_list = []
    for row in data[1:]:
        row = row.split(",")

        fields = (datetime.strptime(row[0], "%Y-%m-%d"),  # date
                  float(row[1]), float(row[2]),
                  float(row[3]), float(row[4]),
                  int(row[5]),   # volume
                  float(row[6]), # adj close
                  None, None)    # extras

        share_data = ShareData(*fields)
        share_data_list.append(share_data)

    return share_data_list

# test
share_data_list = convert_to_named_tuples(data)
for share_data in share_data_list:
    print(share_data)

But I agree with the other posts: why use a namedtuple when you could use a class definition?
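For comparison, here is a minimal sketch of what the class-based alternative might look like. The class name `ShareRecord` and the two extra attributes `daily_range` and `change` are hypothetical, chosen only for illustration. Unlike a namedtuple, the instances are mutable, so the extra fields can simply be assigned once the calculations are done.

```python
from datetime import datetime


class ShareRecord(object):
    """One row of price data plus two extra fields (hypothetical names)."""

    # __slots__ avoids a per-instance __dict__, keeping instances compact
    __slots__ = ("date", "open", "high", "low", "close", "volume",
                 "adj_close", "daily_range", "change")

    def __init__(self, row):
        parts = row.split(",")
        self.date = datetime.strptime(parts[0], "%Y-%m-%d")
        self.open, self.high, self.low, self.close = (float(p) for p in parts[1:5])
        self.volume = int(parts[5])
        self.adj_close = float(parts[6])
        # the extras start empty and are assigned after the calculations
        self.daily_range = None
        self.change = None


record = ShareRecord("2014-02-12,1189.00,1190.00,1181.38,1186.69,1724500,1186.69")
record.daily_range = record.high - record.low
record.change = record.close - record.open
```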

Answer 1 (score: 1)

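Any particular reason you want to use namedtuples? If you want to add fields later, you should probably use a dictionary. If you really do want to go the namedtuple way, you can use placeholders, like: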

from collections import namedtuple

field_names = data[0].replace(" ", "_").lower().split(",")
field_names += ['placeholder_1', 'placeholder_2']
Entry = namedtuple('Entry', field_names)

list_of_named_tuples = []
mock_data = [None, None]
for row in data[1:]:
    row_data = row.split(",") + mock_data
    list_of_named_tuples.append(Entry(*row_data))
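Because namedtuples are immutable, the placeholders cannot be assigned in place. One way to fill them in later is the `_replace` method, which returns a new tuple with the given fields changed. A small sketch, using a shortened field list and a made-up calculation purely for illustration:

```python
from collections import namedtuple

# shortened, illustrative field list; the real one comes from the header row
Entry = namedtuple("Entry", ["date", "open", "close",
                             "placeholder_1", "placeholder_2"])

entry = Entry("2014-02-12", "1189.00", "1186.69", None, None)

# _replace does not modify entry; it builds a new Entry
change = float(entry.close) - float(entry.open)
filled = entry._replace(placeholder_1=change, placeholder_2=change > 0)
```

To update a whole list, rebind each element to the tuple returned by `_replace`, for example inside a list comprehension.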

If instead you want to parse the data into a list of dictionaries (more Pythonic, IMO), you should do this:

field_names = data[0].split(",")
list_of_dicts = [dict(zip(field_names, row.split(','))) for row in data[1:]]
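With dictionaries, adding the two computed fields afterwards is just a matter of assigning new keys. A short sketch on two of the rows from the question; the key names `Range` and `Change` and the calculations are assumptions made up for this example:

```python
data = ['Date,Open,High,Low,Close,Volume,Adj Close',
        '2014-02-12,1189.00,1190.00,1181.38,1186.69,1724500,1186.69',
        '2014-02-11,1180.17,1191.87,1172.21,1190.18,2050800,1190.18']

field_names = data[0].split(",")
list_of_dicts = [dict(zip(field_names, row.split(","))) for row in data[1:]]

# values are still strings at this point, so convert before calculating
for entry in list_of_dicts:
    entry["Range"] = float(entry["High"]) - float(entry["Low"])
    entry["Change"] = float(entry["Close"]) - float(entry["Open"])
```

Note that dictionary keys may contain spaces, so "Adj Close" can be kept as-is here.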

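Edit: Note that although you can use dictionaries instead of namedtuples for a small dataset like the one in the example, doing so with a large amount of data will give your program a higher memory footprint.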
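A rough way to see the difference is to compare `sys.getsizeof` for one row stored both ways. This is only a sketch: it assumes CPython, the exact numbers vary between versions, and `getsizeof` counts only the container itself, not the strings it refers to.

```python
import sys
from collections import namedtuple

fields = ["Date", "Open", "High", "Low", "Close", "Volume", "Adj_Close"]
values = "2014-02-12,1189.00,1190.00,1181.38,1186.69,1724500,1186.69".split(",")

Entry = namedtuple("Entry", fields)
as_tuple = Entry(*values)
as_dict = dict(zip(fields, values))

# size of the container only; the field values are shared by both
tuple_size = sys.getsizeof(as_tuple)
dict_size = sys.getsizeof(as_dict)
print(tuple_size, dict_size)
```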

Answer 2 (score: 0)

Why not use a dictionary for the data? Adding extra keys is then easy.

dataList = []
keys = myData[0].split(',')
# skip the header row, which only supplies the keys
for row in myData[1:]:
    tempdict = dict()
    for index, value in enumerate(row.split(',')):
        tempdict[keys[index]] = value
    # if your additional values are going to be determined here, then
    # you can do whatever calculations you need and add them as extra keys;
    # otherwise you can work with this list elsewhere
    dataList.append(tempdict)