Converting a string to an nd-array in Python

Time: 2014-03-07 13:14:12

Tags: python arrays string numpy

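Below is the data returned by a call to the Yahoo! Finance API. The data is all there, but it is one giant string. Any clever ideas on how to convert it into an array with Date, Open, High, Low, Close, Volume, and Adj Close as columns? I know I could convert it to a list and then use .reshape to turn it into an array, since I know the order of the data, but I was wondering whether there is a slicker way to do it. Thanks.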

    Date,Open,High,Low,Close,Volume,Adj Close
    2011-01-31,603.60,604.47,595.55,600.36,2804900,600.36
    2011-01-28,619.07,620.36,599.76,600.99,4231100,600.99
    2011-01-27,617.89,619.70,613.25,616.79,2019200,616.79
    2011-01-26,620.33,622.49,615.28,616.50,2038100,616.50
    2011-01-25,608.20,620.69,606.52,619.91,3646800,619.91
    2011-01-24,607.57,612.49,601.23,611.08,4599200,611.08
    2011-01-21,639.58,641.73,611.36,611.83,8904400,611.83
    2011-01-20,632.21,634.08,623.29,626.77,5485800,626.77
    2011-01-19,642.12,642.96,629.66,631.75,3406100,631.75
    2011-01-18,626.06,641.99,625.27,639.63,3617000,639.63
    2011-01-14,617.40,624.27,617.08,624.18,2365600,624.18
    2011-01-13,616.97,619.67,614.16,616.69,1334000,616.69
    2011-01-12,619.35,619.35,614.77,616.87,1632700,616.87
    2011-01-11,617.71,618.80,614.50,616.01,1439300,616.01
    2011-01-10,614.80,615.39,608.56,614.21,1579200,614.21
    2011-01-07,615.91,618.25,610.13,616.44,2101200,616.44
    2011-01-06,610.68,618.43,610.05,613.50,2057800,613.50
    2011-01-05,600.07,610.33,600.05,609.07,2532300,609.07
    2011-01-04,605.62,606.18,600.12,602.12,1824500,602.12
    2011-01-03,596.48,605.59,596.48,604.35,2365200,604.35
    2010-12-31,596.74,598.42,592.03,593.97,1539300,593.97
    2010-12-30,598.00,601.33,597.39,598.86,989500,598.86
    2010-12-29,602.00,602.41,598.92,601.00,1019200,601.00
    2010-12-28,602.05,603.87,598.01,598.92,1064800,598.92
    2010-12-27,602.74,603.78,599.50,602.38,1208100,602.38
    2010-12-23,605.34,606.00,602.03,604.23,1110800,604.23
    2010-12-22,604.00,607.00,603.28,605.49,1207500,605.49
    2010-12-21,598.57,604.72,597.61,603.07,1879500,603.07
    2010-12-20,594.65,597.88,588.66,595.06,1973300,595.06
    2010-12-17,591.00,592.56,587.67,590.80,3087100,590.80
    2010-12-16,592.85,593.77,588.07,591.71,1596900,591.71
    2010-12-15,594.20,596.45,589.15,590.30,2167700,590.30
    2010-12-14,597.09,598.29,592.48,594.91,1643300,594.91
    2010-12-13,597.12,603.00,594.09,594.62,2398500,594.62
    2010-12-10,593.14,593.99,590.29,592.21,1704700,592.21
    2010-12-09,593.88,595.58,589.00,591.50,1868900,591.50
    2010-12-08,591.97,592.52,583.69,590.54,1756900,590.54
    2010-12-07,591.27,593.00,586.00,587.14,3042200,587.14
    2010-12-06,580.57,582.00,576.61,578.36,2093800,578.36
    2010-12-03,569.45,576.48,568.00,573.00,2631200,573.00
    2010-12-02,568.66,573.33,565.35,571.82,2547900,571.82
    2010-12-01,563.00,571.57,562.40,564.35,3754100,564.35
    2010-11-30,574.32,574.32,553.31,555.71,7117400,555.71
    2010-11-29,589.17,589.80,579.95,582.11,2859700,582.11
    2010-11-26,590.46,592.98,587.00,590.00,1311100,590.00
    2010-11-24,587.31,596.60,587.05,594.97,2396400,594.97
    2010-11-23,587.01,589.01,578.20,583.01,2162600,583.01

4 answers:

Answer 0 (score: 2)

Basically, all you have to do is use the csv module:

import csv

# PathFile is the path to a file containing the CSV data
with open(PathFile) as f:
    reader = csv.DictReader(f, skipinitialspace=True)
    for row in reader:
        # All the values in the cells are strings; you'll have to
        # convert them to date, float, ... yourself, with or without numpy.
        print(row)
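Since the question starts from a string rather than a file, the same idea can be sketched with io.StringIO, which wraps a string in a file-like object. This is only a minimal sketch: giant_string here is a shortened stand-in for the API response, and converting Volume to float along with the prices is a simplifying assumption.

```python
import csv
import io
from datetime import datetime

# Assumption: a shortened stand-in for the string returned by the API.
giant_string = (
    "Date,Open,High,Low,Close,Volume,Adj Close\n"
    "2011-01-31,603.60,604.47,595.55,600.36,2804900,600.36\n"
    "2011-01-28,619.07,620.36,599.76,600.99,4231100,600.99\n"
)

rows = []
reader = csv.DictReader(io.StringIO(giant_string), skipinitialspace=True)
for row in reader:
    # Every value arrives as a string, so convert each column explicitly.
    typed = {key: float(value) for key, value in row.items() if key != "Date"}
    typed["Date"] = datetime.strptime(row["Date"], "%Y-%m-%d").date()
    rows.append(typed)
```

Each entry of rows is then a dictionary keyed by the header names, with a date object under "Date" and floats everywhere else.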

Answer 1 (score: 0)

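The csv module implements classes to read and write tabular data in CSV format. It allows programmers to say "write this data in the format preferred by Excel" or "read data from this file which was generated by Excel" without knowing the precise details of the CSV format used by Excel. Programmers can also describe the CSV formats understood by other applications or define their own special-purpose CSV formats.

The csv module's reader and writer objects read and write sequences. Programmers can also read and write data in dictionary form using the DictReader and DictWriter classes.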

>>> import csv
>>> with open('eggs.csv', 'r') as csvfile:
...     spamreader = csv.reader(csvfile, delimiter=',')
...     for row in spamreader:
...         print(', '.join(row))
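Applied to the question, csv.reader can be pointed at the string itself and the resulting rows handed to numpy. The following is a sketch under stated assumptions: giant_string is a shortened stand-in for the API response, and the names header, records, table, dates, and values are chosen here for illustration only.

```python
import csv
import io

import numpy as np

# Assumption: a shortened stand-in for the string returned by the API.
giant_string = (
    "Date,Open,High,Low,Close,Volume,Adj Close\n"
    "2011-01-31,603.60,604.47,595.55,600.36,2804900,600.36\n"
    "2011-01-28,619.07,620.36,599.76,600.99,4231100,600.99\n"
)

reader = csv.reader(io.StringIO(giant_string), delimiter=",")
header = next(reader)                     # first row holds the column names
records = [row for row in reader if row]  # skip any empty lines

table = np.array(records)                 # 2-D array of strings, one row per day
dates = np.array([row[0] for row in records], dtype="datetime64[D]")
values = table[:, 1:].astype(float)       # the six numeric columns as floats
```

Keeping the dates in their own array avoids forcing one dtype onto mixed columns, which a plain 2-D array cannot represent.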

Answer 2 (score: 0)

This worked for me and seems easier than the csv approach:

import re
import numpy as np

# giant_string is the CSV text returned by the API
string_list = re.split(',|\n', giant_string)
string_list = [s for s in string_list if s != '']  # take out blanks

# 7 columns, because I know there are 7 headers
string_to_arr = np.array(string_list).reshape(len(string_list) // 7, 7)
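A variation on this approach, sketched here as an alternative rather than as the answerer's method, splits into lines first and then into fields, so the column count does not need to be hardcoded and the header can be used to look up a column. giant_string is again a shortened stand-in, indented the way the data appears in the question.

```python
import numpy as np

# Assumption: a shortened stand-in for the API response, with leading
# whitespace on the data lines as in the question.
giant_string = """Date,Open,High,Low,Close,Volume,Adj Close
    2011-01-31,603.60,604.47,595.55,600.36,2804900,600.36
    2011-01-28,619.07,620.36,599.76,600.99,4231100,600.99
"""

# Strip each line and drop blank ones before splitting on commas.
lines = [line.strip() for line in giant_string.splitlines() if line.strip()]
header = lines[0].split(",")
string_to_arr = np.array([line.split(",") for line in lines[1:]])

# Every element is still a string; convert a column when numbers are needed.
closes = string_to_arr[:, header.index("Close")].astype(float)
```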

Answer 3 (score: 0)

You can use numpy.genfromtxt and then work with the result directly (if you don't want named columns, remove names=True and add a # at the start of the first line):

np.genfromtxt('test', dtype=('object',float,float,float,float,float,float), delimiter=',', names = True)
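genfromtxt also accepts file-like objects, so the string can be parsed without writing it to disk first. The sketch below assumes a shortened stand-in for giant_string and lets numpy infer each column's type with dtype=None instead of listing the types by hand; the explicit encoding argument assumes NumPy 1.14 or later.

```python
import io

import numpy as np

# Assumption: a shortened stand-in for the string returned by the API.
giant_string = (
    "Date,Open,High,Low,Close,Volume,Adj Close\n"
    "2011-01-31,603.60,604.47,595.55,600.36,2804900,600.36\n"
    "2011-01-28,619.07,620.36,599.76,600.99,4231100,600.99\n"
)

data = np.genfromtxt(
    io.StringIO(giant_string),
    delimiter=",",
    names=True,        # take the field names from the first line
    dtype=None,        # infer a type for each column
    encoding="utf-8",  # keep the date column as regular strings
)

closes = data["Close"]  # columns are accessed by field name
```

The result is a one-dimensional structured array with one record per row. Note that genfromtxt sanitizes field names, so the "Adj Close" column is reached as data["Adj_Close"].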