My CSV file has many rows of numeric string values in the following format.
A sample of 2 CSV rows:
[['ASA00211063', '2005'], [-0.434358, -0.793407, -1.070576, nan, nan, ...(365 values)], [0.354615, -0.108102, nan, ...(365 values)]]
[['AFR02516075', '1998'], [-0.434358, -0.7934039, -1.0705767, nan, nan, ...(365 values)], [0.3546153, -0.1081022, nan, ...(365 values)]]
How can I split the lines of the CSV file and join them into lists so that the output is:
list[0] = ['ASA00211063', '2005'], ['AFR02516075', '1998']...
list[1] = [-0.434358, -0.793407, -1.070576, nan, nan,..., 0.354615, -0.108102,nan,...(**730** values)]
list[2] = [-0.434358, -0.7934039, -1.0705767, nan, nan,..., 0.3546153, -0.1081022, nan,...(**730** values)]
Answer 0 (score: 1)
To read Python structures from a text file, always use ast.literal_eval(): it only evaluates Python literals, which prevents anyone from embedding anything nasty in the input file.
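To illustrate that safety property, here is a small sketch (the input strings are made up for the demo) showing that ast.literal_eval turns a literal into ordinary Python objects but raises ValueError on anything that is not a literal, such as a function call:

```python
import ast

# A line made only of literals (lists, strings, numbers) parses into real objects.
row = ast.literal_eval("[['ASA00211063', '2005'], [-0.434358, -0.793407]]")
print(row)  # [['ASA00211063', '2005'], [-0.434358, -0.793407]]

# Anything that is not a literal, such as a function call, is refused.
rejected = False
try:
    ast.literal_eval("__import__('os').getcwd()")
except ValueError:
    rejected = True
print(rejected)  # True
```

Unlike eval(), nothing in the string is ever executed, which is why it is the safer choice for data files.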
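The following code iterates over every line of the input file and appends the parsed structure to a list, from which you can decide what to do next.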
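import ast

l = []
with open('inputfile.txt') as f:
    for line in f:
        if not line.strip():
            continue  # skip blank lines, e.g. a trailing newline
        # literal_eval cannot parse a bare nan, so turn it into the string "nan"
        edited_line = line.replace('nan', '"nan"')
        l.append(ast.literal_eval(edited_line))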
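The replace() call is needed because a bare nan is a variable name, not a literal, so ast.literal_eval rejects it. A minimal check, assuming a shortened sample line in place of the real 365-value rows:

```python
import ast

line = "[['ASA00211063', '2005'], [-0.434358, nan], [0.354615, nan]]"

# A bare `nan` is a name, not a literal, so literal_eval refuses it.
try:
    ast.literal_eval(line)
    bare_ok = True
except ValueError:
    bare_ok = False

# Quoting it first turns every nan into the string 'nan', which is a literal.
parsed = ast.literal_eval(line.replace('nan', '"nan"'))
print(bare_ok)  # False
print(parsed)   # [['ASA00211063', '2005'], [-0.434358, 'nan'], [0.354615, 'nan']]
```

A plain str.replace would also rewrite "nan" inside any other token that happens to contain it; a word-boundary regex such as re.sub(r'\bnan\b', '"nan"', line) is safer if that can occur in your data.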
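This version additionally converts every "nan" string back into a numpy.nan object and then merges the two value lists of each row: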
import ast
from numpy import nan

l = []
with open('inputfile.txt') as f:
    for line in f:
        if not line.strip():
            continue  # skip blank lines, e.g. a trailing newline
        # quote bare nan so literal_eval can parse it
        edited_line = ast.literal_eval(line.replace('nan', '"nan"'))
        # turn the 'nan' strings back into numpy.nan floats
        edited_line = [[nan if v == 'nan' else v for v in vals] for vals in edited_line]
        l.append(edited_line)

# combine elements [1] and [2] of each sublist into one list of len = 730
# element l[0] is the list of ['code', 'yyyy'] pairs
# elements l[1 ... n] are the data rows, each of length 730
l = [[subl[0] for subl in l]] + [subl[1] + subl[2] for subl in l]
Printing the result row by row gives:
for row in l:
    print(row)

[['ASA00211063', '2005'], ['AFR02516075', '1998']]
[-0.434358, -0.793407, -1.070576, nan, nan, 0.354615, -0.108102, nan]
[-0.434358, -0.7934039, -1.0705767, nan, nan, 0.3546153, -0.1081022, nan]
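If you would rather not depend on numpy, the same idea can be sketched with float('nan'), which prints as nan just like numpy.nan. The function name parse_rows and the in-memory sample_lines (used instead of a file so the example runs on its own) are hypothetical, and the sketch assumes every line holds exactly three top-level lists: a header and two value lists.

```python
import ast
import re


def parse_rows(lines):
    """Parse lines shaped like [[code, year], [values...], [values...]]."""
    rows = []
    for line in lines:
        if not line.strip():
            continue  # skip blank lines, e.g. a trailing newline at end of file
        # Quote only whole-word nan tokens so they become parseable strings.
        quoted = re.sub(r'\bnan\b', '"nan"', line)
        header, first, second = ast.literal_eval(quoted)
        # Turn the 'nan' placeholders back into real float NaNs.
        values = [float('nan') if v == 'nan' else v for v in first + second]
        rows.append((header, values))
    # Result[0] collects all headers; result[1:] holds one merged list per row.
    return [[h for h, _ in rows]] + [vals for _, vals in rows]


sample_lines = [
    "[['ASA00211063', '2005'], [-0.434358, -0.793407, nan], [0.354615, nan]]\n",
    "[['AFR02516075', '1998'], [-0.434358, -0.7934039, nan], [0.3546153, nan]]\n",
    "\n",
]
l = parse_rows(sample_lines)
for row in l:
    print(row)
```

To read a real file, pass the open file object instead: with open('inputfile.txt') as f: l = parse_rows(f).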
Answer 1 (score: 0)
I think this code meets your requirements:
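#!/usr/bin/python
import re

data = [[]]
with open('in') as f:
    for line in f:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # drop the outermost [ ... ] around the whole line
        line = re.match(r'\[?(.*)\]', line).group(1)
        # split on ", " only where the next character is "[" (top-level items)
        res = re.split(r', (?=\[)', line)
        data[0].append(res[0])      # header, kept as a string
        string = res[1] + res[2]    # both value lists, concatenated as text
        data.append([string])

for i, v in enumerate(data):
    print("data[{}]:\n{}\n".format(i, v))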
Input:
[['ASA00211063', '2005'], [-0.434358, -0.793407, -1.070576, nan, nan,...(365 values)], [0.354615, -0.108102,nan,...(365 values)]]
[['AFR02516075', '1998'], [-0.434358, -0.7934039, -1.0705767, nan, nan,...(365 values)], [0.3546153, -0.1081022, nan,...(365 values)]]
[['XXX02516075', '1998'], [-1.434358, -1.7934039, -1.1705767, nan, nan,...(365 values)], [0.7546153, -0.7081022, nan,...(365 values)]]
Output:
data[0]:
["['ASA00211063', '2005']", "['AFR02516075', '1998']", "['XXX02516075', '1998']"]
data[1]:
['[-0.434358, -0.793407, -1.070576, nan, nan,...(365 values)][0.354615, -0.108102,nan,...(365 values)]']
data[2]:
['[-0.434358, -0.7934039, -1.0705767, nan, nan,...(365 values)][0.3546153, -0.1081022, nan,...(365 values)]']
data[3]:
['[-1.434358, -1.7934039, -1.1705767, nan, nan,...(365 values)][0.7546153, -0.7081022, nan,...(365 values)]']
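As the output shows, this approach keeps everything as text: data[0] holds the header strings and each data row is a single concatenated string, not a list of numbers. If you need real lists, one possible extension (a sketch; split_line is a hypothetical helper, and the sample line is shortened) parses the pieces produced by the same two regexes with ast.literal_eval:

```python
import ast
import re


def _parse(piece):
    # Quote bare nan so literal_eval accepts it as a string literal.
    return ast.literal_eval(re.sub(r'\bnan\b', '"nan"', piece))


def split_line(line):
    """Split one line with the same regexes as above, then parse the pieces."""
    inner = re.match(r'\[?(.*)\]', line.strip()).group(1)
    res = re.split(r', (?=\[)', inner)
    header = _parse(res[0])
    # Merge both value lists and restore real float NaNs.
    values = [float('nan') if v == 'nan' else v
              for v in _parse(res[1]) + _parse(res[2])]
    return header, values


header, values = split_line(
    "[['ASA00211063', '2005'], [-0.434358, -0.793407, nan], [0.354615, nan]]")
print(header)  # ['ASA00211063', '2005']
print(values)  # [-0.434358, -0.793407, nan, 0.354615, nan]
```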