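I'm trying to import the values from a CSV file into 10 columns. Some fields contain numbers, but the missing values are just empty fields between commas, so there is nothing between the delimiters: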
2000-01-05,,-0.8803936956661669,,,,,,,-0.8316023477879247,
2000-01-06,,,,,,,,,,
2000-01-07,,,,,,,,,-0.3133976053851764,
2000-01-10,-0.26878027549229977,,,,,,,,,
2000-01-11,,,,,,,,1.0787295663966179,,
I tried the following code, but it drops the date column on the left:
import numpy as np
data = np.genfromtxt('Book7.txt', invalid_raise = True, usemask = False)
datanew = data[:,~np.all(np.isnan(data), axis = 0)]
Answer 0 (score: 1)
I don't know what you want the missing data to become, but this code converts the date column to datetime.date and sets the missing values to NaN:
import numpy as np
import datetime
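def convert_iso_string_to_date(s):
    # Depending on the NumPy version and the encoding argument, genfromtxt may
    # pass the field to the converter as bytes or as str; decode only if needed.
    if isinstance(s, bytes):
        s = s.decode("ascii")
    year, month, day = (int(x) for x in s.split("-"))
    return datetime.date(year, month, day)
# delimiter="," makes empty fields between commas count as missing (NaN)
data = np.genfromtxt("test.txt", delimiter=",", converters={0: convert_iso_string_to_date}, invalid_raise=True, usemask=False)
print(data)
Output: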
[(datetime.date(2000, 1, 5), nan, -0.8803937, nan, nan, nan, nan, nan, nan, -0.83160235, nan)
(datetime.date(2000, 1, 6), nan, nan, nan, nan, nan, nan, nan, nan, nan, nan)
(datetime.date(2000, 1, 7), nan, nan, nan, nan, nan, nan, nan, nan, -0.31339761, nan)
(datetime.date(2000, 1, 10), -0.26878028, nan, nan, nan, nan, nan, nan, nan, nan, nan)
(datetime.date(2000, 1, 11), nan, nan, nan, nan, nan, nan, nan, 1.07872957, nan, nan)]
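If what you ultimately want is what the question's second line was aiming for (a plain float matrix with the all-empty columns removed) while keeping the dates separately, one option is to call genfromtxt twice with usecols. This is a sketch assuming the layout shown in the question (a date followed by 10 value fields); the helper name load_dates_and_values and the SAMPLE string are illustrative only.

```python
import io

import numpy as np

# Illustrative sample: the rows from the question (date + 10 value fields).
SAMPLE = """2000-01-05,,-0.8803936956661669,,,,,,,-0.8316023477879247,
2000-01-06,,,,,,,,,,
2000-01-07,,,,,,,,,-0.3133976053851764,
2000-01-10,-0.26878027549229977,,,,,,,,,
2000-01-11,,,,,,,,1.0787295663966179,,
"""


def load_dates_and_values(text):
    """Hypothetical helper: split the CSV text into a date vector and a float matrix."""
    # Column 0: ISO dates, read as strings and then cast to datetime64[D].
    dates = np.genfromtxt(io.StringIO(text), delimiter=",", usecols=0,
                          dtype="U10", encoding="utf-8").astype("datetime64[D]")
    # Columns 1..10: numbers; empty fields between commas are filled with NaN.
    values = np.genfromtxt(io.StringIO(text), delimiter=",",
                           usecols=range(1, 11), dtype=float, encoding="utf-8")
    return dates, values


dates, values = load_dates_and_values(SAMPLE)
# Same filter as in the question: drop columns that are NaN in every row.
kept = values[:, ~np.all(np.isnan(values), axis=0)]
print(dates)
print(kept)
```

With a real file you would pass its contents, e.g. open("Book7.txt").read(), instead of SAMPLE. Reading the file twice keeps both results as ordinary arrays instead of the structured array shown above.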
Answer 1 (score: 0)
Not sure whether NumPy is preferred or required. With pandas this needs no extra code:
import io
import pandas as pd
text = """2000-01-05,,-0.8803936956661669,,,,,,,-0.8316023477879247,
2000-01-06,,,,,,,,,,
2000-01-07,,,,,,,,,-0.3133976053851764,
2000-01-10,-0.26878027549229977,,,,,,,,,
2000-01-11,,,,,,,,1.0787295663966179,,"""
csv = io.StringIO(text)
df = pd.DataFrame([line.split(',') for line in csv])  # each line keeps its trailing '\n', which ends up in the last column
print(df)
Output:
0 1 ... 9 10
0 2000-01-05 ... -0.8316023477879247 \n
1 \n None ... None None
2 2000-01-06 ... \n
3 \n None ... None None
4 2000-01-07 ... -0.3133976053851764 \n
5 \n None ... None None
6 2000-01-10 -0.26878027549229977 ... \n
7 \n None ... None None
8 2000-01-11 ...
[9 rows x 11 columns]
You may want to drop the empty rows.
Answer 2 (score: 0)
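The code above splits the lines by hand, so the cells stay strings. pandas' own CSV parser can handle the empty fields directly: they become NaN, and blank lines are skipped by default (skip_blank_lines=True). A sketch, assuming the file has no header row and the first column always holds an ISO date:

```python
import io

import pandas as pd

# The rows from the question; header=None because the file has no header line.
text = """2000-01-05,,-0.8803936956661669,,,,,,,-0.8316023477879247,
2000-01-06,,,,,,,,,,
2000-01-07,,,,,,,,,-0.3133976053851764,
2000-01-10,-0.26878027549229977,,,,,,,,,
2000-01-11,,,,,,,,1.0787295663966179,,"""

# Empty fields become NaN; column 0 is parsed as dates and used as the index.
df = pd.read_csv(io.StringIO(text), header=None, parse_dates=[0], index_col=0)

# Optional: drop the value columns that are empty in every row.
df_kept = df.dropna(axis=1, how="all")
print(df_kept)
```

With a real file, pass the path (e.g. "Book7.txt") instead of io.StringIO(text).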
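You can simply use Python built-ins:
from numpy import array
with open('Book7.txt') as file:
    data = file.readlines()
matrix = []
for line in data:
    # skip blank lines
    if line != '\n':
        # keep the first 10 fields (the date plus 9 value fields); the dropped
        # last field holds only the line's '\n'. All cells stay strings.
        matrix.append(line.split(',')[0:10])
matrix = array(matrix)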
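The resulting matrix still holds strings, with empty strings where values are missing. A sketch of a variant that uses the standard library's csv module and converts the value fields to floats (NaN for the gaps) and the first field to dates; it keeps the answer's [0:10] slice, and the helper name parse_rows is hypothetical.

```python
import csv
import io

import numpy as np

# Illustrative sample: the rows from the question.
SAMPLE = """2000-01-05,,-0.8803936956661669,,,,,,,-0.8316023477879247,
2000-01-06,,,,,,,,,,
2000-01-07,,,,,,,,,-0.3133976053851764,
2000-01-10,-0.26878027549229977,,,,,,,,,
2000-01-11,,,,,,,,1.0787295663966179,,
"""


def parse_rows(lines):
    """Hypothetical helper: return (dates, values) from an iterable of CSV lines."""
    # csv.reader yields an empty list for a blank line, so `if row` skips it.
    rows = [row[0:10] for row in csv.reader(lines) if row]
    dates = np.array([row[0] for row in rows], dtype="datetime64[D]")
    # Empty strings are the missing values; map them to NaN.
    values = np.array([[float(x) if x else np.nan for x in row[1:]] for row in rows])
    return dates, values


# With a real file: with open("Book7.txt", newline="") as f: dates, values = parse_rows(f)
with io.StringIO(SAMPLE) as f:
    dates, values = parse_rows(f)
print(dates)
print(values)
```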