I want to convert the file mydata.csv to a NumPy array.
The file holds a matrix (14 x 79, signed values, no header names):
-0.094391 -0.086641 0.31659 0.66066 -0.33076 0.02751 …
-0.26169 -0.022418 0.47564 0.39925 -0.22232 0.16129 …
-0.33073 0.026102 0.62409 -0.098799 -0.086641 0.31832 …
-0.22134 0.15488 0.69289 -0.26515 -0.021011 0.47096 …
I thought this code would work for this case:
import numpy as np
data = np.genfromtxt('mydata.csv', dtype=float, delimiter=',', names=False)
But it does not work.
I want the final NumPy array to have data.shape = (14, 79).
My error message looks like this:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-26-060012d7c568> in <module>
1 import numpy as np
2
----> 3 data = np.genfromtxt('output.csv', dtype=float, delimiter=',', names=False)
~\Anaconda3\envs\tensorflow\lib\site-packages\numpy\lib\npyio.py in genfromtxt(fname, dtype, comments, delimiter, skip_header, skip_footer, converters, missing_values, filling_values, usecols, names, excludelist, deletechars, replace_space, autostrip, case_sensitive, defaultfmt, unpack, usemask, loose, invalid_raise, max_rows, encoding)
1810 deletechars=deletechars,
1811 case_sensitive=case_sensitive,
-> 1812 replace_space=replace_space)
1813 # Make sure the names is a list (for 2.5)
1814 if names is not None:
~\Anaconda3\envs\tensorflow\lib\site-packages\numpy\lib\_iotools.py in easy_dtype(ndtype, names, defaultfmt, **validationargs)
934 # Simple dtype: repeat to match the nb of names
935 if nbtypes == 0:
--> 936 formats = tuple([ndtype.type] * len(names))
937 names = validate(names, defaultfmt=defaultfmt)
938 ndtype = np.dtype(list(zip(names, formats)))
TypeError: object of type 'bool' has no len()
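Answer 0 (score: 9)
To do this, first build a list of the CSV files you want to append (file_names). You can then export them to a single CSV file and reshape the resulting NumPy array. This should help you move forward: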
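import pandas as pd
import numpy as np
# file_names is the list of CSV paths to combine; header=None because the files have no header row
combined_csv_files = pd.concat([pd.read_csv(f, header=None) for f in file_names])
Now, if you want to export these files into a single .csv file, use:
# header=False keeps the automatic column labels out of the output file
combined_csv_files.to_csv("combined_csv.csv", index=False, header=False)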
Now, to get the NumPy array, you can proceed like this:
data_set = pd.read_csv('combined_csv.csv', header=None)
# read_csv already returns a DataFrame, so no extra wrapping is needed
required_array = data_set.to_numpy()
print(required_array)
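If the goal is one 3-D array rather than one combined file, the files can also be loaded one by one and stacked. The following is only a sketch under assumptions that are not in the original answer: the files are whitespace-separated, all have the same shape, and the part*.csv names and the temporary directory are made up so the example is self-contained.

```python
import glob
import os
import tempfile

import numpy as np

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical input: three small whitespace-separated files of shape (2, 4).
    for i in range(3):
        np.savetxt(os.path.join(tmp, f"part{i}.csv"), np.full((2, 4), float(i)))

    # Build the list of file names, then load each file as a 2-D float array.
    file_names = sorted(glob.glob(os.path.join(tmp, "*.csv")))
    arrays = [np.loadtxt(f) for f in file_names]

# np.stack adds a new leading axis: (number of files, rows, columns).
stacked = np.stack(arrays)
print(stacked.shape)  # (3, 2, 4)
```

With 100 files of 14 x 79 values each, the same approach would give an array of shape (100, 14, 79) without a separate reshape step.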
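You can also reshape required_array as shown below. The new shape must contain exactly as many elements as the array does; (100, 14, 79) would fit, for example, 100 matrices of 14 x 79 values: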
required_array.shape = (100, 14, 79)
I ran a simple test in cmd to confirm this:
>>> y = np.zeros((2, 3, 4))
>>> y.shape
(2, 3, 4)
>>> y.shape = (3, 8)
>>> y
array([[ 0., 0., 0., 0., 0., 0., 0., 0.],
[ 0., 0., 0., 0., 0., 0., 0., 0.],
[ 0., 0., 0., 0., 0., 0., 0., 0.]])
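As a side note, the same result can be obtained with ndarray.reshape, which returns a reshaped array instead of modifying the original in place; as far as I know, the NumPy documentation recommends it over assigning to .shape. A minimal sketch using the same zeros array as the test above:

```python
import numpy as np

y = np.zeros((2, 3, 4))

# reshape leaves y untouched and returns an array with the new shape.
z = y.reshape(3, 8)

# -1 asks NumPy to infer that dimension from the total number of elements.
w = y.reshape(-1, 8)

print(y.shape, z.shape, w.shape)  # (2, 3, 4) (3, 8) (3, 8)
```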
Answer 1 (score: 2)
Try this:
import pandas as pd
import numpy as np
mydata = pd.read_csv("mydata.csv")
mydata_array = np.array(mydata)
Out:
array([[-0.26169 , -0.022418, 0.47564 , 0.39925 , -0.22232 , 0.16129 ],
[-0.33073 , 0.026102, 0.62409 , -0.098799, -0.086641, 0.31832 ],
[-0.22134 , 0.15488 , 0.69289 , -0.26515 , -0.021011, 0.47096 ]])
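Note that the output above has three rows, not four: by default read_csv treats the first line of the file as column names. A hedged variant for a file with no header and whitespace-separated values, as in the question, might look like the sketch below. The in-memory sample stands in for mydata.csv and is an assumption for illustration only.

```python
import io

import pandas as pd

# Stand-in for mydata.csv: the first six columns of the four rows in the question.
sample = io.StringIO(
    "-0.094391 -0.086641 0.31659 0.66066 -0.33076 0.02751\n"
    "-0.26169 -0.022418 0.47564 0.39925 -0.22232 0.16129\n"
    "-0.33073 0.026102 0.62409 -0.098799 -0.086641 0.31832\n"
    "-0.22134 0.15488 0.69289 -0.26515 -0.021011 0.47096\n"
)

# header=None keeps the first line as data; sep=r"\s+" splits on whitespace.
mydata = pd.read_csv(sample, header=None, sep=r"\s+")
mydata_array = mydata.to_numpy()
print(mydata_array.shape)  # (4, 6)
```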
Answer 2 (score: 2)
In [347]: txt = """-0.094391 -0.086641 0.31659 0.66066 -0.33076 0.02751
...: -0.26169 -0.022418 0.47564 0.39925 -0.22232 0.16129
...: -0.33073 0.026102 0.62409 -0.098799 -0.086641 0.31832
...: -0.22134 0.15488 0.69289 -0.26515 -0.021011 0.47096""".splitlines()
In [348]: txt
Out[348]:
['-0.094391 -0.086641 0.31659 0.66066 -0.33076 0.02751',
'-0.26169 -0.022418 0.47564 0.39925 -0.22232 0.16129',
'-0.33073 0.026102 0.62409 -0.098799 -0.086641 0.31832',
'-0.22134 0.15488 0.69289 -0.26515 -0.021011 0.47096']
In [349]: np.genfromtxt(txt)
Out[349]:
array([[-0.094391, -0.086641, 0.31659 , 0.66066 , -0.33076 , 0.02751 ],
[-0.26169 , -0.022418, 0.47564 , 0.39925 , -0.22232 , 0.16129 ],
[-0.33073 , 0.026102, 0.62409 , -0.098799, -0.086641, 0.31832 ],
[-0.22134 , 0.15488 , 0.69289 , -0.26515 , -0.021011, 0.47096 ]])
False is an invalid value for names:
In [350]: np.genfromtxt(txt, names=False)
---------------------------------------------------------------------------
...
TypeError: object of type 'bool' has no len()
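names=None works, but it is the default, so you do not need to pass it.
The delimiter looks like whitespace; I don't see any commas. The default dtype is float.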
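Putting these points together, a call against an actual file could look like the sketch below. It assumes the file really is whitespace-separated with no header; the temporary file is a stand-in for mydata.csv and contains only the first six columns of the four rows shown in the question.

```python
import os
import tempfile

import numpy as np

rows = [
    "-0.094391 -0.086641 0.31659 0.66066 -0.33076 0.02751",
    "-0.26169 -0.022418 0.47564 0.39925 -0.22232 0.16129",
    "-0.33073 0.026102 0.62409 -0.098799 -0.086641 0.31832",
    "-0.22134 0.15488 0.69289 -0.26515 -0.021011 0.47096",
]

with tempfile.TemporaryDirectory() as tmp:
    # Stand-in for mydata.csv, written with spaces between the values.
    path = os.path.join(tmp, "mydata.csv")
    with open(path, "w") as fh:
        fh.write("\n".join(rows) + "\n")

    # No delimiter and no names argument: the defaults split on whitespace
    # and return a plain 2-D float array.
    data = np.genfromtxt(path)

    # np.loadtxt with its defaults reads the same file the same way.
    same = np.loadtxt(path)

print(data.shape)  # (4, 6)
```

For the full 14 x 79 file from the question, data.shape should then come out as (14, 79).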