I'm having trouble reading a CSV file.
I tried using replace, but numpy doesn't support it.
The CSV file looks like this:
"num","phone","sensorID","press","temp","accel","gps_lat","gps_lng","time"
"1","null","A0:E6:F8:7B:16:EA","0","17","1.25","0","0","2016-12-14 13:34:59"
"2","null","A0:E6:F8:7B:16:A9","0","18","1.19","0","0","2016-12-14 13:34:59"
"3","null","A0:E6:F8:7B:15:A5","0","18","1.19","0","0","2016-12-14 13:34:59"
"4","null","A0:E6:F8:7B:16:EA","0","17","1.25","0","0","2016-12-14 13:35:00"
"5","null","A0:E6:F8:7B:16:A9","0","18","1.19","0","0","2016-12-14 13:35:00"
"6","null","A0:E6:F8:7B:15:A5","0","19","1.38","0","0","2016-12-14 13:35:00"
"7","null","A0:E6:F8:7B:16:D6","0","18","1.12","0","0","2016-12-14 13:35:01"
"8","null","A0:E6:F8:7B:16:EA","0","17","1.31","0","0","2016-12-14 13:35:01"
"9","null","A0:E6:F8:7B:15:A5","0","19","1.38","0","0","2016-12-14 13:35:01"
But when I load this file with numpy.loadtxt, the result looks like this.
Source code:
import numpy as np
# Read every field as text; unpack=True returns one row per CSV column
a = np.loadtxt('db_file.csv', delimiter=',', dtype='str', unpack=True)
print(a)
Result:
[['"num"' '"1"' '"2"' ..., '"6979"' '"6980"' '"6981"']
['"phone"' '"null"' '"null"' ..., '" 821099631345"' '" 821099631345"'
'" 821099631345"']
['"sensorID"' '"A0:E6:F8:7B:16:EA"' '"A0:E6:F8:7B:16:A9"' ...,
'"A0:E6:F8:7B:16:EA"' '"A0:E6:F8:7B:16:A9"' '"A0:E6:F8:7B:16:D6"']
...,
['"gps_lat"' '"0"' '"0"' ..., '"37.596332"' '"37.596332"' '"37.596332"']
['"gps_lng"' '"0"' '"0"' ..., '"127.031773"' '"127.031773"' '"127.031773"']
['"time"' '"2016-12-14 13:34:59"' '"2016-12-14 13:34:59"' ...,
'"2016-12-15 00:03:11"' '"2016-12-15 00:03:11"' '"2016-12-15 00:03:12"']]
I want to remove the " characters.
So what I actually want is this list:
[['num' '1' '2' ..., '6979' '6980' '6981']
['phone' 'null' 'null' ..., '821099631345' ' 821099631345'
' 821099631345']
['sensorID' 'A0:E6:F8:7B:16:EA' 'A0:E6:F8:7B:16:A9' ...,
'A0:E6:F8:7B:16:EA' 'A0:E6:F8:7B:16:A9' 'A0:E6:F8:7B:16:D6']
...,
['gps_lat' '0' '0' ..., '37.596332' '37.596332' '37.596332']
['gps_lng' '0' '0' ..., '127.031773' '127.031773' '127.031773']
['time' '2016-12-14 13:34:59' '2016-12-14 13:34:59' ...,
'2016-12-15 00:03:11' '2016-12-15 00:03:11' '2016-12-15 00:03:12']]
What code should I use?
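Answer 0 (score: 1)
In your Excel editor, use Find and Replace to change the double quotes (") to single quotes ('). Since I don't know which editor you're using, I'll give you step-by-step instructions for replacing any character in MS Excel.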
Answer 1 (score: 1)
Use numpy.char.strip.
Code:
import numpy as np
a = np.array(['"1"', '"2"', '"3"'])
# Strip leading and trailing double quotes from every element
a = np.char.strip(a, '"')
print(a)
Output:
['1' '2' '3']
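A minimal sketch of applying the same call to the whole array that loadtxt returns in the question. The io.StringIO sample is a hypothetical stand-in for 'db_file.csv' that holds only the header and the first two data rows shown above; with the real file you would pass the filename instead.

```python
import io

import numpy as np

# Hypothetical stand-in for 'db_file.csv': the header plus two rows from the question.
sample = io.StringIO(
    '"num","phone","sensorID","press","temp","accel","gps_lat","gps_lng","time"\n'
    '"1","null","A0:E6:F8:7B:16:EA","0","17","1.25","0","0","2016-12-14 13:34:59"\n'
    '"2","null","A0:E6:F8:7B:16:A9","0","18","1.19","0","0","2016-12-14 13:34:59"\n'
)

# Same call as in the question: every field is read as text, one row per CSV column.
a = np.loadtxt(sample, delimiter=',', dtype=str, unpack=True)

# Remove the surrounding double quotes from every element in one vectorized call.
a = np.char.strip(a, '"')

print(a[0])  # prints ['num' '1' '2']
```

If you also want to drop the leading space in the phone values (' 821099631345'), passing '" ' as the character set strips spaces as well as quotes. NumPy 1.23 and later also give np.loadtxt a quotechar parameter, which may make the strip step unnecessary; check the documentation for your version.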
Answer 2 (score: 0)
With pandas, I get:
In [1278]: pd.read_csv('stack41338622.txt')
Out[1278]:
num phone sensorID press temp accel gps_lat gps_lng \
0 1 null A0:E6:F8:7B:16:EA 0 17 1.25 0 0
1 2 null A0:E6:F8:7B:16:A9 0 18 1.19 0 0
2 3 null A0:E6:F8:7B:15:A5 0 18 1.19 0 0
3 4 null A0:E6:F8:7B:16:EA 0 17 1.25 0 0
4 5 null A0:E6:F8:7B:16:A9 0 18 1.19 0 0
5 6 null A0:E6:F8:7B:15:A5 0 19 1.38 0 0
6 7 null A0:E6:F8:7B:16:D6 0 18 1.12 0 0
7 8 null A0:E6:F8:7B:16:EA 0 17 1.31 0 0
8 9 null A0:E6:F8:7B:15:A5 0 19 1.38 0 0
time
0 2016-12-14 13:34:59
1 2016-12-14 13:34:59
2 2016-12-14 13:34:59
3 2016-12-14 13:35:00
4 2016-12-14 13:35:00
5 2016-12-14 13:35:00
6 2016-12-14 13:35:01
7 2016-12-14 13:35:01
8 2016-12-14 13:35:01
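The pandas result comes from read_csv treating " as its default quote character. Here is a small self-contained sketch, again using a hypothetical in-memory sample in place of the file. One assumption to check: recent pandas versions treat the string null as a missing value by default, so keep_default_na=False is passed here to keep it as literal text, as in the output above.

```python
import io

import pandas as pd

# Hypothetical stand-in for the CSV file: header plus two rows from the question.
sample = io.StringIO(
    '"num","phone","sensorID","press","temp","accel","gps_lat","gps_lng","time"\n'
    '"1","null","A0:E6:F8:7B:16:EA","0","17","1.25","0","0","2016-12-14 13:34:59"\n'
    '"2","null","A0:E6:F8:7B:16:A9","0","18","1.19","0","0","2016-12-14 13:34:59"\n'
)

# The parser unquotes each field, and numeric-looking columns become numbers.
# keep_default_na=False keeps "null" as a string instead of converting it to NaN.
df = pd.read_csv(sample, keep_default_na=False)

# A single column can be pulled out as a NumPy array if needed.
sensor_ids = df['sensorID'].to_numpy()
print(df)
```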
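As described in Reading CSV files in numpy where delimiter is ",", we can use converters to strip the extra quotes. Unfortunately, dtype=None no longer works together with converters, so we have to spell out the dtypes ourselves. Here is a start: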
In [1327]: def foo(astr):
      ...:     return astr[1:-1]
In [1328]: convs = dict((col, foo) for col in range(9))
In [1329]: dt = ['i','S10','S20','i', 'i','f','i','i','S20']
In [1330]: data = np.genfromtxt('stack41338622.txt', dtype=dt, delimiter=',', names=True, converters=convs)
In [1331]: data
Out[1331]:
array([ (1, b'null', b'A0:E6:F8:7B:16:EA', 0, 17, 1.25, 0, 0, b'2016-12-14 13:34:59'),
(2, b'null', b'A0:E6:F8:7B:16:A9', 0, 18, 1.190000057220459, 0, 0, b'2016-12-14 13:34:59'),
(3, b'null', b'A0:E6:F8:7B:15:A5', 0, 18, 1.190000057220459, 0, 0, b'2016-12-14 13:34:59'),
(4, b'null', b'A0:E6:F8:7B:16:EA', 0, 17, 1.25, 0, 0, b'2016-12-14 13:35:00'),
(5, b'null', b'A0:E6:F8:7B:16:A9', 0, 18, 1.190000057220459, 0, 0, b'2016-12-14 13:35:00'),
(6, b'null', b'A0:E6:F8:7B:15:A5', 0, 19, 1.3799999952316284, 0, 0, b'2016-12-14 13:35:00'),
(7, b'null', b'A0:E6:F8:7B:16:D6', 0, 18, 1.1200000047683716, 0, 0, b'2016-12-14 13:35:01'),
(8, b'null', b'A0:E6:F8:7B:16:EA', 0, 17, 1.309999942779541, 0, 0, b'2016-12-14 13:35:01'),
(9, b'null', b'A0:E6:F8:7B:15:A5', 0, 19, 1.3799999952316284, 0, 0, b'2016-12-14 13:35:01')],
dtype=[('num', '<i4'), ('phone', 'S10'), ('sensorID', 'S20'), ('press', '<i4'), ('temp', '<i4'), ('accel', '<f4'), ('gps_lat', '<i4'), ('gps_lng', '<i4'), ('time', 'S20')])
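Given how much time I spent on this, I'd lean toward the other suggestion: remove the extra quotes in a text editor. They aren't needed in a comma-separated file and are more of a nuisance than a help.
In the editor, I simply deleted every ":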
num,phone,sensorID,press,temp,accel,gps_lat,gps_lng,time
1,null,A0:E6:F8:7B:16:EA,0,17,1.25,0,0,2016-12-14 13:34:59
2,null,A0:E6:F8:7B:16:A9,0,18,1.19,0,0,2016-12-14 13:34:59
3,null,A0:E6:F8:7B:15:A5,0,18,1.19,0,0,2016-12-14 13:34:59
4,null,A0:E6:F8:7B:16:EA,0,17,1.25,0,0,2016-12-14 13:35:00
5,null,A0:E6:F8:7B:16:A9,0,18,1.19,0,0,2016-12-14 13:35:00
...
In [1336]: data = np.genfromtxt('stack41338622_1.txt', dtype=None, delimiter=',', names=True)
In [1337]: data
Out[1337]:
array([ (1, b'null', b'A0:E6:F8:7B:16:EA', 0, 17, 1.25, 0, 0, b'2016-12-14 13:34:59'),
(2, b'null', b'A0:E6:F8:7B:16:A9', 0, 18, 1.19, 0, 0, b'2016-12-14 13:34:59'),
(3, b'null', b'A0:E6:F8:7B:15:A5', 0, 18, 1.19, 0, 0, b'2016-12-14 13:34:59'),
...,
dtype=[('num', '<i4'), ('phone', 'S4'), ('sensorID', 'S17'), ('press', '<i4'), ('temp', '<i4'), ('accel', '<f8'), ('gps_lat', '<i4'), ('gps_lng', '<i4'), ('time', 'S19')])
b'' is how Python 3 displays byte strings; you won't see it in Py2.
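If editing the file by hand is inconvenient, the same quote removal can be done in Python before parsing. This sketch rests on two assumptions. First, no field contains an embedded comma or quote of its own, which holds for the sample in the question. Second, NumPy is 1.14 or newer, so that genfromtxt accepts the encoding argument. Passing an explicit encoding should also give str fields instead of the b'' byte strings shown above.

```python
import io

import numpy as np

# Hypothetical stand-in for open('db_file.csv').read(): header plus three rows from the question.
raw = (
    '"num","phone","sensorID","press","temp","accel","gps_lat","gps_lng","time"\n'
    '"1","null","A0:E6:F8:7B:16:EA","0","17","1.25","0","0","2016-12-14 13:34:59"\n'
    '"2","null","A0:E6:F8:7B:16:A9","0","18","1.19","0","0","2016-12-14 13:34:59"\n'
    '"3","null","A0:E6:F8:7B:15:A5","0","18","1.19","0","0","2016-12-14 13:34:59"\n'
)

# Equivalent of deleting every " in a text editor. This is only safe because
# no field contains a comma or a quote character of its own.
cleaned = raw.replace('"', '')

# dtype=None lets genfromtxt guess each column's type, names=True takes the
# field names from the header, and an explicit encoding yields str, not bytes.
data = np.genfromtxt(io.StringIO(cleaned), dtype=None, delimiter=',',
                     names=True, encoding='utf-8')

print(data['sensorID'])
```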