I have some data in the format shown in the sample further down.
You can generate test data with dstat.
I want to import it into a pandas DataFrame (Python 3.5.2, pandas 0.18.1):
----system---- ---load-avg--- ----total-cpu-usage---- ------memory-usage----- -dsk/total- --io/total- ---paging-- -net/total-
date/time | 1m 5m 15m |usr sys idl wai hiq siq| used buff cach free| read writ| read writ| in out | recv send
10-11 00:00:01|0.67 0.42 0.31| 2 0 98 0 0 0|25.0G 16.9M 6331M 189M|2101k 901k|30.4 28.3 | 63B 75B| 0 0
10-11 00:00:03|0.67 0.42 0.31| 4 0 95 0 0 0|25.0G 16.9M 6332M 190M| 50k 1142k|4.00 18.0 | 0 0 | 310k 6765B
10-11 00:00:05|0.62 0.41 0.31| 4 0 95 0 0 0|25.0G 16.9M 6333M 189M| 116k 2534k|3.50 113 | 0 0 | 484k 27k
10-11 00:00:07|0.62 0.41 0.31| 7 1 92 0 0 0|25.0G 16.9M 6335M 187M| 154k 2372k|4.00 128 | 0 0 |1159k 24k
10-11 00:00:09|0.62 0.41 0.31| 5 0 95 0 0 0|25.0G 16.9M 6336M 185M| 0 1556k| 0 38.5 | 0 0 | 396k 4172B
10-11 00:00:11|0.73 0.44 0.32| 4 1 95 0 0 0|25.0G 16.9M 6336M 184M| 136k 2732k|3.50 139 | 0 0 | 270k 28k
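I tried an expression of my own, but it does not work. The layout I am after has no group header row and no | separators: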
date/time 1m 5m 15m usr sys idl wai hiq siq used buff cach free read writ read writ in out recv send
10-11 00:00:01 0.67 0.42 0.31 2 0 98 0 0 0 25.0G 16.9M 6331M 189M 2101k 901k 30.4 28.3 63B 75B 0 0
10-11 00:00:03 0.67 0.42 0.31 4 0 95 0 0 0 25.0G 16.9M 6332M 190M 50k 1142k 4.00 18.0 0 0 310k 6765B
10-11 00:00:05 0.62 0.41 0.31 4 0 95 0 0 0 25.0G 16.9M 6333M 189M 116k 2534k 3.50 113 0 0 484k 27k
10-11 00:00:07 0.62 0.41 0.31 7 1 92 0 0 0 25.0G 16.9M 6335M 187M 154k 2372k 4.00 128 0 0 1159k 24k
10-11 00:00:09 0.62 0.41 0.31 5 0 95 0 0 0 25.0G 16.9M 6336M 185M 0 1556k 0 38.5 0 0 396k 4172B
10-11 00:00:11 0.73 0.44 0.32 4 1 95 0 0 0 25.0G 16.9M 6336M 184M 136k 2732k 3.50 139 0 0 270k 28k
I don't want to edit the text file.
Answer 0 (score: 2)
Try this:
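import io
import pandas as pd

fn = r'D:\temp\.data\data.fwf'
# Read the file into memory and turn the '|' group separators into spaces
with open(fn) as f:
    data = f.read().replace('|', ' ')

# date/time becomes two columns; the repeated read/writ names get a prefix
cols = 'date time 1m 5m 15m usr sys idl wai hiq siq used buff cach free ' \
       'dsk.read dsk.writ io.read io.writ in out recv send'.split()
# Skip the two header lines and split the remaining rows on whitespace
df = pd.read_csv(io.StringIO(data), delim_whitespace=True, skiprows=2,
                 header=None, names=cols)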
In [85]: df
Out[85]:
date time 1m 5m 15m usr sys idl wai hiq ... cach free dsk.read dsk.writ io.read io.writ in out recv send
0 10-11 00:00:01 0.67 0.42 0.31 2 0 98 0 0 ... 6331M 189M 2101k 901k 30.4 28.3 63B 75B 0 0
1 10-11 00:00:03 0.67 0.42 0.31 4 0 95 0 0 ... 6332M 190M 50k 1142k 4.0 18.0 0 0 310k 6765B
2 10-11 00:00:05 0.62 0.41 0.31 4 0 95 0 0 ... 6333M 189M 116k 2534k 3.5 113.0 0 0 484k 27k
3 10-11 00:00:07 0.62 0.41 0.31 7 1 92 0 0 ... 6335M 187M 154k 2372k 4.0 128.0 0 0 1159k 24k
4 10-11 00:00:09 0.62 0.41 0.31 5 0 95 0 0 ... 6336M 185M 0 1556k 0.0 38.5 0 0 396k 4172B
5 10-11 00:00:11 0.73 0.44 0.32 4 1 95 0 0 ... 6336M 184M 136k 2732k 3.5 139.0 0 0 270k 28k
[6 rows x 23 columns]
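A possible variant, offered only as a sketch: instead of replacing the '|' characters in memory first, read_csv can be given a regular expression as separator so that any run of '|' and whitespace is treated as a single delimiter. The two-row sample and the shortened column list below are made up for illustration; with the real file you would pass the file name and the full cols list from the code above.

```python
import io

import pandas as pd

# Hypothetical two-row sample in the same layout as the dstat output
sample = (
    "----system---- ---load-avg---\n"
    "date/time | 1m 5m 15m\n"
    "10-11 00:00:01|0.67 0.42 0.31\n"
    "10-11 00:00:03|0.62 0.41 0.31\n"
)

# Shortened column list, matching only the columns in the sample
short_cols = ['date', 'time', '1m', '5m', '15m']

# Any run of '|' and whitespace counts as one separator.
# A regex separator needs the Python parsing engine.
df_regex = pd.read_csv(io.StringIO(sample), sep=r'[|\s]+', engine='python',
                       skiprows=2, header=None, names=short_cols)
print(df_regex)
```

The Python engine is slower than the default C engine, so for large dstat logs the replace-then-parse approach above may still be preferable.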
PS: IMO a more appropriate solution would be to use pd.read_fwf() and specify the colspecs parameter, but I was too lazy for that ;-) ...
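For what it is worth, here is a minimal sketch of what the read_fwf() route mentioned in the PS could look like. The sample is a made-up, cut-down fixed-width excerpt (timestamp plus load averages only), and the offsets in colspecs are valid for that sample alone; for a real dstat file they would have to be measured against the actual column positions.

```python
import io

import pandas as pd

# Hypothetical fixed-width sample; the offsets below apply only to it
fwf_sample = (
    "----system---- ---load-avg---\n"
    "  date/time   | 1m   5m  15m \n"
    "10-11 00:00:01|0.67 0.42 0.31\n"
    "10-11 00:00:03|0.62 0.41 0.31\n"
)

# Half-open [start, end) character ranges, one per column. The '|' at
# offset 14 lies between two ranges, so it is simply left out.
colspecs = [(0, 14), (15, 19), (20, 24), (25, 29)]
names = ['datetime', '1m', '5m', '15m']

df_fwf = pd.read_fwf(io.StringIO(fwf_sample), colspecs=colspecs,
                     skiprows=2, header=None, names=names)
print(df_fwf)
```

Unlike the whitespace-based approach, this keeps the date and time together in one column, because the column boundaries come from character positions rather than from the spaces in the data.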