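I'm new to Python and pandas and am playing around with the heart disease dataset from UCI: https://archive.ics.uci.edu/ml/machine-learning-databases/heart-disease/hungarian.data
There are 76 attributes for each person and 303 people, so I'd like to end up with one row per person and 76 columns. I'm having trouble arranging this into a DataFrame because the data seems to be presented in blocks of 9 lines.
I've tried importing the dataset into a pandas DataFrame using a space or a newline as the separator, but I still can't stop the data from splitting after every 8 values: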
df = pd.read_table('https://archive.ics.uci.edu/ml/machine-learning-databases/heart-disease/hungarian.data', sep=' ')
The resulting DataFrame looks like this:
1254 0 40 1 1.1 0.1 0.2
-9.0 2 140.0 0.0 289 -9.0 -9.0 -9.0
0.0 -9 -9.0 0.0 12 16.0 84.0 0.0
0.0 0 0.0 0.0 150 18.0 -9.0 7.0
172.0 86 200.0 110.0 140 86.0 0.0 0.0
0.0 -9 26.0 20.0 -9 -9.0 -9.0 -9.0
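I'd appreciate any advice on how to split the data and start a new row after every 76th value. Every 76th value is the string 'name', which marks the end of one person's data. Thanks!
Answer 0 (score: 1)
As @Boud has already said, it is easier to preprocess the data than to massage a "wrongly built" DF: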
import io

import pandas as pd
import requests

url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/heart-disease/hungarian.data'
r = requests.get(url)
r.raise_for_status()  # raises requests.HTTPError on a 4xx/5xx response

# Join all lines into one, then break the line after each record's closing 'name' token
data = r.text.replace('\n', ' ').replace(' name ', ' name\n')
# Raw string so that '\s' is not treated as an invalid escape sequence
df = pd.read_table(io.StringIO(data), sep=r'\s+', header=None)
print(df)
Output:
In [20]: df
Out[20]:
0 1 2 3 4 5 6 7 8 9 ... 66 67 68 69 70 71 72 73 74 75
0 1254 0 40 1 1 0 0 -9 2 140 ... -9 -9 1 1 1 1 1 -9.0 -9.0 name
1 1255 0 49 0 1 0 0 -9 3 160 ... -9 -9 1 1 1 1 1 -9.0 -9.0 name
2 1256 0 37 1 1 0 0 -9 2 130 ... -9 -9 1 1 1 1 1 -9.0 -9.0 name
3 1257 0 48 0 1 1 1 -9 4 138 ... 2 -9 1 1 1 1 1 -9.0 -9.0 name
4 1258 0 54 1 1 0 1 -9 3 150 ... 1 -9 1 1 1 1 1 -9.0 -9.0 name
5 1259 0 39 1 1 0 1 -9 3 120 ... -9 -9 1 1 1 1 1 -9.0 -9.0 name
6 1260 0 45 0 0 1 0 -9 2 130 ... -9 -9 1 1 1 1 1 -9.0 -9.0 name
7 1261 0 54 1 1 0 0 -9 2 110 ... -9 -9 1 1 1 1 1 -9.0 -9.0 name
8 1262 0 37 1 1 1 1 -9 4 140 ... -9 -9 1 1 1 1 1 -9.0 -9.0 name
9 1263 0 48 0 1 0 0 -9 2 120 ... -9 -9 1 1 1 1 1 -9.0 -9.0 name
10 1264 0 37 0 1 0 1 -9 3 130 ... -9 -9 1 1 1 1 1 -9.0 -9.0 name
11 1265 0 58 1 1 0 0 -9 2 136 ... -9 2 1 1 1 7 1 -9.0 -9.0 name
12 1266 0 39 1 1 0 0 -9 2 120 ... -9 -9 1 1 1 1 1 -9.0 -9.0 name
13 1267 0 49 1 1 1 1 -9 4 140 ... 2 -9 1 1 1 1 1 -9.0 -9.0 name
14 1268 0 42 0 1 0 1 -9 3 115 ... -9 -9 1 1 1 1 1 -9.0 -9.0 name
15 1269 0 54 0 1 1 0 -9 2 120 ... -9 -9 1 1 1 1 1 -9.0 -9.0 name
16 1270 0 38 1 1 1 1 -9 4 110 ... -9 -9 1 1 1 1 1 -9.0 -9.0 name
17 1271 0 43 0 1 0 0 -9 2 120 ... -9 -9 1 1 1 1 1 -9.0 -9.0 name
18 1272 0 60 1 1 1 1 -9 4 100 ... -9 -9 1 1 1 1 1 -9.0 -9.0 name
19 1273 0 36 1 1 0 0 -9 2 120 ... -9 -9 1 1 1 1 1 -9.0 -9.0 name
20 1274 0 43 0 0 0 0 -9 1 100 ... -9 -9 1 1 1 1 2 -9.0 -9.0 name
21 1275 0 44 1 1 0 0 -9 2 120 ... -9 -9 1 1 1 1 1 -9.0 -9.0 name
22 1276 0 49 0 1 0 0 -9 2 124 ... -9 -9 1 1 1 1 1 -9.0 -9.0 name
23 1277 0 44 1 1 0 0 -9 2 150 ... 2 -9 1 1 1 1 1 67.0 -9.0 name
24 1278 0 40 1 1 0 1 -9 3 130 ... -9 -9 1 1 1 1 1 -9.0 -9.0 name
.. ... .. .. .. .. .. .. .. .. ... ... .. .. .. .. .. .. .. ... ... ...
269 1032 0 54 1 1 1 0 -9 4 130 ... -9 2 1 1 1 7 1 66.0 -9.0 name
270 1033 0 47 0 1 0 0 -9 3 130 ... -9 -9 1 1 1 1 1 68.0 -9.0 name
271 1034 0 45 1 1 1 1 -9 4 120 ... -9 -9 1 1 1 1 1 -9.0 -9.0 name
272 1035 0 32 0 1 0 0 -9 2 105 ... -9 -9 1 1 1 1 1 -9.0 -9.0 name
273 1036 0 55 1 1 1 1 -9 4 140 ... -9 -9 1 1 1 1 1 -9.0 -9.0 name
274 1037 0 55 1 1 0 0 -9 3 120 ... -9 -9 1 1 1 1 1 -9.0 -9.0 name
275 1038 0 45 0 0 0 0 -9 2 180 ... -9 -9 1 1 1 1 1 70.0 -9.0 name
276 1039 0 59 1 1 0 1 -9 3 180 ... -9 -9 1 1 1 1 1 -9.0 -9.0 name
277 1041 0 51 1 1 0 0 -9 3 135 ... 2 -9 1 1 3 8 2 -9.0 -9.0 name
278 1042 0 52 1 1 1 1 -9 4 170 ... -9 -9 1 1 1 1 1 -9.0 -9.0 name
279 1043 0 57 0 1 1 1 -9 4 180 ... -9 -9 1 1 1 1 1 -9.0 -9.0 name
280 1044 0 54 0 1 0 0 -9 2 130 ... -9 -9 1 1 1 1 3 -9.0 -9.0 name
281 1045 0 60 1 1 0 0 -9 3 120 ... -9 -9 1 1 1 1 1 -9.0 -9.0 name
282 1046 0 49 1 1 1 1 -9 4 150 ... -9 -9 1 1 1 1 1 -9.0 -9.0 name
283 1047 0 51 0 1 0 1 -9 3 130 ... -9 -9 1 1 1 1 1 61.0 -9.0 name
284 1048 0 55 0 0 0 0 -9 2 110 ... -9 -9 1 1 1 1 1 -9.0 -9.0 name
285 1049 0 42 1 1 1 1 -9 4 140 ... -9 -9 1 1 1 1 1 -9.0 -9.0 name
286 1050 0 51 0 1 0 1 -9 3 110 ... -9 -9 1 1 1 1 1 -9.0 -9.0 name
287 1051 0 59 1 1 1 1 -9 4 140 ... -9 -9 1 1 1 1 1 -9.0 -9.0 name
288 1052 0 53 1 1 0 0 -9 2 120 ... -9 -9 1 1 1 1 1 -9.0 -9.0 name
289 1053 0 48 0 0 0 0 -9 2 -9 ... -9 -9 1 1 1 1 1 -9.0 -9.0 name
290 1054 0 36 1 1 0 0 -9 2 120 ... -9 -9 1 1 1 1 1 -9.0 -9.0 name
291 5001 0 48 1 0 0 0 -9 3 110 ... -9 -9 1 1 1 1 1 -9.0 -9.0 name
292 5000 0 47 0 0 0 0 -9 2 140 ... -9 -9 1 1 1 1 1 -9.0 -9.0 name
293 5002 0 53 1 1 1 1 -9 4 130 ... 1 1 1 1 1 1 1 -9.0 -9.0 name
[294 rows x 76 columns]
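The same preprocessing idea can also be expressed without string replacement, by splitting the text into whitespace-separated tokens and closing a record each time the 'name' token appears. The following is only a minimal sketch under stated assumptions: `group_records` is a hypothetical helper name, and `sample` is a made-up miniature input with 5 fields per record standing in for the 76-field records of the real file.

```python
def group_records(text, terminator="name"):
    """Split text on whitespace and close a record after each terminator token."""
    records, current = [], []
    for token in text.split():
        current.append(token)
        if token == terminator:
            records.append(current)
            current = []
    if current:
        # Leftover tokens mean the last record never reached its terminator
        raise ValueError(f"incomplete trailing record: {current}")
    return records


# Made-up miniature input: two 5-field records wrapped across lines
# (hypothetical values, not copied from the real file).
sample = "1254 0 40\n-9 name\n1255 0 49\n3.5 name\n"
rows = group_records(sample)

# Every record should have the same number of fields (76 in the real file)
assert len({len(r) for r in rows}) == 1
print(rows)
```

The resulting list of lists can be passed to `pd.DataFrame(rows)`. Unlike the string-replacement version, this one does not depend on the exact spacing around 'name', and it fails loudly if the final record is truncated.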