Reshaping a pandas DataFrame: a new row every 76 entries

Posted: 2016-09-17 03:44:28

Tags: python pandas

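I'm new to Python and pandas, and I'm experimenting with the UCI heart disease dataset: https://archive.ics.uci.edu/ml/machine-learning-databases/heart-disease/hungarian.data

Each person has 76 attributes and there are 303 people, so I'd like to end up with one row per person and 76 columns. I can't get this into a DataFrame because each person's data seems to be spread over 9 lines.

I tried importing the dataset into a pandas DataFrame using a space or a newline as the separator, but I still can't stop the data from being split after every 8 values: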

df = pd.read_table('https://archive.ics.uci.edu/ml/machine-learning-databases/heart-disease/hungarian.data', sep=' ')

The resulting df looks like this:

    1254    0   40  1   1.1 0.1 0.2
-9.0    2   140.0   0.0 289 -9.0    -9.0    -9.0
0.0 -9  -9.0    0.0 12  16.0    84.0    0.0
0.0 0   0.0 0.0 150 18.0    -9.0    7.0
172.0   86  200.0   110.0   140 86.0    0.0 0.0
0.0 -9  26.0    20.0    -9  -9.0    -9.0    -9.0

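I'd appreciate any suggestions on how to split the data and start a new row after every 76th value. Every 76th value is the string 'name', which marks the end of one person's data. Thanks!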

1 answer:

Answer 0 (score: 1)

As @Boud has already said, it is easier to preprocess the raw text than to massage a badly built DataFrame afterwards:

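import io
import requests
import pandas as pd

url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/heart-disease/hungarian.data'
r = requests.get(url)
r.raise_for_status()  # raises requests.HTTPError if the download failed

# Join all wrapped lines into one long line, then start a new line
# after every 'name' marker, i.e. after each person's 76th value.
data = r.text.replace('\n', ' ').replace(' name ', ' name\n')

# Raw string so that '\s' reaches the regex engine instead of being
# treated as a (deprecated) string escape.
df = pd.read_table(io.StringIO(data), sep=r'\s+', header=None)
print(df)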
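To check the preprocessing logic without downloading the file, the same two `replace` calls can be run on a small, made-up sample. The helper name `records_to_frame` and the sample text are hypothetical: each fake record has only 3 values plus the `name` marker, wrapped across lines the way hungarian.data wraps its 76 values.

```python
import io

import pandas as pd


def records_to_frame(text: str) -> pd.DataFrame:
    # Hypothetical helper: flatten all line breaks into spaces, then put a
    # line break back after every 'name' end-of-record marker.
    flat = text.replace('\n', ' ').replace(' name ', ' name\n')
    # Each line is now one record; split fields on any run of whitespace.
    return pd.read_csv(io.StringIO(flat), sep=r'\s+', header=None)


# Made-up sample: 3 values + 'name' per record, wrapped over several lines.
sample = "1 0 40\nname\n2 1\n37 name\n"
df = records_to_frame(sample)
print(df)
```

This relies on `name` being a whitespace-separated token of its own once the newlines are flattened, which is what the trailing newline of each wrapped line guarantees.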

Output:

In [20]: df
Out[20]:
       0   1   2   3   4   5   6   7   8    9   ...   66  67  68  69  70  71  72    73   74    75
0    1254   0  40   1   1   0   0  -9   2  140  ...   -9  -9   1   1   1   1   1  -9.0 -9.0  name
1    1255   0  49   0   1   0   0  -9   3  160  ...   -9  -9   1   1   1   1   1  -9.0 -9.0  name
2    1256   0  37   1   1   0   0  -9   2  130  ...   -9  -9   1   1   1   1   1  -9.0 -9.0  name
3    1257   0  48   0   1   1   1  -9   4  138  ...    2  -9   1   1   1   1   1  -9.0 -9.0  name
4    1258   0  54   1   1   0   1  -9   3  150  ...    1  -9   1   1   1   1   1  -9.0 -9.0  name
5    1259   0  39   1   1   0   1  -9   3  120  ...   -9  -9   1   1   1   1   1  -9.0 -9.0  name
6    1260   0  45   0   0   1   0  -9   2  130  ...   -9  -9   1   1   1   1   1  -9.0 -9.0  name
7    1261   0  54   1   1   0   0  -9   2  110  ...   -9  -9   1   1   1   1   1  -9.0 -9.0  name
8    1262   0  37   1   1   1   1  -9   4  140  ...   -9  -9   1   1   1   1   1  -9.0 -9.0  name
9    1263   0  48   0   1   0   0  -9   2  120  ...   -9  -9   1   1   1   1   1  -9.0 -9.0  name
10   1264   0  37   0   1   0   1  -9   3  130  ...   -9  -9   1   1   1   1   1  -9.0 -9.0  name
11   1265   0  58   1   1   0   0  -9   2  136  ...   -9   2   1   1   1   7   1  -9.0 -9.0  name
12   1266   0  39   1   1   0   0  -9   2  120  ...   -9  -9   1   1   1   1   1  -9.0 -9.0  name
13   1267   0  49   1   1   1   1  -9   4  140  ...    2  -9   1   1   1   1   1  -9.0 -9.0  name
14   1268   0  42   0   1   0   1  -9   3  115  ...   -9  -9   1   1   1   1   1  -9.0 -9.0  name
15   1269   0  54   0   1   1   0  -9   2  120  ...   -9  -9   1   1   1   1   1  -9.0 -9.0  name
16   1270   0  38   1   1   1   1  -9   4  110  ...   -9  -9   1   1   1   1   1  -9.0 -9.0  name
17   1271   0  43   0   1   0   0  -9   2  120  ...   -9  -9   1   1   1   1   1  -9.0 -9.0  name
18   1272   0  60   1   1   1   1  -9   4  100  ...   -9  -9   1   1   1   1   1  -9.0 -9.0  name
19   1273   0  36   1   1   0   0  -9   2  120  ...   -9  -9   1   1   1   1   1  -9.0 -9.0  name
20   1274   0  43   0   0   0   0  -9   1  100  ...   -9  -9   1   1   1   1   2  -9.0 -9.0  name
21   1275   0  44   1   1   0   0  -9   2  120  ...   -9  -9   1   1   1   1   1  -9.0 -9.0  name
22   1276   0  49   0   1   0   0  -9   2  124  ...   -9  -9   1   1   1   1   1  -9.0 -9.0  name
23   1277   0  44   1   1   0   0  -9   2  150  ...    2  -9   1   1   1   1   1  67.0 -9.0  name
24   1278   0  40   1   1   0   1  -9   3  130  ...   -9  -9   1   1   1   1   1  -9.0 -9.0  name
..    ...  ..  ..  ..  ..  ..  ..  ..  ..  ...  ...   ..  ..  ..  ..  ..  ..  ..   ...  ...   ...
269  1032   0  54   1   1   1   0  -9   4  130  ...   -9   2   1   1   1   7   1  66.0 -9.0  name
270  1033   0  47   0   1   0   0  -9   3  130  ...   -9  -9   1   1   1   1   1  68.0 -9.0  name
271  1034   0  45   1   1   1   1  -9   4  120  ...   -9  -9   1   1   1   1   1  -9.0 -9.0  name
272  1035   0  32   0   1   0   0  -9   2  105  ...   -9  -9   1   1   1   1   1  -9.0 -9.0  name
273  1036   0  55   1   1   1   1  -9   4  140  ...   -9  -9   1   1   1   1   1  -9.0 -9.0  name
274  1037   0  55   1   1   0   0  -9   3  120  ...   -9  -9   1   1   1   1   1  -9.0 -9.0  name
275  1038   0  45   0   0   0   0  -9   2  180  ...   -9  -9   1   1   1   1   1  70.0 -9.0  name
276  1039   0  59   1   1   0   1  -9   3  180  ...   -9  -9   1   1   1   1   1  -9.0 -9.0  name
277  1041   0  51   1   1   0   0  -9   3  135  ...    2  -9   1   1   3   8   2  -9.0 -9.0  name
278  1042   0  52   1   1   1   1  -9   4  170  ...   -9  -9   1   1   1   1   1  -9.0 -9.0  name
279  1043   0  57   0   1   1   1  -9   4  180  ...   -9  -9   1   1   1   1   1  -9.0 -9.0  name
280  1044   0  54   0   1   0   0  -9   2  130  ...   -9  -9   1   1   1   1   3  -9.0 -9.0  name
281  1045   0  60   1   1   0   0  -9   3  120  ...   -9  -9   1   1   1   1   1  -9.0 -9.0  name
282  1046   0  49   1   1   1   1  -9   4  150  ...   -9  -9   1   1   1   1   1  -9.0 -9.0  name
283  1047   0  51   0   1   0   1  -9   3  130  ...   -9  -9   1   1   1   1   1  61.0 -9.0  name
284  1048   0  55   0   0   0   0  -9   2  110  ...   -9  -9   1   1   1   1   1  -9.0 -9.0  name
285  1049   0  42   1   1   1   1  -9   4  140  ...   -9  -9   1   1   1   1   1  -9.0 -9.0  name
286  1050   0  51   0   1   0   1  -9   3  110  ...   -9  -9   1   1   1   1   1  -9.0 -9.0  name
287  1051   0  59   1   1   1   1  -9   4  140  ...   -9  -9   1   1   1   1   1  -9.0 -9.0  name
288  1052   0  53   1   1   0   0  -9   2  120  ...   -9  -9   1   1   1   1   1  -9.0 -9.0  name
289  1053   0  48   0   0   0   0  -9   2   -9  ...   -9  -9   1   1   1   1   1  -9.0 -9.0  name
290  1054   0  36   1   1   0   0  -9   2  120  ...   -9  -9   1   1   1   1   1  -9.0 -9.0  name
291  5001   0  48   1   0   0   0  -9   3  110  ...   -9  -9   1   1   1   1   1  -9.0 -9.0  name
292  5000   0  47   0   0   0   0  -9   2  140  ...   -9  -9   1   1   1   1   1  -9.0 -9.0  name
293  5002   0  53   1   1   1   1  -9   4  130  ...    1   1   1   1   1   1   1  -9.0 -9.0  name

[294 rows x 76 columns]
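A different technique from the string replacement above, closer to the literal "new row every 76 entries" in the title, is to split the whole file into tokens and reshape them into rows of fixed width. This is only a sketch: it assumes every record has exactly the same number of tokens with `name` last, and the helper name `reshape_tokens` and the 4-token sample are hypothetical (the real file would use `width=76`).

```python
import numpy as np
import pandas as pd


def reshape_tokens(text: str, width: int = 76) -> pd.DataFrame:
    # Split on any whitespace (spaces and newlines alike) into one flat list.
    tokens = text.split()
    if len(tokens) % width != 0:
        raise ValueError(f"{len(tokens)} tokens is not a multiple of {width}")
    # One row per record: shape (number_of_records, width).
    rows = np.array(tokens, dtype=object).reshape(-1, width)
    # Sanity check: every record should end with the 'name' marker.
    if not (rows[:, -1] == 'name').all():
        raise ValueError("some record does not end with 'name'")
    df = pd.DataFrame(rows)
    # Tokens are strings; convert every column except the marker to numbers.
    for col in df.columns[:-1]:
        df[col] = pd.to_numeric(df[col])
    return df


sample = "1 0 40\nname\n2 1\n37 name\n"
df = reshape_tokens(sample, width=4)
print(df)
```

Unlike the replace-based version, this fails loudly if a record has the wrong number of fields instead of silently producing a ragged row.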