我正在尝试将多行标题设为单行标题,我已经尝试在read_csv函数上添加标题,但是它似乎没有用,在我显示的列上也存在额外的空白我不知道是什么原因造成的,这是csv im试图制作https://gist.github.com/ktisha/c21e73a1bd1700294ef790c56c8aec1f的数据框 这是我在Spyder上编码的结果的显示,您可以看到标题仍显示为不同的行,我希望它显示为单行标题[#1 ....#2 ...#3 .. .. etc],有什么方法可以从标题中删除[#number。]?如果是,如何?然后您还可以看到,有一个我不想的列与其他列(多余的列)相距遥远
import csv
import pandas as pd
with open("D:\python projects\pima-indians-diabetes.csv") as f:
d = pd.read_csv(f , delimiter = ',',header = [0,1,2,3,4,5,6,7,8])
df = pd.DataFrame(d)
print (df[:4])
答案 0 :(得分:1)
编辑:
所以我现在加载真实数据并执行以下操作:
headers = pd.read_csv('pima-indians-diabetes.csv',nrows=9,header=None).loc[:,0].str.replace('# ','')
df = pd.read_csv('pima-indians-diabetes.csv',names=headers, delimiter=',',skiprows=9)
print(df)
1。),因此对于变量“标头”,我仅加载了包含csv文件中标头的前9行。
2。)然后,我仅使用列“ 0”,并根据需要替换那里的“#”。
3。)对于第二次加载,我跳过了前9行,仅加载数据行,并将标头作为“名称”参数。
在执行之前,请确保将数据存储位置更改为您的位置!
两次加载数据可能不是最佳实践,而是我现在想到的唯一处理方法。
结果:
1. Number of times pregnant 2. Plasma glucose concentration a 2 hours in an oral glucose tolerance test 3. Diastolic blood pressure (mm Hg) 4. Triceps skin fold thickness (mm) ... 6. Body mass index (weight in kg/(height in m)^2) 7. Diabetes pedigree function 8. Age (years) 9. Class variable (0 or 1)
0 6 148 72 35 ... 33.6 0.627 50 1
1 1 85 66 29 ... 26.6 0.351 31 0
2 8 183 64 0 ... 23.3 0.672 32 1
3 1 89 66 23 ... 28.1 0.167 21 0
4 0 137 40 35 ... 43.1 2.288 33 1
5 5 116 74 0 ... 25.6 0.201 30 0
6 3 78 50 32 ... 31.0 0.248 26 1
7 10 115 0 0 ... 35.3 0.134 29 0
8 2 197 70 45 ... 30.5 0.158 53 1
9 8 125 96 0 ... 0.0 0.232 54 1
10 4 110 92 0 ... 37.6 0.191 30 0
11 10 168 74 0 ... 38.0 0.537 34 1
12 10 139 80 0 ... 27.1 1.441 57 0
13 1 189 60 23 ... 30.1 0.398 59 1
14 5 166 72 19 ... 25.8 0.587 51 1
15 7 100 0 0 ... 30.0 0.484 32 1
16 0 118 84 47 ... 45.8 0.551 31 1
17 7 107 74 0 ... 29.6 0.254 31 1
18 1 103 30 38 ... 43.3 0.183 33 0
19 1 115 70 30 ... 34.6 0.529 32 1
20 3 126 88 41 ... 39.3 0.704 27 0
21 8 99 84 0 ... 35.4 0.388 50 0
22 7 196 90 0 ... 39.8 0.451 41 1
23 9 119 80 35 ... 29.0 0.263 29 1
24 11 143 94 33 ... 36.6 0.254 51 1
25 10 125 70 26 ... 31.1 0.205 41 1
26 7 147 76 0 ... 39.4 0.257 43 1
27 1 97 66 15 ... 23.2 0.487 22 0
28 13 145 82 19 ... 22.2 0.245 57 0
29 5 117 92 0 ... 34.1 0.337 38 0
.. ... ... ... ... ... ... ... ... ...
738 2 99 60 17 ... 36.6 0.453 21 0
739 1 102 74 0 ... 39.5 0.293 42 1
740 11 120 80 37 ... 42.3 0.785 48 1
741 3 102 44 20 ... 30.8 0.400 26 0
742 1 109 58 18 ... 28.5 0.219 22 0
743 9 140 94 0 ... 32.7 0.734 45 1
744 13 153 88 37 ... 40.6 1.174 39 0
745 12 100 84 33 ... 30.0 0.488 46 0
746 1 147 94 41 ... 49.3 0.358 27 1
747 1 81 74 41 ... 46.3 1.096 32 0
748 3 187 70 22 ... 36.4 0.408 36 1
749 6 162 62 0 ... 24.3 0.178 50 1
750 4 136 70 0 ... 31.2 1.182 22 1
751 1 121 78 39 ... 39.0 0.261 28 0
752 3 108 62 24 ... 26.0 0.223 25 0
753 0 181 88 44 ... 43.3 0.222 26 1
754 8 154 78 32 ... 32.4 0.443 45 1
755 1 128 88 39 ... 36.5 1.057 37 1
756 7 137 90 41 ... 32.0 0.391 39 0
757 0 123 72 0 ... 36.3 0.258 52 1
758 1 106 76 0 ... 37.5 0.197 26 0
759 6 190 92 0 ... 35.5 0.278 66 1
760 2 88 58 26 ... 28.4 0.766 22 0
761 9 170 74 31 ... 44.0 0.403 43 1
762 9 89 62 0 ... 22.5 0.142 33 0
763 10 101 76 48 ... 32.9 0.171 63 0
764 2 122 70 27 ... 36.8 0.340 27 0
765 5 121 72 23 ... 26.2 0.245 30 0
766 1 126 60 0 ... 30.1 0.349 47 1
767 1 93 70 31 ... 30.4 0.315 23 0
第一答案:
样本数据:
0 1 2 3 4 5 6 7 8
0 NaN NaN NaN NaN NaN NaN NaN NaN first
1 NaN NaN NaN NaN NaN NaN NaN NaN second
2 NaN NaN NaN NaN NaN NaN NaN NaN thrid
3 NaN NaN NaN NaN NaN NaN NaN NaN 4
4 NaN NaN NaN NaN NaN NaN NaN NaN 5
5 NaN NaN NaN NaN NaN NaN NaN NaN 6
6 NaN NaN NaN NaN NaN NaN NaN NaN 7
7 NaN NaN NaN NaN NaN NaN NaN NaN 8
8 NaN NaN NaN NaN NaN NaN NaN NaN 9
9 d d d d d d d d d
10 d d d d d d d d d
11 d d d d d d d d d
12 d d d d d d d d d
代码:
import pandas as pd
headers = pd.read_csv('test.csv',header=None).loc[0:8,8].tolist()
df = pd.read_csv('test.csv',names=headers, skiprows=9)
print(df)
结果:
first second thrid 4 5 6 7 8 9
0 d d d d d d d d d
1 d d d d d d d d d
2 d d d d d d d d d
3 d d d d d d d d d