The two lists are:
lst1, lst2 = np.arange(301), 0.333 * np.arange(301)
First, I created two dataframes from these lists, each with 7 columns.
arr1, arr2 = np.array_split(lst1, 7), np.array_split(lst2, 7)
df1 = DF(arr1).T
df2 = DF(arr2).T
Here, df1 and df2 each have 7 columns.
For example, df1 has the following columns:
col0 = [0, 1, 2, ..., 42]
col1 = [43, 44, ..., 85]
...
col6 = [258, ..., 300]
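To check exactly where each column starts and ends, here is a short sketch that re-runs the split from the question (lst1 as defined above) and prints the chunk lengths and boundary values:

```python
import numpy as np

# Split 0..300 into 7 chunks, exactly as in the question.
lst1 = np.arange(301)
arr1 = np.array_split(lst1, 7)

# 301 = 7 * 43, so every chunk has exactly 43 elements.
print([len(chunk) for chunk in arr1])
# → [43, 43, 43, 43, 43, 43, 43]

# First and last value of each chunk (int() keeps the printout plain).
print([(int(chunk[0]), int(chunk[-1])) for chunk in arr1])
# → [(0, 42), (43, 85), (86, 128), (129, 171), (172, 214), (215, 257), (258, 300)]
```

So each column holds 43 values and the row index runs from 0 to 42.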
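Dataframe df2 also has 7 columns. The goal is to place the corresponding columns of df1 and df2 side by side in one large dataframe.
The result should look like this: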
        Group           Group       ...     Group
    Galaxy   Diff   Galaxy   Diff   ...  Galaxy   Diff
0
1
2
...
42
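In pandas, a two-row header like this is a column MultiIndex. A minimal sketch of just the layout, assuming unique top-level labels Group_0 ... Group_6 (hypothetical names chosen here so each group can be selected by label; the layout above repeats plain "Group"):

```python
import pandas as pd

NCOLS = 7   # number of groups
NROWS = 43  # rows per column (301 / 7)

# Outer level: one hypothetical 'Group_i' label per group.
# Inner level: 'Galaxy' and 'Diff' under every group.
columns = pd.MultiIndex.from_product(
    [[f"Group_{i}" for i in range(NCOLS)], ["Galaxy", "Diff"]]
)

# An empty frame with the target shape, just to show the header layout.
empty = pd.DataFrame(index=range(NROWS), columns=columns)
print(empty.shape)  # → (43, 14)
print(list(columns[:2]))  # → [('Group_0', 'Galaxy'), ('Group_0', 'Diff')]
```

from_product puts the first list on the outer level, so each Galaxy/Diff pair stays together under its group.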
My attempt is as follows:
# Imports
import numpy as np
import pandas as pd
from pandas import DataFrame as DF
## Break the data
lst1, lst2 = np.arange(301), 0.333 * np.arange(301)
arr1, arr2 = np.array_split(lst1, 7), np.array_split(lst2, 7)
df1 = DF(arr1).T
df2 = DF(arr2).T
# Assign column names
clm = [ 'Group_%d'%i for i in range(len(arr1))]
df1.columns = clm
df2.columns = clm
# Make data type integer
#for i in range(df1.shape[1]):
#    df1[i] = df1[i].astype(int)
df1.to_csv('tmp.txt',sep='\t')
df2.to_csv('tmp2.txt',sep='\t')
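Problems:
1. The numbers in df1 are floats; they should be integers.
2. The numbers in df2 have too many decimal places; they should be in %.3f format.
3. The command df1[i] = df1[i].astype(int) fails.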
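A likely cause of problem 3, judging from the code above: once df1.columns = clm has run, the columns are labelled 'Group_0' ... 'Group_6', so df1[i] with an integer i raises KeyError. A minimal sketch of converting by label instead, and of rounding df2 for problem 2 (note that astype(int) itself raises if a column contains NaN):

```python
import numpy as np
import pandas as pd

# Rebuild the frames from the question.
lst1, lst2 = np.arange(301), 0.333 * np.arange(301)
arr1, arr2 = np.array_split(lst1, 7), np.array_split(lst2, 7)
df1, df2 = pd.DataFrame(arr1).T, pd.DataFrame(arr2).T
clm = ["Group_%d" % i for i in range(len(arr1))]
df1.columns = clm
df2.columns = clm

# After the rename, the integer labels 0..6 no longer exist.
try:
    df1[0]
except KeyError:
    print("df1[0] raises KeyError")

# Select columns by their new labels instead.
for name in df1.columns:
    df1[name] = df1[name].astype(int)
# Equivalent one-liner: df1 = df1.astype(int)

# round() changes the stored values; to_csv(float_format="%.3f")
# only changes how the values are written to the file.
df2 = df2.round(3)
```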
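Also, df1 and df2 are two separate dataframes. I want to combine them into a single hierarchical dataframe with 7 columns, each having two sub-columns (Galaxy and Diff) that hold the corresponding values.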
Some related links:
https://chrisalbon.com/python/pandas_hierarchical_data.html
https://pandas.pydata.org/pandas-docs/stable/advanced.html
How to apply hierarchy or multi-index to panda columns
apply hierarchy or multi-index to panda columns
Pandas reset index on series to remove multiindex
Answer
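I worked out how to combine the corresponding columns of the two dataframes. Note: all columns must have the same length (number of rows).
The code is: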
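To illustrate why equal lengths matter, here is a small sketch with two made-up columns of different length (the values are arbitrary). pandas pads the short column with NaN, and because NaN is a float the column is upcast to float64, which is one common way integer data ends up as floats. The nullable "Int64" dtype (pandas 0.24 or later) keeps the integers and marks the gap as missing:

```python
import pandas as pd

# Two columns of different length; concat aligns on the index
# and pads the shorter one with NaN.
a = pd.Series([0, 1, 2])
b = pd.Series([3, 4])
combined = pd.concat([a, b], axis=1)

print(combined[1].isna().sum())  # → 1
# NaN is a float, so the padded column becomes float64.
print(combined[1].dtype)  # → float64

# The nullable "Int64" dtype stores integers alongside missing values.
as_int = combined[1].astype("Int64")
```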
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
#
# Author : Bhishan Poudel, Physics PhD Student, Ohio University
# Date : Jun 21, 2017 Wed
# Imports
import numpy as np
import pandas as pd
from pandas import DataFrame as DF
# Pandas setting to display long dataframe in terminal.
pd.set_option('display.width', None)
pd.set_option('display.precision', 3)  # full option name; plain 'precision' is ambiguous in newer pandas
NCOLS = 7
# Break the data (Note that 43*7 = 301)
lst1, lst2 = np.arange(301), 0.333 * np.arange(301) # One row with 301 elements
arr1, arr2 = np.array_split(lst1, NCOLS), np.array_split(lst2, NCOLS) # lists of 7 arrays, 43 elements each
# Make dataframe
df1, df2 = DF(arr1).T, DF(arr2).T
columns = ['Galaxy_%d' % i for i in range(NCOLS)]
df1.columns, df2.columns = columns, columns
# Combine respective columns and create new df.
for i in range(NCOLS):
    # Insert Diff_i right after Galaxy_i (positions 1, 3, 5, ...)
    df1.insert(i*2+1, 'Diff_%d' % i, df2['Galaxy_%d' % i])
# Two-level column index built from arrays: 'Group' over each Galaxy/Diff pair
arr_idx = [['Group', ''] * NCOLS, ['Galaxy', 'Diff'] * NCOLS]  # two header rows
index = pd.MultiIndex.from_arrays(arr_idx, names=['', ''])  # unnamed levels
df1.columns=index
print(df1)
# Write df1 into a file
df1.to_csv('tmp.txt',float_format='%.3f',sep='\t')
Here, the terminal prints a nicely formatted dataframe:
Group Group Group Group Group Group Group
Galaxy Diff Galaxy Diff Galaxy Diff Galaxy Diff Galaxy Diff Galaxy Diff Galaxy Diff
0 0 0.000 43 14.319 86 28.638 129 42.957 172 57.276 215 71.595 258 85.914
1 1 0.333 44 14.652 87 28.971 130 43.290 173 57.609 216 71.928 259 86.247
2 2 0.666 45 14.985 88 29.304 131 43.623 174 57.942 217 72.261 260 86.580
3 3 0.999 46 15.318 89 29.637 132 43.956 175 58.275 218 72.594 261 86.913
4 4 1.332 47 15.651 90 29.970 133 44.289 176 58.608 219 72.927 262 87.246
5 5 1.665 48 15.984 91 30.303 134 44.622 177 58.941 220 73.260 263 87.579
6 6 1.998 49 16.317 92 30.636 135 44.955 178 59.274 221 73.593 264 87.912
7 7 2.331 50 16.650 93 30.969 136 45.288 179 59.607 222 73.926 265 88.245
8 8 2.664 51 16.983 94 31.302 137 45.621 180 59.940 223 74.259 266 88.578
9 9 2.997 52 17.316 95 31.635 138 45.954 181 60.273 224 74.592 267 88.911
10 10 3.330 53 17.649 96 31.968 139 46.287 182 60.606 225 74.925 268 89.244
11 11 3.663 54 17.982 97 32.301 140 46.620 183 60.939 226 75.258 269 89.577
12 12 3.996 55 18.315 98 32.634 141 46.953 184 61.272 227 75.591 270 89.910
13 13 4.329 56 18.648 99 32.967 142 47.286 185 61.605 228 75.924 271 90.243
14 14 4.662 57 18.981 100 33.300 143 47.619 186 61.938 229 76.257 272 90.576
15 15 4.995 58 19.314 101 33.633 144 47.952 187 62.271 230 76.590 273 90.909
16 16 5.328 59 19.647 102 33.966 145 48.285 188 62.604 231 76.923 274 91.242
17 17 5.661 60 19.980 103 34.299 146 48.618 189 62.937 232 77.256 275 91.575
18 18 5.994 61 20.313 104 34.632 147 48.951 190 63.270 233 77.589 276 91.908
19 19 6.327 62 20.646 105 34.965 148 49.284 191 63.603 234 77.922 277 92.241
20 20 6.660 63 20.979 106 35.298 149 49.617 192 63.936 235 78.255 278 92.574
21 21 6.993 64 21.312 107 35.631 150 49.950 193 64.269 236 78.588 279 92.907
22 22 7.326 65 21.645 108 35.964 151 50.283 194 64.602 237 78.921 280 93.240
23 23 7.659 66 21.978 109 36.297 152 50.616 195 64.935 238 79.254 281 93.573
24 24 7.992 67 22.311 110 36.630 153 50.949 196 65.268 239 79.587 282 93.906
25 25 8.325 68 22.644 111 36.963 154 51.282 197 65.601 240 79.920 283 94.239
26 26 8.658 69 22.977 112 37.296 155 51.615 198 65.934 241 80.253 284 94.572
27 27 8.991 70 23.310 113 37.629 156 51.948 199 66.267 242 80.586 285 94.905
28 28 9.324 71 23.643 114 37.962 157 52.281 200 66.600 243 80.919 286 95.238
29 29 9.657 72 23.976 115 38.295 158 52.614 201 66.933 244 81.252 287 95.571
30 30 9.990 73 24.309 116 38.628 159 52.947 202 67.266 245 81.585 288 95.904
31 31 10.323 74 24.642 117 38.961 160 53.280 203 67.599 246 81.918 289 96.237
32 32 10.656 75 24.975 118 39.294 161 53.613 204 67.932 247 82.251 290 96.570
33 33 10.989 76 25.308 119 39.627 162 53.946 205 68.265 248 82.584 291 96.903
34 34 11.322 77 25.641 120 39.960 163 54.279 206 68.598 249 82.917 292 97.236
35 35 11.655 78 25.974 121 40.293 164 54.612 207 68.931 250 83.250 293 97.569
36 36 11.988 79 26.307 122 40.626 165 54.945 208 69.264 251 83.583 294 97.902
37 37 12.321 80 26.640 123 40.959 166 55.278 209 69.597 252 83.916 295 98.235
38 38 12.654 81 26.973 124 41.292 167 55.611 210 69.930 253 84.249 296 98.568
39 39 12.987 82 27.306 125 41.625 168 55.944 211 70.263 254 84.582 297 98.901
40 40 13.320 83 27.639 126 41.958 169 56.277 212 70.596 255 84.915 298 99.234
41 41 13.653 84 27.972 127 42.291 170 56.610 213 70.929 256 85.248 299 99.567
42 42 13.986 85 28.305 128 42.624 171 56.943 214 71.262 257 85.581 300 99.900
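An alternative that avoids the insert loop and the hand-built header is pd.concat with keys. This is a sketch, not the method used in the answer above, and it assumes unique top-level labels Group_0 ... Group_6 (hypothetical names, so each group can be selected by label):

```python
import numpy as np
import pandas as pd

NCOLS = 7
lst1, lst2 = np.arange(301), 0.333 * np.arange(301)
df1 = pd.DataFrame(np.array_split(lst1, NCOLS)).T  # 43 x 7, integers
df2 = pd.DataFrame(np.array_split(lst2, NCOLS)).T  # 43 x 7, floats

# For each group i, pair column i of df1 ("Galaxy") with column i of df2 ("Diff").
pairs = {
    f"Group_{i}": pd.concat([df1[i], df2[i]], axis=1, keys=["Galaxy", "Diff"])
    for i in range(NCOLS)
}

# Concatenating a dict along axis=1 uses its keys as the outer column level.
combined = pd.concat(pairs, axis=1)
print(combined.shape)  # → (43, 14)
```

Each Galaxy column keeps its integer dtype, and the file can be written the same way as before, e.g. combined.to_csv('tmp.txt', float_format='%.3f', sep='\t').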