Python根据范围将一列分为多列

时间:2019-05-07 10:41:06

标签: python pandas numpy

我的数据框中有一个列,其值在2100到8000之间。我想将此列拆分为多个间隔为500的列。所以让我以示例的方式显示给您:

   column
    2100
    2105
    2119
     .
    8000

我想这样分割它:

   column1   column2 column3  .   .  column n
    2100        0       0     .   .    0
      .         0       .     .   .    0
    2600        0       0
              2601      0     .   .    .
                .       .
              3101      0
                       3102            0
                        .
                       3602                  
                                      8000

请提出解决方案。

3 个答案:

答案 0 :(得分:1)

这是使用pd.cutDataFrame.pivot的一种方法:

df = pd.DataFrame(list(range(2100, 8000+1)), columns=['column'])
# create the bins to be used in pd.cut
bins = list(range(df.column.min(), df.column.max()+50, 50))
# array([2100, 2150, 2200, 2250, 2300 ...
# Create the labels for pd.cut, which will be used as column names
labels = [f'column{i}' for i in range(len(bins)-1)]
# ['column0', 'column1', 'column2', 'column3', 'column4', ...
df['bins'] = pd.cut(df.column, bins, labels=labels, include_lowest=True)

哪个会给你:

      column       bins
0       2100    column0
1       2101    column0
2       2102    column0
3       2103    column0
4       2104    column0
5       2105    column0
6       2106    column0
7       2107    column0
8       2108    column0

现在使用pivot获得最终结果:

ix = df.groupby('bins').column.cumcount()
df.pivot(columns = 'bins', index=ix).fillna(0)

bins column0 column1 column2 column3 column4 column5 column6 column7 column8  ... 
0     2100.0  2151.0  2201.0  2251.0  2301.0  2351.0  2401.0  2451.0  2501.0   
1     2101.0  2152.0  2202.0  2252.0  2302.0  2352.0  2402.0  2452.0  2502.0   
2     2102.0  2153.0  2203.0  2253.0  2303.0  2353.0  2403.0  2453.0  2503.0   
3     2103.0  2154.0  2204.0  2254.0  2304.0  2354.0  2404.0  2454.0  2504.0   
4     2104.0  2155.0  2205.0  2255.0  2305.0  2355.0  2405.0  2455.0  2505.0   
5     2105.0  2156.0  2206.0  2256.0  2306.0  2356.0  2406.0  2456.0  2506.0   
6     2106.0  2157.0  2207.0  2257.0  2307.0  2357.0  2407.0  2457.0  2507.0   
7     2107.0  2158.0  2208.0  2258.0  2308.0  2358.0  2408.0  2458.0  2508.0   
8     2108.0  2159.0  2209.0  2259.0  2309.0  2359.0  2409.0  2459.0  2509.0   
9     2109.0  2160.0  2210.0  2260.0  2310.0  2360.0  2410.0  2460.0  2510.0   
10    2110.0  2161.0  2211.0  2261.0  2311.0  2361.0  2411.0  2461.0  2511.0  
...

让它们全部封装在一个函数中,然后尝试使用一个更简单的示例来更好地了解其工作原理:

def binning_and_pivot(df, bin_size):
    bins = list(range(df.column.min(), df.column.max()+bin_size, bin_size))
    labels = [f'column{i}' for i in range(len(bins)-1)]
    df['bins'] = pd.cut(df.column, bins, labels=labels, include_lowest=True)
    ix = df.groupby('bins').column.cumcount()
    return df.pivot(columns = 'bins', index=ix).fillna(0)

df = pd.DataFrame(list(range(100+1)), columns=['column'])
df = df.sample(frac=0.7).reset_index(drop=True)
binning_and_pivot(df, bin_size=10)

bins column0 column1 column2 column3 column4 column5 column6 column7 column8   
0        2.0    16.0    32.0    39.0    45.0    55.0    69.0    81.0    87.0   
1        6.0    21.0    29.0    42.0    46.0    59.0    72.0    76.0    92.0   
2        3.0    13.0    31.0    36.0    49.0    61.0    68.0    74.0    91.0   
3       12.0    20.0    25.0    41.0    52.0    56.0    70.0    78.0    86.0   
4        8.0    17.0    30.0    37.0    43.0    62.0    64.0    73.0    89.0   
5        7.0    19.0    27.0    38.0    50.0    53.0    71.0    77.0    83.0   
6        0.0    22.0    28.0     0.0     0.0    54.0    65.0    82.0    90.0   
7        0.0    18.0    24.0     0.0     0.0    60.0    63.0    80.0     0.0   
8        0.0    14.0    26.0     0.0     0.0     0.0     0.0    75.0     0.0   


bins column9  
0       95.0  
1      100.0  
2       96.0  
3        0.0  
4        0.0  
5        0.0  
6        0.0  
7        0.0  
8        0.0  

答案 1 :(得分:0)

这是我的选择

我做了4次间隔

请注意,行数必须完全除以间隔

import pandas as pd

df = pd.read_csv(r'Z:\Path\neww.txt', delim_whitespace=True)

didi = df.to_dict()
num = 4
dd={}
for i in range(int(len(didi['column'].items())/num)):
    dd['col' + str(i)] = dict(list(didi['column'].items())[i*num:num*(i+1)])
print(pd.DataFrame(dd).apply(lambda x: pd.Series(x.dropna().values)))

输入:

column
2100
2100
2100
2100
2100
2100
2100
2100
2100
2100
8000
8000
8000
8000
8000
8000
8000
8000
80
8000

输出:

     col0    col1    col2    col3    col4
0  2100.0  2100.0  2100.0  8000.0  8000.0
1  2100.0  2100.0  2100.0  8000.0  8000.0
2  2100.0  2100.0  8000.0  8000.0    80.0
3  2100.0  2100.0  8000.0  8000.0  8000.0

答案 2 :(得分:-1)

@Nofy。建议您查看numpy.reshape的文档。

假设您将相关数据提取到一个numpy数组中,例如data。这是一个可能的解决方案。

newdata = data.reshape((500, -1))

newdata是您调整后的数据