I have a DataFrame like this:
band mean raster
1 894.343482 D:/Python/Copied/selection/20170219_095504.tif
2 1159.282304 D:/Python/Copied/selection/20170219_095504.tif
3 1342.291595 D:/Python/Copied/selection/20170219_095504.tif
4 3056.809463 D:/Python/Copied/selection/20170219_095504.tif
1 516.9624071 D:/Python/Copied/selection/20170325_095551.tif
2 720.1932533 D:/Python/Copied/selection/20170325_095551.tif
3 689.6287879 D:/Python/Copied/selection/20170325_095551.tif
4 4561.576329 D:/Python/Copied/selection/20170325_095551.tif
1 566.2016867 D:/Python/Copied/selection/20170527_095700.tif
2 812.9927101 D:/Python/Copied/selection/20170527_095700.tif
3 760.4621212 D:/Python/Copied/selection/20170527_095700.tif
4 5009.537164 D:/Python/Copied/selection/20170527_095700.tif
I want to reshape it into:
band1_mean band2_mean band3_mean band4_mean raster_name id
894.343482 1159.282304 1342.291595 3056.809463 20170219_095504.tif 1
516.9624071 720.1932533 689.6287879 4561.576329 20170325_095551.tif 2
566.2016867 812.9927101 760.4621212 5009.537164 20170527_095700.tif 3
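All 4 bands belong to one raster, so their values must all go in a single row. I don't know how to stack them without a key ID for each raster. Thanks!
Answer 0: (score: 1)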
With df.pivot(index="raster", columns="band", values="mean") you get:
band 1 2 3 4
raster
20170219_095504.tif 894.343482 1159.282304 1342.291595 3056.809463
20170325_095551.tif 516.962407 720.193253 689.628788 4561.576329
20170527_095700.tif 566.201687 812.992710 760.462121 5009.537164
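A minimal sketch of the call above, using a two-band, two-raster subset of the question's data. Two points here are assumptions rather than something the answer states: recent pandas releases (2.0 and later, as far as I know) accept the arguments of pivot only as keywords, and the index holds whatever the raster column contains, so the directory is stripped first to get bare file names like those shown above.

```python
import pandas as pd

# Small subset of the question's data (two bands, two rasters).
df = pd.DataFrame({
    "band": [1, 2, 1, 2],
    "mean": [894.343482, 1159.282304, 516.9624071, 720.1932533],
    "raster": ["D:/Python/Copied/selection/20170219_095504.tif"] * 2
            + ["D:/Python/Copied/selection/20170325_095551.tif"] * 2,
})

# Assumption: keep only the file name so the index shows bare names.
df["raster"] = df["raster"].str.split("/").str[-1]

# One row per raster, one column per band.
wide = df.pivot(index="raster", columns="band", values="mean")
print(wide)
```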
Answer 1: (score: 1)
This is a case for pivot:
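# extract the raster name (raw string avoids invalid-escape warnings;
# expand=False returns a Series instead of a one-column DataFrame):
df['raster_name'] = df.raster.str.extract(r'(\d+_\d+\.tif)', expand=False)
# pivot: one row per raster, one column per band
new_df = df.pivot(index='raster_name', columns='band', values='mean')
# rename the columns:
new_df.columns = [f'band{i}_mean' for i in new_df.columns]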
Output:
band1_mean band2_mean band3_mean band4_mean
raster_name
20170219_095504.tif 894.343482 1159.282304 1342.291595 3056.809463
20170325_095551.tif 516.962407 720.193253 689.628788 4561.576329
20170527_095700.tif 566.201687 812.992710 760.462121 5009.537164
If you want raster_name to be a regular column, you can call reset_index on new_df.
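To get all the way to the layout asked for in the question, including the id column, the steps can be put together roughly as follows. This is only a sketch on a subset of the question's data; numbering the rows from 1 in sorted raster_name order and moving raster_name after the band columns are my assumptions about what the asker wants.

```python
import pandas as pd

# Subset of the question's data (two rasters, four bands each).
df = pd.DataFrame({
    "band": [1, 2, 3, 4] * 2,
    "mean": [894.343482, 1159.282304, 1342.291595, 3056.809463,
             516.9624071, 720.1932533, 689.6287879, 4561.576329],
    "raster": ["D:/Python/Copied/selection/20170219_095504.tif"] * 4
            + ["D:/Python/Copied/selection/20170325_095551.tif"] * 4,
})

# Extract the file name, pivot to wide form, and rename the band columns.
df["raster_name"] = df["raster"].str.extract(r"(\d+_\d+\.tif)", expand=False)
new_df = df.pivot(index="raster_name", columns="band", values="mean")
new_df.columns = [f"band{i}_mean" for i in new_df.columns]

# Turn raster_name back into a regular column.
result = new_df.reset_index()

# Assumption: id is a 1-based counter in sorted raster_name order.
result["id"] = range(1, len(result) + 1)

# Reorder so the band columns come first, as in the desired output.
band_cols = [c for c in result.columns if c.startswith("band")]
result = result[band_cols + ["raster_name", "id"]]
print(result)
```

Because pivot sorts the index, the ids follow the alphabetical order of the file names, which for these timestamped names is also chronological.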