Question

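I want to merge two columns of numbers in a data frame (the values are the lower and upper bounds of a confidence interval from a statistical analysis).

I think the best way to do this is with the unite function from tidyr. But a value such as 0.20 is changed to 0.2; that is, trailing zeros after the decimal point are dropped. Is it possible to keep the original format when using unite?

The unite documentation is here: https://www.rdocumentation.org/packages/tidyr/versions/0.8.2/topics/unite

Example: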

# Dataframe
df <-  structure(list(est = c(0.05, -0.16, -0.02, 0, -0.11, 0.15, -0.26, 
-0.23), low2.5 = c(0.01, -0.2, -0.05, -0.03, -0.2, 0.1, -0.3, 
-0.28), up2.5 = c(0.09, -0.12, 0, 0.04, -0.01, 0.2, -0.22, -0.17
)), row.names = c(NA, 8L), class = "data.frame")

Merge the two columns with unite, using a comma as the separator, so the confidence interval stays together:

library(tidyr)
df <- unite(df, "CI", c("low2.5", "up2.5"), sep = ", ", remove = TRUE)

This gives:

df
    est           CI
1  0.05   0.01, 0.09
2 -0.16  -0.2, -0.12
3 -0.02     -0.05, 0
4  0.00  -0.03, 0.04
5 -0.11  -0.2, -0.01
6  0.15     0.1, 0.2
7 -0.26  -0.3, -0.22
8 -0.23 -0.28, -0.17

What I want is this:

    est           CI
1  0.05   0.01, 0.09
2 -0.16  -0.20, -0.12
3 -0.02  -0.05, 0.00
4  0.00  -0.03, 0.04
5 -0.11  -0.20, -0.01
6  0.15   0.10, 0.20
7 -0.26  -0.30, -0.22
8 -0.23  -0.28, -0.17

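I believe doing this in base R would be complicated (many merged columns would have to be moved or rearranged, and the old columns removed). Is there a way to stop unite from dropping the trailing zeros?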

Answer 1

This works:

import geopandas as gpd
from rasterstats import zonal_stats
from rasterio.mask import mask
from rasterio.plot import show
import matplotlib.pyplot as plt
import numpy as np
import fiona
import rasterio
from scipy import stats as sp_stats  # aliased so the name "stats" stays free for the zonal_stats result
from rasterio.warp import calculate_default_transform, reproject, Resampling

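mass_fp = r"New_Massachusetts.tif"

mass_tracts = gpd.read_file("Massachusetts/Massachusetts.shp")
# Target CRS: the tract shapefile's CRS, so that it matches the CRS used for
# the transform and the output raster metadata in the block that follows
dst_crs = mass_tracts.crs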


with rasterio.open('Massachusetts.tif') as src:
    transform, width, height = calculate_default_transform(
        src.crs, mass_tracts.crs, src.width, src.height, *src.bounds)
    kwargs = src.meta.copy()
    kwargs.update({
        'crs': mass_tracts.crs,
        'transform': transform,
        'width': width,
        'height': height
    })

    with rasterio.open('New_Mass.tif', 'w', **kwargs) as dst:
        for i in range(1, src.count + 1):
            reproject(
                source=rasterio.band(src, i),
                destination=rasterio.band(dst, i),
                src_transform=src.transform,
                src_crs=src.crs,
                dst_transform=transform,
                dst_crs=dst_crs,
                resampling=Resampling.nearest)



# Getting zonal stats
stats = zonal_stats("Massachusetts/Massachusetts.shp", "New_Mass.tif", stats="count",
                    geojson_out=True, copy_properties=True, nodata_value=0, categorical=True)

# Variables for our loop below
total_pop = 0.0
total_pixel_count = 0.0
total_developed = 0.0
total_water_ice = 0.0
total_barren_land = 0.0
total_forest = 0.0

# List to store the per-tract results for our census tracts
census_tract_land_percentages = []

# Loop through each tract in the stats data, compute the percentage of pixels
# in each land-cover group, and store the result as a dictionary in the list
#[11,12], [21, 22, 23,24], 31, [41,42,43] 5

for x in stats:
    # Class counts live under x["properties"]; classes absent from a tract default to 0
    props = x["properties"]
    total_pixel_count = props["count"]
    total_census_population = props["DP0010001"]
    total_developed = (float(props.get(21, 0) + props.get(22, 0) + props.get(23, 0) + props.get(24, 0)) / total_pixel_count) * 100
    total_water_ice = (float(props.get(11, 0) + props.get(12, 0)) / total_pixel_count) * 100
    total_barren_land = float(props.get(31, 0) / total_pixel_count) * 100
    total_forest = (float(props.get(41, 0) + props.get(42, 0) + props.get(43, 0)) / total_pixel_count) * 100

    census_tract_land_percentages.append({"Total Population:": total_census_population, "Total Water Ice Cover": total_water_ice, "Total Developed": total_developed,
                                         "Total Barren Land": total_barren_land, "Total Forest": total_forest})

print(census_tract_land_percentages)

# Getting the total population for all census tracts
for x in mass_tracts["DP0010001"]:
    total_pop += x

np_census_arr = np.asarray(census_tract_land_percentages)
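
The per-tract percentage step in the loop above can be tried without any raster or shapefile. The following is only a minimal sketch: `land_cover_percentages`, `LAND_COVER_GROUPS`, and the sample dictionary are hypothetical names and made-up data, the class-code groups are copied from the comment in the script, and the guard for tracts with zero counted pixels is an added assumption, not something the script does.

```python
# Hypothetical grouping of class codes, copied from the comment in the script.
LAND_COVER_GROUPS = {
    "Total Water Ice Cover": (11, 12),
    "Total Developed": (21, 22, 23, 24),
    "Total Barren Land": (31,),
    "Total Forest": (41, 42, 43),
}


def land_cover_percentages(properties, groups=LAND_COVER_GROUPS):
    """Return the percentage of counted pixels that falls in each class group.

    `properties` is assumed to look like the "properties" dict of one feature
    from zonal_stats(..., categorical=True, geojson_out=True): integer class
    codes map to pixel counts and "count" holds the total pixel count.
    """
    total = properties.get("count", 0)
    if not total:
        # Assumption: a tract with no counted pixels reports 0 for every group
        # instead of raising ZeroDivisionError.
        return {name: 0.0 for name in groups}
    return {
        name: 100.0 * sum(properties.get(code, 0) for code in codes) / total
        for name, codes in groups.items()
    }


# Made-up feature properties, for illustration only.
example = {"DP0010001": 1200, "count": 200, 11: 20, 21: 50, 22: 30, 41: 100}
percentages = land_cover_percentages(example)
print(percentages)
```

With the sample values, classes 21 and 22 together cover 80 of 200 pixels, so "Total Developed" comes out as 40.0. Keeping the groups in one dictionary means every class code is looked up the same way, which avoids reading one code from the wrong dictionary.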
