Question

Hello, and thanks for looking at my question! I am trying to change one column of a DataFrame based on a condition on another column.

I have two DataFrames. The first, called "df_Ckt", is used to look up the year_value for a specific circuit and a specific year. It looks like this:

    df_Ckt.head(5)
    Circuit Key    2019   2020    2021    2022    2023    2024    2025    2026       2027     2028
    0   CKT_4340_00865  9.256492    9.320154    9.658590    9.674177    9.674177    9.674177    9.674177    9.674177    9.674177    9.674177
    1   CKT_14438_00891 1.078450    1.102765    1.227634    1.412518    1.723032    1.929562    2.140825    2.339290    2.555398    2.752190
    2   CKT_37_01894    6.214399    6.372979    6.549099    6.822940    7.258766    7.554228    7.865580    8.155443    8.469345    8.737263
    3   CKT_3543_03099  7.658913    7.759223    7.872652    7.889068    7.915327    7.930130    8.965180    8.981075    8.998183    9.013649
    4   CKT_4380_03370  8.616798    8.633209    8.830170    9.123515    9.581061    9.885816    10.192292   10.476004   9.872779    10.153234

The other DataFrame, called "df", looks like this:

df.head(5)
circuit_key        year calculated
0   CKT_5670_00020  2019    NA
1   CKT_5670_00020  2019    NA
2   CKT_5670_00020  2019    NA
3   CKT_5670_00020  2019    NA
4   CKT_5670_00020  2019    NA

The years in "df" range from 2019 to 2028, and I added a column named "calculated" to capture the year_value from df_Ckt. It should look like this:

 circuit_key           year calculated
0   CKT_5670_00020  2019    8.241063
1   CKT_5670_00020  2019    8.241063
2   CKT_5670_00020  2019    8.241063
3   CKT_5670_00020  2019    8.241063
4   CKT_5670_00020  2019    8.241063

My code is as follows:

df["calculated"]="NA"
for year in range (2019,2029):
    year_value=df_Ckt.loc[df_Ckt['Circuit Key']=="circuit",year].reset_index(drop=True)
    df.loc[np.logical_and(df.year==year,df.calculated=="NA"),['calculated']]=year_value
    print(year,year_value)

The output is as follows:

2019 0    8.241063
Name: 2019, dtype: float64
2020 0    8.252401
Name: 2020, dtype: float64
2021 0    8.309021
Name: 2021, dtype: float64
2022 0    8.403156
Name: 2022, dtype: float64
2023 0    8.55595
Name: 2023, dtype: float64
2024 0    8.656351
Name: 2024, dtype: float64
2025 0    8.759824
Name: 2025, dtype: float64
2026 0    8.856902
Name: 2026, dtype: float64
2027 0    8.940435
Name: 2027, dtype: float64
2028 0    9.008744
Name: 2028, dtype: float64

When I check the column I modified, it is all NaN; it seems the loc function is unable to assign the value.

df['calculated']

        ... 
96440    NaN
96441    NaN
96442    NaN
Name: calculated, Length: 96443, dtype: object

Then I tried assigning a constant to the column instead. I ran the following test:

df["calculated"]="NA"
for year in range (2019,2029):
    year_value=df_Ckt.loc[df_Ckt['Circuit Key']=="circuit",year].reset_index(drop=True)
    df.loc[np.logical_and(df.year==year,df.calculated=="NA"),['calculated']]=1

In this case the output looks correct:

0         1
1         1
2         1
         ..

Name: calculated1, Length: 96443, dtype: object

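It seems there is some problem with my "year_value" that prevents it from being assigned into the DataFrame. Does anyone know how to make this work?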

Answer 1

The reason you get NaN is that year_value is a Series, not a single float value. To assign the calculated value, extract the scalar from the year_value Series and assign that instead.
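As a minimal sketch of that fix (not necessarily the exact code the answer originally showed), the loop below pulls the scalar out with `.iloc[0]` before assigning. The toy data, the `circuit` variable, and the use of a numeric NaN placeholder instead of the string "NA" are all assumptions made for illustration. The likely mechanism is that `.loc` assignment aligns a Series on its index labels, so a one-element Series with index 0 leaves every row with a different label as NaN.

```python
import pandas as pd

# Toy stand-ins for the question's frames (values are made up for illustration).
df_Ckt = pd.DataFrame({
    "Circuit Key": ["CKT_A", "CKT_B"],
    2019: [8.0, 1.0],
    2020: [8.5, 1.5],
})
df = pd.DataFrame(
    {"circuit_key": ["CKT_A"] * 4, "year": [2019, 2019, 2020, 2020]}
)

circuit = "CKT_A"  # hypothetical: the circuit whose values are wanted
df["calculated"] = float("nan")  # numeric placeholder instead of the string "NA"

for year in (2019, 2020):
    # .loc with a boolean mask returns a Series, even if only one row matches.
    matches = df_Ckt.loc[df_Ckt["Circuit Key"] == circuit, year]
    if matches.empty:
        continue  # no such circuit: leave the placeholder in place
    year_value = matches.iloc[0]  # extract the single float from the Series
    mask = (df["year"] == year) & df["calculated"].isna()
    df.loc[mask, "calculated"] = year_value  # a scalar broadcasts to all masked rows

print(df)
```

Because a scalar is broadcast to every selected row, this behaves like the constant-assignment test in the question, which is why that test worked while the Series assignment did not.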


Pandas loc function only returns NaN when assigning values in a loop
