Question

I read a CSV into a pandas DataFrame in Python. It has a DateTimeIndex and two columns I am interested in; let's call them number and upper_limit. I sort by index and drop the unnecessary columns as well as the rows that belong to older timestamps. Then I convert each of the two columns with

numbercol = pd.to_numeric(df.iloc[:,0], errors='coerce')
upperlimitcol = pd.to_numeric(df.iloc[:,1], errors='coerce')

so that I can compute their minimum, maximum, and mean.

This works fine. Now I want to check how often number reaches or exceeds the upper limit:

overshoots = 0
for dt in df.index:
    if numbercol[dt] >= upperlimitcol[dt]:
        overshoots += 1

But I get

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

I added a print statement to see the values of the number and upper_limit columns for each dt. It turns out that after 1800 rows the value in the cell is no longer a number but looks like this (this is what print(numbercol[dt]) gives me):

DateTime
2017-01-14       NaN
2017-01-14    3018.0
Name: Number, dtype: float64

The type of numbercol[dt] also changes from <type 'numpy.float64'> to a pandas Series.

I checked the file in a text editor as well as in LibreOffice and Excel, but I cannot see any difference between this row and the ones before it. Do you know why this happens?

Answer 1

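It returns a Series because you have two records with the same dt. A label lookup on an index with duplicates returns every matching row, not a single value. Without knowing more about the context of the problem, it is hard to say how you should proceed.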
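To make the cause concrete, here is a minimal sketch that reproduces the behaviour on made-up data. The values and the names `unique_hit`, `duplicate_hit`, and `duplicated_labels` are illustrative assumptions, not taken from the asker's file.

```python
import numpy as np
import pandas as pd

# Hypothetical data: the timestamp 2017-01-14 appears twice in the index.
idx = pd.to_datetime(["2017-01-13", "2017-01-14", "2017-01-14"])
numbercol = pd.Series([2950.0, np.nan, 3018.0], index=idx, name="Number")

unique_hit = numbercol[pd.Timestamp("2017-01-13")]     # one match: a scalar
duplicate_hit = numbercol[pd.Timestamp("2017-01-14")]  # two matches: a Series

print(type(unique_hit))
print(type(duplicate_hit))

# List every timestamp that occurs more than once in the index.
duplicated_labels = numbercol.index[numbercol.index.duplicated(keep=False)].unique()
print(duplicated_labels)
```

Running the same `duplicated(keep=False)` check on the real index should show which timestamps are repeated, which is hard to spot by eye in a spreadsheet.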

One approach is to aggregate the data inside the for loop using sum() or some other aggregate function (e.g. max(), min(), etc.):

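for dt in df.index:
    # A duplicated timestamp returns a Series from both lookups, so aggregate both sides.
    # max() for the upper limit is only one possible choice.
    if numbercol[dt].sum() >= upperlimitcol[dt].max():
        overshoots += 1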
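The same aggregation idea can probably be expressed without an explicit loop by grouping on the index first. This is a sketch on made-up data; the column names follow the question, and using max() for both columns is an assumption that may not suit the real data.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one duplicated timestamp.
idx = pd.to_datetime(["2017-01-13", "2017-01-14", "2017-01-14", "2017-01-15"])
df = pd.DataFrame(
    {
        "number": [2950.0, np.nan, 3018.0, 3100.0],
        "upper_limit": [3000.0, 3000.0, 3000.0, 3200.0],
    },
    index=idx,
)

# Collapse to one row per timestamp. max() for both columns is an assumption;
# sum(), min() or mean() may be the right choice for other data.
per_timestamp = df.groupby(level=0).agg({"number": "max", "upper_limit": "max"})

# With a unique index, the comparison yields one boolean per timestamp.
overshoots = int((per_timestamp["number"] >= per_timestamp["upper_limit"]).sum())
print(overshoots)
```

After grouping, the index is unique, so any later label lookup returns a scalar again.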

Another possibility is to call dropna() before the loop:

numbercol = numbercol.dropna()
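If the goal is only to count the rows where number reaches the limit, a loop-free comparison may sidestep the label lookup altogether. The sketch below uses made-up string data to mimic a CSV read. It assumes both Series come from the same DataFrame, so their indexes are identical even when they contain duplicates.

```python
import numpy as np
import pandas as pd

# Hypothetical raw data, including a non-numeric cell and a duplicated timestamp.
idx = pd.to_datetime(["2017-01-13", "2017-01-14", "2017-01-14", "2017-01-15"])
df = pd.DataFrame(
    {
        "number": ["2950", "n/a", "3018", "3300"],
        "upper_limit": ["3000", "3000", "3000", "3200"],
    },
    index=idx,
)

numbercol = pd.to_numeric(df.iloc[:, 0], errors="coerce")
upperlimitcol = pd.to_numeric(df.iloc[:, 1], errors="coerce")

# Row-by-row comparison; any comparison involving NaN evaluates to False.
exceeds = numbercol >= upperlimitcol
overshoots = int(exceeds.sum())
print(overshoots)
```

Because comparisons with NaN are False, rows that failed the numeric conversion are simply not counted, so no separate dropna() step is needed for this count. Each duplicated timestamp is counted once per row here, which may or may not be what is wanted.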

Why does this cell of the DataFrame contain a Series instead of a value?
