I read a CSV into a DataFrame in Python. I have a DatetimeIndex and two columns I'm interested in; let's call them number and upper_limit. I sort by index and drop the unnecessary columns as well as the rows that belong to older timestamps. Then I calculate the minimum, maximum and mean of each of these two columns.
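This works fine. Now I want to check how often number is greater than the upper limit. Both columns are first converted to numeric: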
import pandas as pd
# Cells that cannot be parsed as numbers become NaN.
numbercol = pd.to_numeric(df.iloc[:, 0], errors='coerce')
upperlimitcol = pd.to_numeric(df.iloc[:, 1], errors='coerce')
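But the counting loop raises an error:
overshoots = 0
for dt in df.index:
    if numbercol[dt] >= upperlimitcol[dt]:
        overshoots += 1
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
I added a print statement to see the values of the number and upper_limit columns for each dt. It turns out that after 1800 rows the value in the cell is no longer a single number but looks like this (this is what the print below gives me):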
print(numbercol[dt])
DateTime
2017-01-14 NaN
2017-01-14 3018.0
Name: Number, dtype: float64
The type of numbercol[dt] also changes from <type 'numpy.float64'> to a pandas Series.
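I checked the file in a text editor as well as in LibreOffice and Excel, but I can't see any difference between this row and the ones before it. Do you know why this happens?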
Answer 0 (score: 0)
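It returns a Series because you have two records with the same dt. Without knowing the context of the problem, it's hard to say how to proceed.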
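To see the effect in isolation, here is a minimal sketch. The data is made up to mirror the printed output in the question (the 2017-01-13 row and its value are assumptions), and it uses .loc for the label lookup:

```python
import numpy as np
import pandas as pd

# Hypothetical data: the timestamp 2017-01-14 appears twice in the index.
idx = pd.DatetimeIndex(["2017-01-13", "2017-01-14", "2017-01-14"], name="DateTime")
numbercol = pd.Series([2950.0, np.nan, 3018.0], index=idx, name="Number")

# A label that occurs once yields a scalar.
unique_lookup = numbercol.loc[pd.Timestamp("2017-01-13")]

# A label that occurs more than once yields a Series with all matching rows.
duplicate_lookup = numbercol.loc[pd.Timestamp("2017-01-14")]

# List the labels that occur more than once.
dup_labels = numbercol.index[numbercol.index.duplicated(keep=False)].unique()

print(type(unique_lookup))
print(type(duplicate_lookup))
print(dup_labels)
```

Checking numbercol.index.is_unique before the loop is a quick way to find out whether this applies to your file.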
One approach is to aggregate the data inside the for loop using sum() or some other aggregation function (e.g. max(), min(), etc.):
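overshoots = 0
for dt in df.index:
    # upperlimitcol[dt] is also a Series for a duplicated dt, so aggregate it too (max() is an assumption).
    if numbercol[dt].sum() >= upperlimitcol[dt].max():
        overshoots += 1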
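If aggregating is acceptable for your data, the duplicates could also be collapsed once before the loop instead of inside it. This is only a sketch: the column names Number and UpperLimit, the sample values, and the choice of max() as the aggregate are all assumptions.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one duplicated timestamp (2017-01-14).
idx = pd.DatetimeIndex(
    ["2017-01-13", "2017-01-14", "2017-01-14", "2017-01-15"], name="DateTime"
)
df = pd.DataFrame(
    {
        "Number": [2950.0, np.nan, 3018.0, 3100.0],
        "UpperLimit": [3000.0, 3000.0, 3000.0, 3000.0],
    },
    index=idx,
)

# Collapse to one row per timestamp; max() ignores NaN within each group.
per_dt = df.groupby(level=0).max()

# Every lookup now returns a scalar, so the comparison is unambiguous.
overshoots = 0
for dt in per_dt.index:
    if per_dt.loc[dt, "Number"] >= per_dt.loc[dt, "UpperLimit"]:
        overshoots += 1

print(overshoots)
```

Note that this counts each timestamp once, whereas looping over the original df.index visits a duplicated timestamp once per row.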
Another option might be to call dropna() before the loop:
numbercol = numbercol.dropna()
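Keep in mind that dropna() only changes numbercol, so upperlimitcol[dt] can still return a Series for a duplicated dt. A different route, not part of the suggestions above, is to skip the loop and compare the two columns row by row. The sketch below uses made-up data and assumed column names; it counts rows rather than timestamps, and rows where number is NaN compare as False.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one duplicated timestamp (2017-01-14).
idx = pd.DatetimeIndex(
    ["2017-01-13", "2017-01-14", "2017-01-14", "2017-01-15"], name="DateTime"
)
df = pd.DataFrame(
    {
        "Number": [2950.0, np.nan, 3018.0, 3100.0],
        "UpperLimit": [3000.0, 3000.0, 3000.0, 3000.0],
    },
    index=idx,
)

numbercol = pd.to_numeric(df["Number"], errors="coerce")
upperlimitcol = pd.to_numeric(df["UpperLimit"], errors="coerce")

# Both Series share df's index, so the comparison is done position by position,
# even though the index contains duplicates.
is_overshoot = numbercol >= upperlimitcol
overshoots = int(is_overshoot.sum())

print(overshoots)
```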