Get index of value (with conditions)

Time: 2019-02-21 23:47:48

Tags: python python-3.x pandas dataframe pandas-groupby

I have tried other options, but I always come back to the .get_loc function. I have a large dataframe and need to find the row index of the value nearest to a target (the nearest and backfill options). The df looks like this:

      Date Product  Price
    0  1/1     NEG    3
    1  1/1     NEG    3.3
    2  1/1     NEG    5.1
    3  1/1     POS    1.4
    4  1/1     POS    3.7
    5  1/1     POS    3.9
    6  1/1     POS    4.6
    7  1/2     NEG    1.2
    8  ...     ...    ...

df.columns.get_loc('Price') gives me 2 as the index of the 'Price' column, but I need the index of a particular row within a section (given 'Date' and 'Product'), for example:

    df.loc[(df['Date']=='1/1') & (df['Product']=='NEG')]

Now, search for Price == 3.4.

This should give me index = 1, but because the data is large there are multiple '3.4' values, so it does not work properly.

Is there any way to search for the nearest value under certain conditions (such as the ones above)?

1 answer:

Answer 0 (score: 0)

Welcome to Stack Overflow!

I am not a fan of using .get_loc(), so here is an alternative way to get what you want.

import pandas as pd

num = 3.4

# New dataframe fit_criteria for conditions (df['Date']=='1/1') & (df['Product']=='NEG')
fit_criteria = df.loc[(df['Date']=='1/1') & (df['Product']=='NEG')]

# Find absolute difference between values in price column and num. Find the index of
# the smallest difference using .idxmin()
nearest_to_num = (fit_criteria['Price']-num).abs().idxmin()

# Final result is the index of nearest number to num
nearest_to_num

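If the comments are not enough, here is a more detailed explanation of what is happening:

  1. First, we define the number we want to find the nearest value to:
    num = 3.4
    
  2. Next, we create a dataframe that fits the criteria Date = 1/1 and Product = NEG by passing them as conditions to .loc[]:

    fit_criteria = df.loc[(df['Date']=='1/1') & (df['Product']=='NEG')]
    
  3. Then we compute a Series of the absolute differences between num and the values in the Price column. Finally, the .idxmin() method returns the index of the first occurrence of the minimum:

    nearest_to_num = (fit_criteria['Price']-num).abs().idxmin()
    
  4. Finally, the value of nearest_to_num is 1, which corresponds to the index of the desired row.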
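
For anyone who wants to run the steps above end to end, here is a minimal self-contained sketch. The sample dataframe is rebuilt from the rows shown in the question, and the helper name nearest_index and its parameters are hypothetical, introduced only for illustration.

```python
import pandas as pd

# Sample data modeled on the rows shown in the question (an assumption;
# the real dataframe is much larger).
df = pd.DataFrame({
    "Date": ["1/1"] * 7 + ["1/2"],
    "Product": ["NEG"] * 3 + ["POS"] * 4 + ["NEG"],
    "Price": [3.0, 3.3, 5.1, 1.4, 3.7, 3.9, 4.6, 1.2],
})


def nearest_index(frame, date, product, target):
    """Hypothetical helper: index label of the row whose Price is closest
    to target, among rows matching the given Date and Product."""
    subset = frame.loc[(frame["Date"] == date) & (frame["Product"] == product)]
    if subset.empty:
        # idxmin() on an empty selection would fail, so report it clearly.
        raise ValueError("no rows match the given Date and Product")
    return (subset["Price"] - target).abs().idxmin()


neg_result = nearest_index(df, "1/1", "NEG", 3.4)
pos_result = nearest_index(df, "1/1", "POS", 3.4)
print(neg_result)  # 1 (Price 3.3 is the closest NEG price on 1/1)
print(pos_result)  # 4 (Price 3.7 is the closest POS price on 1/1)
```

Because the filter is applied first, the returned label is an index of the original df, so df.loc[neg_result] retrieves the full row.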

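Note that this approach does not account for multiple values that are equally close to num. I hope this answers your question adequately, but feel free to let me know if you need more detail or clarification.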
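
On the tie caveat: if every equally close row is needed, one possible sketch is to compare each difference against the minimum instead of calling .idxmin(). The sample data below is made up for illustration, and np.isclose is used (my choice, not part of the original answer) because floating-point differences such as 3.4 - 3.3 and 3.5 - 3.4 are not guaranteed to compare exactly equal.

```python
import numpy as np
import pandas as pd

# Made-up data with two NEG prices on 1/1 that are equally far from 3.4.
df = pd.DataFrame({
    "Date": ["1/1", "1/1", "1/1", "1/1", "1/1", "1/2"],
    "Product": ["NEG", "NEG", "NEG", "NEG", "POS", "NEG"],
    "Price": [3.0, 3.3, 3.5, 5.1, 3.4, 3.4],
})

num = 3.4

# Same filter as in the answer above.
fit_criteria = df.loc[(df["Date"] == "1/1") & (df["Product"] == "NEG")]

# Absolute distance of each remaining price from num.
diff = (fit_criteria["Price"] - num).abs()

# Keep every index label whose distance matches the minimum, within
# floating-point tolerance.
mask = np.isclose(diff.to_numpy(), diff.min())
tied_indices = diff.index[mask].tolist()

print(tied_indices)  # [1, 2]
```

The exact matches at index 4 and 5 are excluded because they fall outside the Date and Product filter.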


Reference used: How do I find the closest values in a Pandas series to an input number?