Question

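I have a table with the following columns:

strWord, strWordType, strWordDescription

I want to select all rows except those with a duplicate strWordDescription. Where duplicates exist, I only want to return the row whose strWord is the longest. This should apply only when the strWordType is the same.

Note: there are no duplicate strWord / strWordType combinations; rows only duplicate strWordDescriptions within a particular strWordType. I would like to avoid using DISTINCT.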

Example: myTable

  strWord  | strWordType | strWordDescription

  blue     | 2012        | This is a color
  blue     | 2014        | This is a color
  green    | 2012        | This is a color
  ham      | 2014        | This is a food
  chicken  | 2014        | This is a food

Expected result:

  strWord  | strWordType | strWordDescription

  green    | 2012        | This is a color
  blue     | 2014        | This is a color
  chicken  | 2014        | This is a food

Answer 1

Hmm... a correlated subquery comes to mind:

import pandas as pd

StartDate=['2016-01-01','2016-01-13','2016-01-25','2016-02-06','2016-02-18']
EndDate=['2016-01-12','2016-01-24','2016-01-05','2016-02-17','2016-02-29']
value_3=[1,2,3,4,5]

Date=['2016-01-01','2016-01-02','2016-02-10','2016-02-11','2016-02-18']
value_1=[3,4,5,6,7]
value_2=[0,1,3,5,7]

df1=pd.DataFrame({'StartDate':StartDate,'EndDate':EndDate,'Value_3':value_3})
df2=pd.DataFrame({'Date':Date,'Value_1':value_1,'Value_2':value_2})

df1['EndDate']=pd.to_datetime(df1['EndDate'])
df1['StartDate']=pd.to_datetime(df1['StartDate'])
df2['Date']=pd.to_datetime(df2['Date'])
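
For the table in the question, a correlated subquery might look like the sketch below. It is run through Python's built-in sqlite3 module only so that the example is self-contained. The in-memory database, the `ORDER BY ... LIMIT 1` tie-breaking, and the use of `LENGTH()` are assumptions, not part of the original answer; SQL Server, for instance, spells these `TOP 1` and `LEN()`.

```python
import sqlite3

# In-memory database holding the sample rows from the question.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE myTable ("
    "strWord TEXT, strWordType INTEGER, strWordDescription TEXT)"
)
conn.executemany(
    "INSERT INTO myTable VALUES (?, ?, ?)",
    [
        ("blue", 2012, "This is a color"),
        ("blue", 2014, "This is a color"),
        ("green", 2012, "This is a color"),
        ("ham", 2014, "This is a food"),
        ("chicken", 2014, "This is a food"),
    ],
)

# For each outer row, the subquery looks up the longest strWord that shares
# its strWordType and strWordDescription; the row is kept only if it is that
# word. Ties on length are broken alphabetically (an assumption).
query = """
SELECT mt.strWord, mt.strWordType, mt.strWordDescription
FROM myTable mt
WHERE mt.strWord = (
    SELECT mt2.strWord
    FROM myTable mt2
    WHERE mt2.strWordType = mt.strWordType
      AND mt2.strWordDescription = mt.strWordDescription
    ORDER BY LENGTH(mt2.strWord) DESC, mt2.strWord
    LIMIT 1
)
ORDER BY mt.strWordType, mt.strWordDescription
"""

rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Matching on strWord alone is enough here because the question states that strWord / strWordType combinations are unique.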

Answer 2

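Just solved it:

-- Note: MAX() on a string column returns the greatest value in
-- collation (alphabetical) order, which is not necessarily the longest.
SELECT MAX(mt.strWord) AS strWord,
       mt.strWordType,
       mt.strWordDescription
FROM myTable mt
GROUP BY mt.strWordType, mt.strWordDescription
ORDER BY MAX(mt.strWord)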
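
Because `MAX()` compares strings alphabetically, it may be worth checking the grouped query against the sample data. The sketch below runs it through Python's built-in sqlite3 module, next to a variant that groups on `MAX(LENGTH(strWord))` and joins back to the table. The join-based variant, the alias `g`, and the column name `maxLen` are assumptions for illustration; if two words in a group share the longest length, it returns both.

```python
import sqlite3

# In-memory database holding the sample rows from the question.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE myTable ("
    "strWord TEXT, strWordType INTEGER, strWordDescription TEXT)"
)
conn.executemany(
    "INSERT INTO myTable VALUES (?, ?, ?)",
    [
        ("blue", 2012, "This is a color"),
        ("blue", 2014, "This is a color"),
        ("green", 2012, "This is a color"),
        ("ham", 2014, "This is a food"),
        ("chicken", 2014, "This is a food"),
    ],
)

# The grouped query from the answer: MAX() picks the alphabetically
# greatest word in each (strWordType, strWordDescription) group.
by_max = conn.execute("""
SELECT MAX(mt.strWord), mt.strWordType, mt.strWordDescription
FROM myTable mt
GROUP BY mt.strWordType, mt.strWordDescription
ORDER BY MAX(mt.strWord)
""").fetchall()

# Hypothetical variant: compute the longest length per group, then join
# back to fetch the word(s) that have that length.
by_length = conn.execute("""
SELECT mt.strWord, mt.strWordType, mt.strWordDescription
FROM myTable mt
JOIN (
    SELECT strWordType, strWordDescription,
           MAX(LENGTH(strWord)) AS maxLen
    FROM myTable
    GROUP BY strWordType, strWordDescription
) g
  ON g.strWordType = mt.strWordType
 AND g.strWordDescription = mt.strWordDescription
 AND LENGTH(mt.strWord) = g.maxLen
ORDER BY mt.strWordType, mt.strWordDescription
""").fetchall()

print("MAX(strWord):  ", by_max)
print("longest strWord:", by_length)
```

On this data the two queries differ only in the food group, where "ham" sorts after "chicken" alphabetically but is shorter.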

Select all columns except duplicates, unless the column has the longest string length

2 answers: