Select all rows except duplicates, keeping the duplicate whose string is longest

Time: 2018-05-09 23:54:16

Tags: sql sql-server performance join select

I have a table with the following columns:

strWord,strWordType,strWordDescription

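I want to select all rows except those that share a strWordDescription with another row. Where duplicates exist, I only want to return the row with the longest strWord. This should apply only when strWordType is the same.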

Note: no two rows share the same strWord / strWordType combination; only strWordDescription is duplicated within a given strWordType. I would like to avoid using DISTINCT.

Example: myTable

  strWord  |  strWordType  |  strWordDescription

  blue     |  2012         |  This is a color
  blue     |  2014         |  This is a color
  green    |  2012         |  This is a color
  ham      |  2014         |  This is a food
  chicken  |  2014         |  This is a food

Expected result:

  strWord  |  strWordType  |  strWordDescription

  green    |  2012         |  This is a color
  blue     |  2014         |  This is a color
  chicken  |  2014         |  This is a food

2 answers:

Answer 0: (score: 0)

Hmm... A correlated subquery comes to mind:

import pandas as pd

StartDate=['2016-01-01','2016-01-13','2016-01-25','2016-02-06','2016-02-18']
EndDate=['2016-01-12','2016-01-24','2016-01-05','2016-02-17','2016-02-29']
value_3=[1,2,3,4,5]

Date=['2016-01-01','2016-01-02','2016-02-10','2016-02-11','2016-02-18']
value_1=[3,4,5,6,7]
value_2=[0,1,3,5,7]

df1=pd.DataFrame({'StartDate':StartDate,'EndDate':EndDate,'Value_3':value_3})
df2=pd.DataFrame({'Date':Date,'Value_1':value_1,'Value_2':value_2})

df1['EndDate']=pd.to_datetime(df1['EndDate'])
df1['StartDate']=pd.to_datetime(df1['StartDate'])
df2['Date']=pd.to_datetime(df2['Date'])
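
As a rough sketch of the correlated-subquery idea applied to the question's myTable: each row is kept only if its strWord is as long as the longest strWord among rows with the same strWordType and strWordDescription. Several things here are assumptions rather than part of the original answer: the query runs against an in-memory SQLite database through Python's sqlite3 module instead of SQL Server, it uses SQLite's LENGTH where SQL Server would use LEN, strWordType is stored as an INTEGER, and the ORDER BY is added only to make the output order deterministic. If two words in a group tie for longest, both rows are returned.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for SQL Server.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE myTable ("
    "strWord TEXT, strWordType INTEGER, strWordDescription TEXT)"
)
# Sample rows copied from the question.
conn.executemany(
    "INSERT INTO myTable VALUES (?, ?, ?)",
    [
        ("blue", 2012, "This is a color"),
        ("blue", 2014, "This is a color"),
        ("green", 2012, "This is a color"),
        ("ham", 2014, "This is a food"),
        ("chicken", 2014, "This is a food"),
    ],
)

query = """
SELECT mt.strWord, mt.strWordType, mt.strWordDescription
FROM myTable mt
WHERE LENGTH(mt.strWord) = (
    -- Correlated subquery: length of the longest strWord among rows
    -- sharing this row's strWordType and strWordDescription.
    SELECT MAX(LENGTH(mt2.strWord))
    FROM myTable mt2
    WHERE mt2.strWordType = mt.strWordType
      AND mt2.strWordDescription = mt.strWordDescription
)
ORDER BY mt.strWordType, mt.strWord
"""

rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

On the sample data this keeps green for 2012 and blue and chicken for 2014, which matches the expected result in the question.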

Answer 1: (score: 0)

Just solved it:

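-- Note: MAX on a string column returns the strWord that sorts last
-- in each group under the column's collation, which is not
-- necessarily the longest one.
SELECT MAX(mt.strWord),
       mt.strWordType,
       mt.strWordDescription
FROM myTable mt
GROUP BY mt.strWordType, mt.strWordDescription
ORDER BY MAX(mt.strWord)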
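
It may be worth checking this query against the sample data, because MAX on a string compares values by sort order rather than by length. The sketch below runs the answer's query and then a length-based variant that keeps the same GROUP BY but joins back on MAX(LENGTH(strWord)). The setup is an assumption: an in-memory SQLite database via Python's sqlite3 module stands in for SQL Server, LENGTH stands in for SQL Server's LEN, and strWordType is stored as an INTEGER. The length-based variant is one possible alternative, not the author's method, and it returns every tied row when several words in a group share the maximum length.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for SQL Server.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE myTable ("
    "strWord TEXT, strWordType INTEGER, strWordDescription TEXT)"
)
# Sample rows copied from the question.
conn.executemany(
    "INSERT INTO myTable VALUES (?, ?, ?)",
    [
        ("blue", 2012, "This is a color"),
        ("blue", 2014, "This is a color"),
        ("green", 2012, "This is a color"),
        ("ham", 2014, "This is a food"),
        ("chicken", 2014, "This is a food"),
    ],
)

# The answer's query: MAX picks the word that sorts last in each group.
alphabetical_rows = conn.execute("""
SELECT MAX(mt.strWord), mt.strWordType, mt.strWordDescription
FROM myTable mt
GROUP BY mt.strWordType, mt.strWordDescription
ORDER BY MAX(mt.strWord)
""").fetchall()

# Length-based variant: find the maximum length per group, then join
# back to fetch the row(s) whose strWord has that length.
longest_rows = conn.execute("""
SELECT mt.strWord, mt.strWordType, mt.strWordDescription
FROM myTable mt
JOIN (
    SELECT strWordType, strWordDescription,
           MAX(LENGTH(strWord)) AS maxLen
    FROM myTable
    GROUP BY strWordType, strWordDescription
) g
  ON g.strWordType = mt.strWordType
 AND g.strWordDescription = mt.strWordDescription
 AND LENGTH(mt.strWord) = g.maxLen
ORDER BY mt.strWord
""").fetchall()

print("MAX(strWord):", alphabetical_rows)
print("longest strWord:", longest_rows)
```

For the 2014 food group, MAX(strWord) yields ham, since "ham" sorts after "chicken", whereas the expected result in the question lists chicken, the longer word. The length-based variant returns chicken for that group.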