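I have data like SampleDf below, and I'm trying to write code that picks whichever of 'Avg', 'Sum' or 'Count' it runs into first in each string and puts it in a new column 'Agg'. My code below almost does it, but it has a hierarchy: if Count comes before Sum in a string, it still puts Sum in the 'Agg' column. OutputDf shows what I'm hoping to get.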
Sample Data:
SampleDf=pd.DataFrame([['tom',"Avg(case when Value1 in ('Value2') and [DateType] in ('Value3') then LOS end)"],['bob',"isnull(Sum(case when XferToValue2 in (1) and DateType in ('Value3') and [Value1] in ('HM') then Count(LOS) end),0)"]],columns=['ReportField','OtherField'])
Sample Output:
OutputDf=pd.DataFrame([['tom',"Avg(case when Value1 in ('Value2') and [DateType] in ('Value3') then LOS end)",'Avg'],['bob',"isnull(Sum(case when XferToValue2 in (1) and DateType in ('Value3') and [Value1] in ('HM') then Count(LOS) end),0)",'Sum']],columns=['ReportField','OtherField','Agg'])
Code:
import numpy as np
# Nested np.where checks Sum first, then Count, then Avg, regardless of position in the string
SampleDf['Agg'] = np.where(SampleDf.OtherField.str.contains("Sum"),"Sum",
                  np.where(SampleDf.OtherField.str.contains("Count"),"Count",
                  np.where(SampleDf.OtherField.str.contains("Avg"),"Avg","Nothing")))
Answer 0 (score: 1)
A quick and dirty attempt at this problem would be to write a function that returns:
- whichever of the terms of interest, i.e. ['Avg','Sum','Count'], occurs first, if it is present in the string
- or None, if there is no such term:
import re
terms = ['Avg','Sum','Count']
def extractTerms(s, t=terms):
    # Replace every non-word character and every digit with a space, then split into tokens
    s_clean = re.sub(r"[^\w]|[\d]"," ", s).split()
    # Keep only the tokens that are terms of interest, in order of appearance
    s_array = [w for w in s_clean if w in t]
    # Return the first matching term, or None if nothing matched
    return s_array[0] if s_array else None
Proof when the terms are in the string:
SampleDf['Agg'] = SampleDf['OtherField'].apply(extractTerms)
SampleDf
  ReportField  OtherField  Agg
0  tom  Avg(case when Value1 in ('Value2') and [DateType] in ('Value3') then LOS end)  Avg
1  bob  isnull(Sum(case when XferToValue2 in (1) and DateType in ('Value3') and [Value1] in ('HM') then Count(LOS) end),0)  Sum
Proof when the terms are not in the string (here tom's OtherField is 'foo'):
SampleDf['Agg'] = SampleDf['OtherField'].apply(extractTerms)
SampleDf
  ReportField  OtherField  Agg
0  tom  foo  None
1  bob  isnull(Sum(case when XferToValue2 in (1) and DateType in ('Value3') and [Value1] in ('HM') then Count(LOS) end),0)  Sum
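As a possible alternative that is not part of the answer above, the same "first term wins" idea can be sketched with pandas' own `Series.str.extract`, which returns the first regex match in each string. The `demo_df` frame, the extra `ann` row and the `pattern` name are made up for illustration. Note that `\b` treats digits as word characters, so a token such as `Sum1` would not match here, whereas `extractTerms` strips digits first; treat this as a sketch under those assumptions.

```python
import pandas as pd

# Hypothetical copy of the question's sample data, plus one row with no term in it
demo_df = pd.DataFrame(
    [
        ["tom", "Avg(case when Value1 in ('Value2') and [DateType] in ('Value3') then LOS end)"],
        ["bob", "isnull(Sum(case when XferToValue2 in (1) and DateType in ('Value3') and [Value1] in ('HM') then Count(LOS) end),0)"],
        ["ann", "foo"],
    ],
    columns=["ReportField", "OtherField"],
)

# One capture group: whichever alternative appears earliest in the string is returned
pattern = r"\b(Avg|Sum|Count)\b"

# expand=False gives back a Series; rows without a match become NaN
demo_df["Agg"] = demo_df["OtherField"].str.extract(pattern, expand=False)

print(demo_df[["ReportField", "Agg"]])
```

Because the regex engine scans each string left to right, the position of the term decides the result rather than the order of the alternatives in the pattern, which avoids the hierarchy in the nested `np.where` version.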