String slicing in a Python UDF

Time: 2016-04-29 17:25:48

Tags: python arrays string python-2.x udf

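I am trying to write a UDF in Python that will be called from a Pig script. The UDF needs to accept a date as a string in DD-MMM-YYYY format and return it in DD-MM-YYYY format. Here MMM is a month abbreviation such as JAN, FEB, ..., DEC, and the returned MM is the matching number 01, 02, ..., 12.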

Below is my Python UDF:

#!/usr/bin/python

@outputSchema("newdate:chararray")
def GetMonthMM(inputString):
    print inputString
    #monthstring = inputString[3:6]
    sl = slice(3,6)
    monthstring = inputString[sl]
    monthdigit = ""

    if ( monthstring == "JAN" ):
        monthdigit = "01"
    elif ( monthstring == "FEB"):
        monthdigit = "02"
    elif(monthstring == "MAR"):
        monthdigit = "03"
    elif(monthstring == "APR"):
        monthdigit = "04"
    elif(monthstring == "MAY"):
        monthdigit = "05"
    elif (monthstring == "JUN"):
        monthdigit = "06"
    elif (monthstring == "JUL"):
        monthdigit = "07"
    elif (monthstring == "AUG"):
        monthdigit = "08"
    elif (monthstring == "SEP"):
        monthdigit = "09"
    elif (monthstring == "OCT"):
        monthdigit = "10"
    elif (monthstring == "NOV"):
        monthdigit = "11"
    elif (monthstring == "DEC"):
        monthdigit = "12"

    sl1 = slice(0,3)
    sl2 = slice(6,11)
    str1 = inputString[sl1]
    str2 = inputString[sl2]

    newdate = str1 + monthdigit + str2
    return monthstring;

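I did some debugging, and the problem seems to be that the string is treated as an array after slicing. I get the following error message: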

TypeError: unsupported operand type(s) for +: 'array.array' and 'str'

The same thing happens when the string is compared with another string, as in if (monthstring == "DEC"): even when monthstring holds the value DEC, the condition is never satisfied.

Has anyone run into the same problem before? Any ideas on how to fix it?
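
One possible reading of the TypeError above is that Pig hands the UDF an array.array (a raw byte buffer) instead of a str, which would also explain why the == comparisons never match. Assuming that is the case (the thread does not confirm it), a small helper could decode the value before any slicing. The name to_text and the UTF-8 encoding are assumptions made for illustration:

```python
import array


def to_text(value):
    """Return value as a plain string, decoding array.array byte buffers.

    Hypothetical helper: assumes the buffer holds UTF-8 encoded text.
    """
    if isinstance(value, array.array):
        # Newer Pythons expose tobytes(); Python 2 / Jython 2.x only have tostring().
        if hasattr(value, "tobytes"):
            raw = value.tobytes()
        else:
            raw = value.tostring()
        return raw.decode("utf-8")
    # Anything that is not an array.array is returned unchanged.
    return value
```

Calling inputString = to_text(inputString) at the top of GetMonthMM should then let the slicing and comparisons operate on an ordinary string.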

2 answers:

Answer 0 (score: 1)

I would write the function like this:

#!/usr/bin/python
@outputSchema("newdate:chararray")
def GetMonthMM(inputString):
    # Map each three-letter month abbreviation to its two-digit number.
    monthArray = {'JAN':'01','FEB':'02','MAR':'03','APR':'04','MAY':'05','JUN':'06','JUL':'07','AUG':'08','SEP':'09','OCT':'10','NOV':'11','DEC':'12'}
    print inputString
    # Assumes the date is always separated by '-' and that inputString is a str.
    dateparts = inputString.split('-')
    dateparts[1] = monthArray[dateparts[1]]
    return '-'.join(dateparts)
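
To try a dictionary-based version outside Pig, a stand-in for the decorator is needed, since outputSchema is, as far as I know, only supplied by Pig when the script is registered as a Jython UDF. A minimal sketch, in which the fallback decorator and the MONTHS name are illustrative assumptions rather than anything from Pig:

```python
try:
    outputSchema  # already defined when Pig runs the script
except NameError:
    def outputSchema(schema):
        # Hypothetical stand-in: ignores the schema and returns the function unchanged.
        def wrap(func):
            return func
        return wrap

# Three-letter month abbreviation -> two-digit month number.
MONTHS = {'JAN': '01', 'FEB': '02', 'MAR': '03', 'APR': '04',
          'MAY': '05', 'JUN': '06', 'JUL': '07', 'AUG': '08',
          'SEP': '09', 'OCT': '10', 'NOV': '11', 'DEC': '12'}


@outputSchema("newdate:chararray")
def GetMonthMM(inputString):
    # Assumes a str of the form DD-MMM-YYYY separated by '-'.
    day, month, year = inputString.split('-')
    # upper() makes the lookup tolerant of 'jan' or 'Jan'.
    return '-'.join([day, MONTHS[month.upper()], year])
```

An unknown abbreviation raises KeyError here, which surfaces bad input instead of silently producing a date with an empty month.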

Answer 1 (score: 1)

I recently used the calendar module. It may be more useful in other situations, but here it is anyway.

import calendar
m_dict = {}
for i, month in enumerate(calendar.month_abbr[1:]): # month_abbr[0] is '' so that indices match month numbers; skip it
    m_dict[month.lower()] = '{:02}'.format(i+1)

def GetMonthMM(inputStr):
    day, month, year = inputStr.split('-')
    return '-'.join([day, m_dict[month.lower()], year])

print(GetMonthMM('01-JAN-2015'))
# prints 01-01-2015
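
A different route, not taken from either answer, is to let datetime parse and reformat the date instead of building a lookup table. This is only a sketch: convert_date is a hypothetical name, and %b matches month abbreviations for the current locale, so it assumes an English (or C) locale:

```python
from datetime import datetime


def convert_date(input_str):
    """Convert DD-MMM-YYYY (e.g. 01-JAN-2015) to DD-MM-YYYY."""
    # %b parses the abbreviated month name; matching is case-insensitive.
    parsed = datetime.strptime(input_str, '%d-%b-%Y')
    # %m writes the month back out as a zero-padded number.
    return parsed.strftime('%d-%m-%Y')
```

An invalid date such as 31-FEB-2015 raises ValueError with this approach, whereas the lookup-table versions would pass it through unchecked.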