Python Scrapy: comparing Unicode strings that contain Turkish characters

Time: 2017-10-29 14:47:56

Tags: python string unicode scrapy compare

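I have researched this problem many times but could not find a clear answer. Please help me compare two Unicode strings. I want to take a date in a format like "17 Ağustos 2017" or "11 Eylül 2017" and convert it to a format like "17-08-2017" or "11-09-2017". But when I get the month string, it comes back as: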

  

"Ağustos" > "A\xc4\x9fustos", "Eylül" > "Eyl\xfcl"

    months = ['Ocak', '\xc5\x9eubat', 'Mart', 'Nisan',
              'May\xc4\xb1s', 'Haziran', 'Temmuz',
              'A\xc4\x9fustos', 'Eyl\xfcl', 'Ekim',
              'Kas\xc4\xb1m', 'Aral\xc4\xb1k']

    # e.g. "17 Ağustos 2017" -> [day, month name, year]
    month = valuesDetails[indexDate].split(" ")

    month_number = months.index(month[1]) + 1
    if month_number < 10:  # month
        month[1] = "0" + str(month_number)
    else:
        month[1] = str(month_number)
    if int(month[0]) < 10:  # day
        month[0] = "0" + month[0]

    item['date'] = month[0] + "-" + month[1] + "-" + month[2]

2 answers:

Answer 0 (score: 0)

A simpler approach is to use Python's datetime and locale libraries, together with the Turkish locale code (tr_TR) and a datetime format mask.

#coding:utf8
from datetime import datetime
import locale

# Datetime is aware of locale,
# change locale to Turkish
locale.setlocale(locale.LC_TIME, "tr_TR")

dates = ['17 Ağustos 2017','11 Eylül 2017']

for date in dates:
    # Make `date` str a datetime object
    # using a datetime mask
    dt = datetime.strptime(date, '%d %B %Y')
    # Convert `dt` datetime object to
    # str in preferred format using a
    # datetime mask
    dt_str = dt.strftime('%d-%m-%Y')
    # Voilà!
    print(dt_str)

Output:

17-08-2017
11-09-2017
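
One caveat with the approach above: `locale.setlocale` changes the setting for the whole process, and it raises `locale.Error` when the requested locale is not installed. A minimal Python 3 sketch of a more defensive variant follows. The helper names `time_locale` and `reformat_date` and the candidate name `tr_TR.UTF-8` are illustrative assumptions, and which locale names exist depends on the operating system.

```python
import locale
from contextlib import contextmanager
from datetime import datetime


@contextmanager
def time_locale(*candidates):
    """Temporarily switch LC_TIME to the first candidate that can be set.

    Hypothetical helper: the previous setting is restored on exit.
    """
    saved = locale.setlocale(locale.LC_TIME)  # no second argument: query only
    try:
        for name in candidates:
            try:
                locale.setlocale(locale.LC_TIME, name)
            except locale.Error:
                continue  # not installed on this system, try the next name
            break
        else:
            raise locale.Error("none of %r could be set" % (candidates,))
        yield name
    finally:
        locale.setlocale(locale.LC_TIME, saved)  # always restore


def reformat_date(date, candidates=("tr_TR.UTF-8", "tr_TR")):
    """Parse 'DD <month name> YYYY' under the chosen locale, return 'DD-MM-YYYY'."""
    with time_locale(*candidates):
        return datetime.strptime(date, "%d %B %Y").strftime("%d-%m-%Y")
```

With a Turkish locale installed, `reformat_date('17 Ağustos 2017')` should give the same result as the loop above. Without one, it raises `locale.Error` rather than parsing with the wrong month names.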

Answer 1 (score: 0)

Without changing the locale, you can create a global dictionary of the months and replace each Turkish month name with its English equivalent:

#coding:utf8
from datetime import datetime

MONTHS = {
    'Ocak': 'January',
    'Şubat': 'February',
    'Mart': 'March',
    'Nisan': 'April',
    'Mayıs': 'May',
    'Haziran': 'June',
    'Temmuz': 'July',
    'Ağustos': 'August',
    'Eylül': 'September',
    'Ekim': 'October',
    'Kasım': 'November',
    'Aralık': 'December'
}

def format_date(date):
    # Iterate through months and grab
    # respective turkish and english
    # month
    for tr_month, eng_month in MONTHS.items():
        # Replace turkish month (if found)
        # with english month
        if tr_month in date:
            print("'%s' > %s" % (tr_month, repr(tr_month)))
            date = date.replace(tr_month, eng_month)
            break
    # Convert date to datetime object and
    # back into the preferred format
    return datetime.strptime(date,'%d %B %Y').strftime('%d-%m-%Y')

for date in ['17 Ağustos 2017','11 Eylül 2017']:
    print(format_date(date))

Output:

'Ağustos' > 'A\xc4\x9fustos'
17-08-2017
'Eylül' > 'Eyl\xc3\xbcl'
11-09-2017
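
The repr output above shows 'Eylül' as the UTF-8 bytes 'Eyl\xc3\xbcl', whereas the month list in the question spells it 'Eyl\xfcl'. The single byte \xfc is how Latin-1 encodes "ü", so the question may be mixing two encodings, in which case a byte-for-byte comparison cannot match. This is a guess from the listings, not something the asker confirmed. A small Python 3 sketch of the mismatch:

```python
# Encode the same word with two different codecs.
utf8_bytes = "Eylül".encode("utf-8")      # b'Eyl\xc3\xbcl'
latin1_bytes = "Eylül".encode("latin-1")  # b'Eyl\xfcl'

# Same text, different bytes: comparing the raw bytes fails.
same_bytes = utf8_bytes == latin1_bytes

# Decoding each value with its own codec yields equal text.
# Assumption: you know (or can detect) which codec produced each value.
same_text = utf8_bytes.decode("utf-8") == latin1_bytes.decode("latin-1")

print(same_bytes, same_text)
```

Decoding scraped values to text once, at the boundary, and comparing only text avoids this class of mismatch.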

I'm assuming I've got the months right, but I don't know Turkish, so you may want to double-check.
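
A variation on the same idea, sketched for Python 3: map each Turkish month name straight to its number, so neither the locale nor `strptime` is involved. The names `TURKISH_MONTHS`, `MONTH_NUMBER` and `to_numeric_date` are hypothetical, and the sketch assumes the scraped value is either text or UTF-8 bytes. Normalizing to NFC is an extra precaution so that "ü" typed as "u" plus a combining diaeresis still matches.

```python
import unicodedata

# Turkish month names in calendar order, as used in the answers above.
TURKISH_MONTHS = ["Ocak", "Şubat", "Mart", "Nisan", "Mayıs", "Haziran",
                  "Temmuz", "Ağustos", "Eylül", "Ekim", "Kasım", "Aralık"]


def _nfc(text):
    """Return text in NFC form so composed and decomposed spellings compare equal."""
    return unicodedata.normalize("NFC", text)


# Month name -> month number (1 to 12).
MONTH_NUMBER = {_nfc(name): number
                for number, name in enumerate(TURKISH_MONTHS, start=1)}


def to_numeric_date(raw):
    """Convert 'DD <Turkish month> YYYY' to 'DD-MM-YYYY'."""
    if isinstance(raw, bytes):
        raw = raw.decode("utf-8")  # assumption: scraped bytes are UTF-8
    day, month_name, year = _nfc(raw).split()
    return "%02d-%02d-%s" % (int(day), MONTH_NUMBER[month_name], year)


for date in ["17 Ağustos 2017", "11 Eylül 2017"]:
    print(to_numeric_date(date))
```

An unknown month name raises `KeyError`, which makes a misspelled or differently encoded value visible instead of letting it pass silently.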