早上好, 我正在尝试读取列表并使用datetime格式识别字符串。以下是我的列表示例(包含日期时间或版本号):
现在我想将2014-12-01更改为2014年12月1日。它可以正常使用此代码:
dfnew[1].iloc[i]=datetime.datetime.strptime(dfnew[1].iloc[i], "%Y-%m-%d").strftime("%d.%m.%Y")
但如果我在这里使用此代码:
for i, x in dfnew.iterrows():
print (dfnew[1].iloc[i])
if ( (dfnew[1].iloc[i]) == (datetime.datetime.strptime(dfnew[1].iloc[i], "%YY-%mm-%d")) ):
dfnew[1].iloc[i]=datetime.datetime.strptime(dfnew[1].iloc[i], "%Y-%m-%d").strftime("%d.%m.%Y")
我收到错误:
Traceback (most recent call last):
File "CVE_List.py", line 74, in <module>
if ( (dfnew[1].iloc[i]) == (datetime.datetime.strptime(dfnew[1].i
File "C:\Program Files (x86)\Python36-32\lib\_strptime.py", line 56
tt, fraction = _strptime(data_string, format)
File "C:\Program Files (x86)\Python36-32\lib\_strptime.py", line 36
(data_string, format))
ValueError: time data '1.0.1' does not match format '%Y-%m-%d'
但通常必须忽略格式为x.y.z的行。
答案 0 :(得分:1)
如果没有必要,请不要使用iterrows
,因为非常慢。更好的是使用熊猫矢量化功能。
最好使用带有参数errors='coerce'
的{{3}},以便将不可解析的日期时间转换为NaT
。
如果需要转换为自定义字符串格式,请添加to_datetime
:
print (df)
0 1
0 String 1.0.1
1 String2 2017-10-03
2 String3 2014-12-01
3 String4 01.11.2017
df[1] = pd.to_datetime(df[1], errors='coerce')
print (df)
0 1
0 String NaT
1 String2 2017-10-03
2 String3 2014-12-01
3 String4 2017-01-11
编辑:
一般解决方案需要dt.strftime
个日期year
,日期不包含years
,日期all possible formats需要df1
个数据框。每行最后一行删除NaN
并按concat
添加versions
:
#remove trailing whitespaces
c = df[1].str.strip()
dates_formats_with_years = ['%Y-%m-%d','%d.%m.%Y']
d1 = [pd.to_datetime(c, errors='coerce', format=x) for x in dates_formats_with_years]
s1 = [s.dt.strftime("%#d.%#m")
.add('.')
.add(s.dt.strftime("%y").str.lstrip('0'))
.str.rstrip('.')
.replace('NaT.NaT', np.nan) for s in d1]
dates_formats_no_years = ['%d.%b']
d2 = [pd.to_datetime(c, errors='coerce', format=x) for x in dates_formats_no_years]
s2 = [s.dt.strftime("%#d.%#m").replace('NaT', np.nan) for s in d2]
df1 = pd.concat(s1 + s2, axis=1, keys=range(len(s1) + len(s2)))
print (df1)
0 1 2
0 NaN NaN NaN
1 NaN NaN NaN
2 3.5.29 NaN NaN
3 2.8.17 NaN NaN
4 NaN NaN NaN
5 5.6 NaN NaN
6 3.4.7 NaN NaN
7 NaN NaN NaN
8 7.6.17 NaN NaN
9 4.1.1 NaN NaN
10 1.8.3 NaN NaN
11 NaN NaN NaN
12 NaN NaN NaN
13 NaN NaN NaN
14 1.2.8 NaN NaN
df[1] = df1.apply(lambda x: x.sort_values().iat[0], axis=1).combine_first(c)
print (df)
0 1
0 alsa-intf UMC-9240P_D_v0.30
1 alsa-lib 1.0.29
2 base-passwd 3.5.29
3 sshfs-fuse 2.8.17
4 start-scripts-firmware-links 1.0
5 s 5.6
6 syslog-ng 3.4.7
7 system-conf UMC-9240P_D_v0.30
8 tcp-wrappers 7.6.17
9 tcpdump 4.1.1
10 tinyproxy 1.8.3
11 tinyscheme 1.0.0
12 useradd-peiker 1.0
13 util-linux 2.26.2
14 zlib 1.2.8