I am currently trying to parse a large XML file. Here is what my XML file looks like:
<post>
<row Id="22" PostTypeId="2" ParentId="9" CreationDate="2008-08-01T12:07:19.500" Score="7" Body="<p>The best way that I know of because of leap years and everything is:</p>

<pre><code>DateTime birthDate = new DateTime(2000,3,1);<br>int age = (int)Math.Floor((DateTime.Now - birthDate).TotalDays / 365.25D);<br></code></pre>

<p>Hope this helps.</p>" OwnerUserId="17" LastEditorUserId="17" LastEditorDisplayName="Nick" LastEditDate="2008-08-01T15:26:37.087" LastActivityDate="2008-08-01T15:26:37.087" CommentCount="1" CommunityOwnedDate="2011-08-16T19:40:43.080" />
<row Id="29" PostTypeId="2" ParentId="13" CreationDate="2008-08-01T12:19:17.417" Score="18" Body="<p>There are no HTTP headers that will report the clients timezone so far although it has been suggested to include it in the HTTP specification.</p>

<p>If it was me, I would probably try to fetch the timezone using clientside JavaScript and then submit it to the server using Ajax or something.</p>" OwnerUserId="19" LastActivityDate="2008-08-01T12:19:17.417" CommentCount="0" />
</post>
The difference between these two records is that the second one has no LastEditDate attribute. I believe this is why I get the following error:
/ruby/1.9.2/ubuntuamd1/lib/ruby/1.9.1/date/format.rb:1031:in `dup': can't dup NilClass (TypeError)
from /soft/ruby/1.9.2/ubuntuamd1/lib/ruby/1.9.1/date/format.rb:1031:in `_parse'
from /soft/ruby/1.9.2/ubuntuamd1/lib/ruby/1.9.1/date.rb:1732:in `parse'
from load.rb:105:in `on_start_element'
from load.rb:165:in `parse'
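In the sample, the second row (Id 29) also lacks LastEditorUserId, LastEditorDisplayName and CommunityOwnedDate, but LastEditDate is the only missing attribute that the code hands to DateTime.parse. A minimal sketch to confirm the cause, assuming the SAX callback passes the attributes as a Hash-like object (modelled here as a plain Hash): looking up an absent key returns nil, and DateTime.parse(nil) raises a TypeError. Ruby 1.9.2 words it as "can't dup NilClass"; newer Ruby versions use a different message.

```ruby
require 'date'

# The second sample row, modelled as a plain Hash (an assumption about what the
# SAX callback passes). It has LastActivityDate but no LastEditDate.
attributes = {
  'Id'               => '29',
  'LastActivityDate' => '2008-08-01T12:19:17.417'
}

# Looking up an absent key does not raise; it simply returns nil.
last_edit = attributes['LastEditDate']
p last_edit # prints nil

# Passing that nil on to DateTime.parse is what raises the TypeError.
begin
  DateTime.parse(last_edit)
rescue TypeError => e
  puts "TypeError: #{e.message}"
end
```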
Here is the code the trace points to:
if element == 'row'
  @post_st.execute(attributes['Id'], attributes['PostTypeId'], attributes['AcceptedAnswerId'], attributes['ParentId'], attributes['Score'], attributes['ViewCount'],
    attributes['Body'], attributes['OwnerUserId'] == nil ? -1 : attributes['OwnerUserId'], attributes['LastEditorUserId'], attributes['LastEditorDisplayName'],
    DateTime.parse(attributes['LastEditDate']).to_time.strftime("%F %T"), DateTime.parse(attributes['LastActivityDate']).to_time.strftime("%F %T"), attributes['Title'] == nil ? '' : attributes['Title'],
    attributes['AnswerCount'] == nil ? 0 : attributes['AnswerCount'], attributes['CommentCount'] == nil ? 0 : attributes['CommentCount'],
    attributes['FavoriteCount'] == nil ? 0 : attributes['FavoriteCount'], DateTime.parse(attributes['CreationDate']).to_time.strftime("%F %T"))
  post_id = attributes['Id']
Also, I think this is the part where I read LastEditDate:
DateTime.parse(attributes['LastEditDate']).to_time.strftime("%F %T"), DateTime.parse(attributes['LastActivityDate']).to_time.strftime("%F %T"), attributes['Title'] == nil ? '' : attributes['Title']
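My guess is that I get the error above because the attribute is missing. How can I handle this case so that a missing attribute is replaced by a default value? I need this because, while parsing these records, I insert them into a MySQL database with the following table structure: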
+--------------------------+--------------+------+-----+---------------------+-----------------------------+
| Field                    | Type         | Null | Key | Default             | Extra                       |
+--------------------------+--------------+------+-----+---------------------+-----------------------------+
| id                       | int(11)      | NO   | PRI | NULL                |                             |
| post_type_id             | int(11)      | NO   |     | NULL                |                             |
| accepted_answer_id       | int(11)      | YES  |     | NULL                |                             |
| parent_id                | int(11)      | YES  | MUL | NULL                |                             |
| score                    | int(11)      | YES  |     | NULL                |                             |
| view_count               | int(11)      | YES  |     | NULL                |                             |
| body_text                | text         | YES  |     | NULL                |                             |
| owner_id                 | int(11)      | NO   |     | NULL                |                             |
| last_editor_user_id      | int(11)      | YES  |     | NULL                |                             |
| last_editor_display_name | varchar(40)  | YES  |     | NULL                |                             |
| last_edit_date           | timestamp    | NO   |     | CURRENT_TIMESTAMP   | on update CURRENT_TIMESTAMP |
| last_activity_date       | timestamp    | NO   |     | 0000-00-00 00:00:00 |                             |
| title                    | varchar(256) | NO   |     | NULL                |                             |
| answer_count             | int(11)      | NO   |     | NULL                |                             |
| comment_count            | int(11)      | NO   |     | NULL                |                             |
| favorite_count           | int(11)      | NO   |     | NULL                |                             |
| created                  | timestamp    | NO   |     | 0000-00-00 00:00:00 |                             |
+--------------------------+--------------+------+-----+---------------------+-----------------------------+
I have declared last_edit_date as a NOT NULL column.
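Because the column is NOT NULL, a row without LastEditDate needs a substitute value before it is inserted. A minimal sketch, assuming `attributes` behaves like a plain Hash: Hash#fetch returns its second argument only when the key is absent. The helper name `mysql_time` is hypothetical, and the sentinel date is the one used elsewhere in this thread. Note that fetch does not help if the attribute is present but empty.

```ruby
require 'date'

# Sentinel used when a row has no LastEditDate (an arbitrary placeholder).
DEFAULT_EDIT_DATE = '1973-01-01T01:01:01.000'

# Hypothetical helper: ISO-8601 string -> MySQL "YYYY-MM-DD HH:MM:SS".
def mysql_time(iso_string)
  DateTime.parse(iso_string).strftime('%F %T')
end

# The second sample row, modelled as a plain Hash (assumption).
attributes = { 'Id' => '29', 'LastActivityDate' => '2008-08-01T12:19:17.417' }

# fetch falls back to the default only because the key is missing.
last_edit     = mysql_time(attributes.fetch('LastEditDate', DEFAULT_EDIT_DATE))
last_activity = mysql_time(attributes.fetch('LastActivityDate', DEFAULT_EDIT_DATE))

puts last_edit     # prints 1973-01-01 01:01:01
puts last_activity # prints 2008-08-01 12:19:17
```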
Following the answer below, I made these changes, but the error is still the same:
def convert_to_mysql_time(date='1973-01-01T01:01:01.000')
  DateTime.parse(date).to_time.strftime("%F %T")
end
def on_start_element(element, attributes)
  if element == 'row'
    @post_st.execute(attributes['Id'], attributes['PostTypeId'], attributes['AcceptedAnswerId'], attributes['ParentId'], attributes['Score'], attributes['ViewCount'],
      attributes['Body'], attributes['OwnerUserId'] == nil ? -1 : attributes['OwnerUserId'], attributes['LastEditorUserId'], attributes['LastEditorDisplayName'],
      convert_to_mysql_time(attributes['LastEditDate']), DateTime.parse(attributes['LastActivityDate']).to_time.strftime("%F %T"), attributes['Title'] == nil ? '' : attributes['Title'],
      attributes['AnswerCount'] == nil ? 0 : attributes['AnswerCount'], attributes['CommentCount'] == nil ? 0 : attributes['CommentCount'],
      attributes['FavoriteCount'] == nil ? 0 : attributes['FavoriteCount'], DateTime.parse(attributes['CreationDate']).to_time.strftime("%F %T"))
    post_id = attributes['Id']
Here is the error:
/ruby/1.9.2/ubuntuamd1/lib/ruby/1.9.1/date/format.rb:1031:in `dup': can't dup NilClass (TypeError)
from /soft/ruby/1.9.2/ubuntuamd1/lib/ruby/1.9.1/date/format.rb:1031:in `_parse'
from /soft/ruby/1.9.2/ubuntuamd1/lib/ruby/1.9.1/date.rb:1732:in `parse'
from load.rb:102:in `convert_to_mysql_time'
from load.rb:109:in `on_start_element'
from load.rb:169:in `parse'
from load.rb:169:in `<main>'
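The trace shows that convert_to_mysql_time is still being called with nil. In Ruby, a default parameter value is used only when the caller omits the argument; `convert_to_mysql_time(attributes['LastEditDate'])` always passes an argument, and for a row without that attribute the argument is an explicit nil, so the default never applies. A minimal demonstration (the method name `with_default` is hypothetical):

```ruby
# Hypothetical method with the same default as convert_to_mysql_time.
def with_default(date = '1973-01-01T01:01:01.000')
  date
end

omitted  = with_default      # no argument: the default is used
explicit = with_default(nil) # nil passed explicitly: the default is NOT used

puts omitted # prints 1973-01-01T01:01:01.000
p explicit   # prints nil
```

The nil check therefore has to happen inside the method body (for example `date ||= '1973-01-01T01:01:01.000'`), which is the approach the answer takes.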
Answer (score: 2):
I would write a method that converts a date String into a MySQL date and supplies a default value when nil is passed to it, for example:
require 'date'
def convert_to_my_sql_date(date)
  # Use a default when an empty string is passed; the rescue modifier also gives the default
  # to arguments that don't respond to empty? (such as nil), since those raise NoMethodError.
  date = '1973-01-01T01:01:01.000' if (date.empty? rescue true)
  DateTime.parse(date).to_time.strftime("%F %T")
end
So when the date is nil (or an empty string), the method uses the default value. You can now call it in your method like this:
convert_to_my_sql_date(attributes['LastEditDate'])
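Taken together, the long execute call can be made nil-safe in one place by building its argument list in a helper. This sketch makes several assumptions: `post_params` is a hypothetical name, `attributes` is assumed to be a Hash of strings, and the converter is a variant of the answer's method rather than the method itself. It tests for nil or a blank string explicitly instead of using the `rescue` modifier (which also quietly turns any non-string argument into the default date), and it formats with DateTime#strftime directly rather than going through to_time, whose time-zone handling may differ between Ruby versions.

```ruby
require 'date'

# Sentinel taken from the answer above (an arbitrary choice).
DEFAULT_DATE = '1973-01-01T01:01:01.000'

# Variant of the answer's converter: nil or blank input falls back to
# DEFAULT_DATE; the result is MySQL's "YYYY-MM-DD HH:MM:SS" format.
def convert_to_my_sql_date(date)
  date = DEFAULT_DATE if date.nil? || date.strip.empty?
  DateTime.parse(date).strftime('%F %T')
end

# Hypothetical helper: returns the values in the same order as the question's
# @post_st.execute call, with the same defaults for missing attributes.
def post_params(attributes)
  [
    attributes['Id'],
    attributes['PostTypeId'],
    attributes['AcceptedAnswerId'],
    attributes['ParentId'],
    attributes['Score'],
    attributes['ViewCount'],
    attributes['Body'],
    attributes['OwnerUserId'] || -1,
    attributes['LastEditorUserId'],
    attributes['LastEditorDisplayName'],
    convert_to_my_sql_date(attributes['LastEditDate']),
    convert_to_my_sql_date(attributes['LastActivityDate']),
    attributes['Title'] || '',
    attributes['AnswerCount'] || 0,
    attributes['CommentCount'] || 0,
    attributes['FavoriteCount'] || 0,
    convert_to_my_sql_date(attributes['CreationDate'])
  ]
end
```

Inside on_start_element the call would then read `@post_st.execute(*post_params(attributes))`. The `x || default` form replaces `x == nil ? default : x`; the two are equivalent here because attribute values are either strings or nil.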