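I'm using Watir to scrape search results from a website and write them to a CSV file. When I run a search, the results are split into span classes, so the HTML looks like: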
<span class="sn_auth_name">foo</span>
<span class="sn_target_lang">English</span>
My code looks like this:
sn_auth_name = row.xpath('span[@class="sn_auth_name"]/text()').text.strip
sn_target_lang = row.xpath('span[@class="sn_target_lang"]/text()').text.strip
CSV.open("file.csv", "a") do |csv|
  csv << [sn_auth_name, sn_target_lang]
end
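The problem is that for some search results, multiple items are assigned to the same class. That is, sometimes there is only one sn_auth_name and sometimes there are three. A result with a single sn_auth_name looks like this:
<table class="restable"><tr>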
<td class="res1">1/1</td>
<td class="res2">
<span class="sn_auth_name">Imām</span>,
<span class="sn_auth_firstname">Abū Bakr</span>:
<span class="sn_target_title">Al-Kalām rasmāl</span> [
<span class="sn_target_lang">Arabic</span>]/
<span class="sn_transl_name">Ḥijāzī al-Sayyid</span>,
<span class="sn_transl_firstname">Muṣṭafā</span> /
<span class="sn_pub">
<span class="place">Al-Qāhirah</span>:
<span class="publisher">Al-Majlis al-Alā lil-Thaqāfah</span> [
<span class="sn_country">Egypt</span>]</span>,
<span class="sn_year">2000</span>.
<span class="sn_pagination">588 p.</span>
<span class="sn_orig_title">Magana jarice</span> [
<span class="sn_orig_lang">Afrikaans</span>]
</td></tr>
</table>
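Right now, when there are two, both results get crammed into the same cell in my CSV file.
Is there a way to handle the occasional case of multiple results assigned to the same class? Ideally a solution that puts the second (or third) result into a separate cell?
Thanks!
Someone asked for more details, so here is an example of the output I get: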
<tr>
<td class="res1">7/8</td>
<td class="res2">
<span class="sn_auth_name">Plenge</span>,
<span class="sn_auth_firstname">Vagn</span>;
<span class="sn_auth_name">Wyk</span>,
<span class="sn_auth_firstname">Chris van</span>:
<span class="sn_target_title">Opbrud</span> [
<span class="sn_target_lang">Danish</span>] /
<span class="sn_transl_name">Hansen</span>,
<span class="sn_transl_firstname">Finn Holten</span>;
<span class="sn_transl_name">Madelung</span>,
<span class="sn_transl_firstname">Marianne</span>;
<span class="sn_transl_name">Seiketso</span>,
<span class="sn_transl_firstname">Helen Gaohenngwe</span> /
<span class="sn_pub">
<span class="place">Frederiksberg</span>:
<span class="publisher">AKS</span>,
<span class="place">Frederiksberg</span>:
<span class="publisher">Hjulet</span> [
<span class="sn_country">Denmark</span>]</span>,
<span class="sn_year">2000</span>.
<span class="sn_pagination">247 p.</span> [
<span class="sn_orig_lang">Afrikaans</span>], [
<span class="sn_orig_lang">English</span>]
</td></tr>
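Most results are unproblematic, because there is one class for each piece of text I want to capture. But every once in a while I get a result like the one above, where there are multiple entries for sn_auth_name. What ends up in my CSV file is a single cell containing, for example, PlengeWyk. Ideally the script would create an sn_auth_name2 value and record each name in a separate cell, i.e. Plenge and Wyk.
Any ideas?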
Answer 0 (score: 0)
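The #xpath method returns a NodeSet, which is a collection of the matching nodes. NodeSet includes Enumerable, which provides many methods for iterating over the collection. You want to iterate over each node and collect its text, rather than getting the text of the entire node set.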
sn_auth_name = row.xpath('span[@class="sn_auth_name"]').map { |node| node.text.strip }
#=> ["Plenge", "Wyk"]
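As an array of names, sn_auth_name will still be written to a single cell in the CSV. If you want each name written to its own cell, you need to flatten the array. You can use a splat to flatten a single column: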
csv << [*sn_auth_name, sn_target_lang]
If there are several columns to flatten, you can also flatten the entire array:
csv << [sn_auth_name, sn_target_lang].flatten
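When only one column holds an array, the two approaches produce the same row. A small sketch using only the standard csv library, with made-up values standing in for the scraped ones:

```ruby
require 'csv'

# Sample values standing in for the scraped data (assumed for illustration)
sn_auth_name = ['Plenge', 'Wyk']
sn_target_lang = 'Danish'

# The splat expands the names array in place
splat_row = [*sn_auth_name, sn_target_lang]

# flatten expands every nested array in the row
flat_row = [sn_auth_name, sn_target_lang].flatten

puts CSV.generate_line(splat_row)  # Plenge,Wyk,Danish
puts CSV.generate_line(flat_row)   # Plenge,Wyk,Danish
```

The splat is the more targeted choice when some other column might legitimately contain an array that should stay together.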
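Doing the above means that rows may end up with different numbers of columns. You can pad the rows so that they all have the same number of columns: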
# Index of the column that holds the author names
col_auth_name = 0
# Collect the data from the table into an Array
data = []
doc.css('td.res2').each do |row|
  sn_auth_name = row.xpath('span[@class="sn_auth_name"]').map { |node| node.text.strip }
  sn_target_lang = row.xpath('span[@class="sn_target_lang"]/text()').text.strip
  data << [sn_auth_name, sn_target_lang]
end
# Determine the max number of names in a row
max_auth_name = data.map { |row| row[col_auth_name].length }.max
CSV.open("file.csv", "a") do |csv|
  data.each do |row|
    # Pad the Array of names with empty strings up to the max length
    row[col_auth_name].fill('', row[col_auth_name].length..(max_auth_name - 1))
    # Write to the CSV file
    csv << row.flatten
  end
end
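If numbered columns such as sn_auth_name2 are wanted, as the question suggests, a header row can be built from the same maximum. This is only a sketch under assumptions: data is hard-coded here in place of the scraped rows, and the header names and the padded_rows variable are invented for illustration.

```ruby
require 'csv'

# Hard-coded stand-in for the scraped rows: [names, language]
data = [
  [['foo'], 'English'],
  [['Plenge', 'Wyk'], 'Danish']
]

# Largest number of author names found in any row
max_auth_name = data.map { |names, _lang| names.length }.max

# Header: sn_auth_name1..sn_auth_nameN followed by the language column
header = (1..max_auth_name).map { |i| "sn_auth_name#{i}" } + ['sn_target_lang']

# Pad each names array with empty strings up to the maximum, then flatten
padded_rows = data.map do |names, lang|
  [*names, *Array.new(max_auth_name - names.length, ''), lang]
end

# Build the CSV text in memory; CSV.open would write it to a file instead
output = CSV.generate do |csv|
  csv << header
  padded_rows.each { |row| csv << row }
end

puts output
```

Unlike the fill call above, this version builds new row arrays rather than modifying data in place, which may be easier to reason about if data is reused later.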