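I wrote a script that pulls data from a table and writes it to a CSV file. The data comes through as expected, and the script writes it to the file. The one problem I can't solve is getting each value into its own column: I want the name and the link in separate columns, but they end up in the same column. How can I fix this? Any help would be highly appreciated.
The script I tried: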
import csv
from bs4 import BeautifulSoup
content="""
<tr>
<td align="center">1964</td>
<td><span class="sortkey">Townes, Charles Hard</span><span class="vcard"><span class="fn"><a href="/wiki/Charles_Hard_Townes" class="mw-redirect" title="Charles Hard Townes">Charles Hard Townes</a></span></span>;<br>
<span class="sortkey">Basov, Nikolay</span><span class="vcard"><span class="fn"><a href="/wiki/Nikolay_Basov" title="Nikolay Basov">Nikolay Basov</a></span></span>;<br>
<span class="sortkey">Prokhorov, Alexander</span><span class="vcard"><span class="fn"><a href="/wiki/Alexander_Prokhorov" title="Alexander Prokhorov">Alexander Prokhorov</a></span></span></td>
<td><span class="sortkey">Hodgkin, Dorothy</span><span class="vcard"><span class="fn"><a href="/wiki/Dorothy_Hodgkin" title="Dorothy Hodgkin">Dorothy Hodgkin</a></span></span></td>
<td><span class="sortkey">Bloch, Konrad Emil</span><span class="vcard"><span class="fn"><a href="/wiki/Konrad_Emil_Bloch" title="Konrad Emil Bloch">Konrad Emil Bloch</a></span></span>;<br>
<span class="sortkey">Lynen, Feodor Felix Konrad</span><span class="vcard"><span class="fn"><a href="/wiki/Feodor_Felix_Konrad_Lynen" class="mw-redirect" title="Feodor Felix Konrad Lynen">Feodor Felix Konrad Lynen</a></span></span></td>
<td><span class="sortkey">Sartre, Jean-Paul</span><span class="vcard"><span class="fn"><a href="/wiki/Jean-Paul_Sartre" title="Jean-Paul Sartre">Jean-Paul Sartre</a></span></span><sup class="reference" id="ref_Note1D"><a href="#endnote_Note1D">[D]</a></sup></td>
<td><span class="sortkey">King, Jr., Martin Luther</span><span class="vcard"><span class="fn"><a href="/wiki/Martin_Luther_King,_Jr." class="mw-redirect" title="Martin Luther King, Jr.">Martin Luther King, Jr.</a></span></span></td>
<td align="center">—</td>
</tr>
"""
soup = BeautifulSoup(content,"lxml")
for items in soup.select('tr'):
    item_name = [' '.join([item.text,item.get('href')]) for item in items.select(".fn a")]
    print(item_name)
    with open("tab_data.csv","a",newline="") as infile:
        writer = csv.writer(infile)
        writer.writerow(item_name)
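The output I'm getting (name and link in the same column):
The output I want (name and link in separate columns):
By the way, this is a follow-up to this thread: Thread_Link
Answer 0 (score: 2)
If you need the text and the URL in different columns, then you don't have to join them: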
import itertools
...
for items in soup.select('tr'):
    list_of_tuples = [(item.text,item.get('href')) for item in items.select(".fn a")]
    item_name = list(itertools.chain(*list_of_tuples))
    print(item_name)
    with open("tab_data.csv","a",newline="") as infile:
        writer = csv.writer(infile)
        writer.writerow(item_name)
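To see why flattening changes the column layout, here is a minimal sketch that compares the two row shapes. It assumes a hard-coded `pairs` list as a stand-in for the scraped (text, href) tuples and writes to an in-memory buffer instead of tab_data.csv; the `write_row` helper is hypothetical and only exists for this illustration.

```python
import csv
import io
import itertools

# Hypothetical stand-in for the (text, href) pairs scraped from one <tr>.
pairs = [
    ("Charles Hard Townes", "/wiki/Charles_Hard_Townes"),
    ("Nikolay Basov", "/wiki/Nikolay_Basov"),
]


def write_row(row):
    """Write one row to an in-memory CSV and return the resulting text."""
    buffer = io.StringIO(newline="")
    csv.writer(buffer).writerow(row)
    return buffer.getvalue()


# Joining name and link produces one field per laureate.
joined = [" ".join(pair) for pair in pairs]
# Flattening the tuples produces one field per name and one per link.
flattened = list(itertools.chain(*pairs))

joined_csv = write_row(joined)
flattened_csv = write_row(flattened)
print(joined_csv)
print(flattened_csv)
```

Each element of the list passed to `writerow` becomes one CSV field, so the joined row has two fields while the flattened row has four.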
Edit: the OP asked about *list_of_tuples.
First, we need to understand what itertools.chain( x, y ) does. It 'chains' two lists (iterables) into one sequence:
>>> import itertools
>>> x=[1,2,3]
>>> y=(4,5,6)
>>> itertools.chain( x, y )
<itertools.chain object at 0x7f5811df8690>
>>> list(itertools.chain( x, y ))
[1, 2, 3, 4, 5, 6]
Now we're ready to understand unpacking arguments. Suppose we put the x and y arguments (from the example) into a list: l = [x,y]. In that case, we can unpack this list with the * operator:
>>> l=[x,y]
>>> list(itertools.chain( *l ))
[1, 2, 3, 4, 5, 6]
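The * operator is not specific to itertools.chain; it works in any function call. As a small sketch, assuming a hypothetical `describe` function that takes two positional arguments, calling it with an unpacked list is the same as passing the elements one by one:

```python
def describe(first, second):
    """Return a short string naming the two positional arguments."""
    return f"first={first}, second={second}"


args = ["name", "link"]

# Passing the elements explicitly...
explicit = describe(args[0], args[1])
# ...is equivalent to unpacking the list with *.
unpacked = describe(*args)

print(explicit == unpacked)
```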
In your case, you want to chain many tuples:
>>> t1=(1,2)
>>> t2=(3,4)
>>> t3=(4,5)
>>> list(itertools.chain( t1, t2, t3 ))
[1, 2, 3, 4, 4, 5]
But you have these tuples in a list, which needs to be unpacked:
>>> l=[t1, t2, t3]
>>> list(itertools.chain( *l ))
[1, 2, 3, 4, 4, 5]
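As an aside, the standard library also provides itertools.chain.from_iterable, which takes the list of tuples directly and so avoids the * unpacking. This is an alternative to the approach above, not what the answer's code uses; a short sketch with the same tuples:

```python
import itertools

t1 = (1, 2)
t2 = (3, 4)
t3 = (4, 5)
tuples = [t1, t2, t3]

# Unpack the list so each tuple becomes a separate argument.
with_star = list(itertools.chain(*tuples))
# Pass the list itself; from_iterable iterates over it for us.
with_from_iterable = list(itertools.chain.from_iterable(tuples))

print(with_star)
print(with_from_iterable)
```

Both calls yield the same flattened list, so either form could feed `writerow`.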
I hope this makes sense to you.