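How do I extract data from a nested list into a CSV or table?

Time: 2015-01-07 05:45:44

Tags: python csv

I'm currently building a Pokémon database application. To avoid manually entering roughly 50,000 Pokémon<>move links, I'd like to automate the process. I found a freely available dataset online that contains the Pokémon<>move links, but it is in a nested-list format.

I've copied and pasted part of the dataset here: http://pastebin.com/ADeRaBiu

In the end, I'd like a table (ideally stored in a CSV / Excel-readable format) like this: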

| pokemonname | move    | movelearnmethod |
|-------------|---------|-----------------|
| bulbasaur   | amnesia | 6E              |
| bulbasaur   | attract | 6M              |
| bulbasaur   | bind    | 6T              |
| bulbasaur   | endure  | 6E              |
| bulbasaur   | endure  | 6T              |

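I tried using Python's split() to start splitting on delimiters, but there are several different delimiters and I don't know how to handle that. Any help would be greatly appreciated. Thanks!

Update

Just to clarify: if a Pokémon can learn a move by more than one method (for example bulbasaur's endure, which has both the "6E" and "6T" learn methods), I want a separate row created for the second learn method, as shown in the table above.

2 answers:

Answer 0 (score: 1)

The sample data looks a lot like a Python dictionary, except that the keys are not quoted. You can fix that with a regular expression and then evaluate the result as a Python dictionary, at which point parsing is straightforward.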

import re
import ast
data = """{bulbasaur:{learnset:{amnesia:["6E"],attract:["6M"],bind:["6T"],block:[],bodyslam:[],bulletseed:[],captivate:[],charm:["6E"],confide:["6M"],curse:["6E"],cut:["6M"],defensecurl:[],doubleedge:["6L027"],doubleteam:["6M"],echoedvoice:["6M"],endure:["6E","6T"],energyball:["6M"],facade:["6M"],falseswipe:[],flash:["6M"],frenzyplant:[],frustration:["6M"],furycutter:[],gigadrain:["6E","6T"],grassknot:["6M"],grasspledge:["6T"],grasswhistle:["6E"],grassyterrain:["6E"],growl:["6L003"],growth:["6L025"],headbutt:[],hiddenpower:["6M"],ingrain:["6E"],knockoff:["6T"],leafstorm:["6E"],leechseed:["6L007"],lightscreen:["6M"],magicalleaf:["6E"],mimic:[],mudslap:[],naturalgift:[],naturepower:["6E","6M"],petaldance:["6E"],poisonpowder:["6L013"],powerwhip:["6E"],protect:["6M"],razorleaf:["6L019"],rest:["6M"],"return":["6M"],rocksmash:["6M"],round:["6M"],safeguard:["6M"],secretpower:["6M"],seedbomb:["6L037","6T"],skullbash:["6E"],sleeppowder:["6L013"],sleeptalk:["6M"],sludge:["6E"],sludgebomb:["6M"],snore:["6T"],solarbeam:["6M"],strength:["6M"],stringshot:[],substitute:["6M"],sunnyday:["6M"],swagger:["6M"],sweetscent:["6L021"],swordsdance:["6M"],synthesis:["6L033","6T"],tackle:["6L001a"],takedown:["6L015"],toxic:["6M"],venoshock:["6M"],vinewhip:["6L009"],weatherball:[],worryseed:["6L031","6T"]}}}"""
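# Quote the bare keys so the text becomes a valid Python dict literal.
dict_data = re.sub(r'(\w+):', r'"\1":', data)
move_data = ast.literal_eval(dict_data)
# Emit one line per (pokemon, move, learn method) combination.
for pokemonname, entry in move_data.items():
    for move, methods in entry['learnset'].items():
        for method in methods:
            print('pokemonname: {0}, move: {1}, movelearnmethod: {2}'.format(pokemonname, move, method))

Output (the order follows dictionary ordering, so it may differ between Python versions):
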
pokemonname: bulbasaur, move: sludgebomb, movelearnmethod: 6M
pokemonname: bulbasaur, move: venoshock, movelearnmethod: 6M
pokemonname: bulbasaur, move: doubleteam, movelearnmethod: 6M
pokemonname: bulbasaur, move: confide, movelearnmethod: 6M
pokemonname: bulbasaur, move: rest, movelearnmethod: 6M
pokemonname: bulbasaur, move: sludge, movelearnmethod: 6E
pokemonname: bulbasaur, move: growth, movelearnmethod: 6L025
pokemonname: bulbasaur, move: grassknot, movelearnmethod: 6M
pokemonname: bulbasaur, move: facade, movelearnmethod: 6M
pokemonname: bulbasaur, move: return, movelearnmethod: 6M
pokemonname: bulbasaur, move: attract, movelearnmethod: 6M
pokemonname: bulbasaur, move: echoedvoice, movelearnmethod: 6M
pokemonname: bulbasaur, move: substitute, movelearnmethod: 6M
pokemonname: bulbasaur, move: growl, movelearnmethod: 6L003
pokemonname: bulbasaur, move: curse, movelearnmethod: 6E
pokemonname: bulbasaur, move: powerwhip, movelearnmethod: 6E
pokemonname: bulbasaur, move: ingrain, movelearnmethod: 6E
pokemonname: bulbasaur, move: gigadrain, movelearnmethod: 6E
pokemonname: bulbasaur, move: gigadrain, movelearnmethod: 6T
pokemonname: bulbasaur, move: worryseed, movelearnmethod: 6L031
pokemonname: bulbasaur, move: worryseed, movelearnmethod: 6T
pokemonname: bulbasaur, move: flash, movelearnmethod: 6M
pokemonname: bulbasaur, move: takedown, movelearnmethod: 6L015
...

Once you have this data, I suggest looking at Python's CSV writer: https://docs.python.org/2/library/csv.html#writer-objects. After creating the writer object, you can replace the print above with a call to writerow.
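As a rough illustration of that last step, here is a minimal sketch that writes the rows with `csv.writer`. It assumes Python 3 (on Python 2, open the file in `'wb'` mode instead of passing `newline=''`). The shortened sample data, the helper name `learnset_rows`, and the output file name `pokemon_moves.csv` are made up for the example.

```python
import ast
import csv
import re

# Shortened sample in the same format as the dataset (not the full data).
data = '{bulbasaur:{learnset:{amnesia:["6E"],block:[],endure:["6E","6T"],"return":["6M"]}}}'


def learnset_rows(raw):
    """Yield (pokemonname, move, movelearnmethod) tuples, one per learn method."""
    # Quote the bare keys so the text becomes a valid Python dict literal.
    quoted = re.sub(r'(\w+):', r'"\1":', raw)
    parsed = ast.literal_eval(quoted)
    for pokemonname, entry in parsed.items():
        for move, methods in entry['learnset'].items():
            # Moves with an empty method list produce no rows.
            for method in methods:
                yield pokemonname, move, method


# Sorting is optional; it just makes the file order predictable.
rows = sorted(learnset_rows(data))

# Hypothetical output file name; newline='' lets the csv module control line endings.
with open('pokemon_moves.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(['pokemonname', 'move', 'movelearnmethod'])
    writer.writerows(rows)
```

A move with several learn methods, such as endure here, ends up on one row per method, which matches the table layout asked for in the question.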

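Answer 1 (score: -1)

I don't see what you mean by "multiple delimiters". Granted, commas are used in many places, but the colon or the closing bracket would probably make good delimiters.

Another approach is to use regular expressions and, therefore, Perl rather than Python.

Regards, Alexis.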
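
For comparison, the regular-expression route can also be taken without leaving Python, using the standard `re` module. The sketch below is only an illustration: it assumes every entry follows the `name:{learnset:{...}}` nesting seen in the sample, and the pattern names and shortened sample data are made up for the example.

```python
import re

# Shortened sample in the same format as the dataset (not the full data).
data = '{bulbasaur:{learnset:{amnesia:["6E"],block:[],endure:["6E","6T"],"return":["6M"]}}}'

# A Pokemon entry is assumed to look like  name:{learnset:{ ... }}
POKEMON_RE = re.compile(r'(\w+):\{learnset:\{(.*?)\}\}', re.DOTALL)
# A move entry looks like  name:[...]  where the name may or may not be quoted.
MOVE_RE = re.compile(r'"?(\w+)"?:\[([^\]]*)\]')

rows = []
for pokemon in POKEMON_RE.finditer(data):
    name, learnset = pokemon.groups()
    for move in MOVE_RE.finditer(learnset):
        move_name, methods = move.groups()
        # Each quoted code inside the brackets is one learn method.
        for method in re.findall(r'"(\w+)"', methods):
            rows.append((name, move_name, method))

for row in rows:
    print(','.join(row))
```

This keeps the colon and the closing bracket as the effective delimiters, but lets the patterns do the splitting instead of chained split() calls.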