Question

List = ['Stevie', '$101', '(33%)', 'Hezazeb', '$3.60', '(85%)', 
        'Boga Dreams', '$5.50', '(25%)', 'Grey Detective', '$8.00', '(22%)', 
        'Bring A Dame', '$26.00', '(8%)', 'Sandhill', '$5.00', '(100%)', 
        'Burgundy Rules', '$41.00', '(17%).', 'Luxitorah', '$7.50', '(0%)', 
        'Play On Words', '$21.00', '(14%).', 'Cranky Sheriff', '$13.00', '(8%)']

I want this list stored in a dataframe with 3 columns, laid out as shown below. I am scraping this data from a website, so I cannot do it by hand.

- Playername Bids probability   
- Stevie.    $101.   33% 
- Hezazeb.   $3.60.  85%
- .
- .
- .

and so on.

Answer 1

This works even for irregular data - you only need to preprocess it:

# jumbled data needs preprocessing - if you have cleaner data skip that step
data = ['Stevie', '$101', '(33%)', 'Hezazeb', '$3.60', '(85%)', 'Boga', 
'Dreams', '$5.50', '(25%)', 'Grey', 'Detective', '$8.00', '(22%)', 'Bring', 
'A', 'Dame', '$26.00', '(8%)', 'Sandhill', '$5.00', '(100%)', 'Burgundy', 
'Rules', '$41.00', '(17%).', 'Luxitorah', '$7.50', '(0%)', 'Play', 'On', 
'Words', '$21.00', '(14%).', 'Cranky', 'Sheriff', '$13.00', '(8%)']

# Preprocess:
# join everything into one string
# turn the stray ")." into ")"
# put a | after every ")" and before every "$" and "("
# so that splitting at | yields well-formed fields
d = " ".join(data).replace(").", ")").replace(")", 
    ")|").replace("$", "|$").replace("(", "|(")

# remove the trailing |, split at | and strip the spaces left around each field
d = [field.strip() for field in d.rstrip("|").split("|")]

import pandas as pd

# use list slicing to fill the dataframe from the cleaned list
df = pd.DataFrame({"Name": d[0::3], "Bet": d[1::3], "probability": d[2::3]})

print(df)

Output:

             Name     Bet probability
0          Stevie    $101       (33%)
1         Hezazeb   $3.60       (85%)
2     Boga Dreams   $5.50       (25%)
3  Grey Detective   $8.00       (22%)
4    Bring A Dame  $26.00        (8%)
5        Sandhill   $5.00      (100%)
6  Burgundy Rules  $41.00       (17%)
7       Luxitorah   $7.50        (0%)
8   Play On Words  $21.00       (14%)
9  Cranky Sheriff  $13.00        (8%)
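
The frame above still carries the parentheses around each probability, while the question asked for plain values such as 33%. Here is a minimal follow-up sketch, assuming the same three column names as above and using a shortened sample. The extra columns `bet_value` and `probability_pct` are names made up here for illustration; they are not part of the original answer.

```python
import pandas as pd

# A small sample in the same shape as the frame printed above
df = pd.DataFrame({
    "Name": ["Stevie", "Hezazeb", "Sandhill"],
    "Bet": ["$101", "$3.60", "$5.00"],
    "probability": ["(33%)", "(85%)", "(100%)"],
})

# Drop the surrounding parentheses so the text reads 33%, 85%, ...
df["probability"] = df["probability"].str.strip("()")

# Optional numeric copies (hypothetical column names) for sorting or arithmetic
df["bet_value"] = df["Bet"].str.lstrip("$").astype(float)
df["probability_pct"] = df["probability"].str.rstrip("%").astype(int)

print(df)
```

Keeping the original text columns next to the numeric ones is just one option; if only numbers are needed, the text columns could be overwritten instead.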

If the d[0::3] style of slicing is unclear to you, see "Understanding slice notation".
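
As a quick illustration of the step slicing used in this answer, here is a small sketch on a shortened list. The variable names are only for this example.

```python
# Six items: name, bet, probability, repeated for two players
items = ["Stevie", "$101", "(33%)", "Hezazeb", "$3.60", "(85%)"]

names = items[0::3]   # start at index 0, take every 3rd item
bets = items[1::3]    # start at index 1, take every 3rd item
probs = items[2::3]   # start at index 2, take every 3rd item

print(names)  # ['Stevie', 'Hezazeb']
print(bets)   # ['$101', '$3.60']
print(probs)  # ['(33%)', '(85%)']

# zip puts the three slices back together as one row per player
rows = list(zip(names, bets, probs))
print(rows)
```

This only lines up when every player contributes exactly three items, which is why the preprocessing step matters for the jumbled input.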

Answer 2

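To turn a list into a dataframe, you need tuples, pd.Series objects, or some other structure that a dataframe can be built from. In this case you should convert the items into dicts. I would suggest doing that by hand, because the data differs a lot from entry to entry, and then filling any missing columns with NaN.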
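
A rough sketch of the dict approach described here, assuming the list is already grouped as name, bid, probability, as in the question. The last record deliberately leaves out its probability, purely to illustrate how pandas fills a missing key with NaN; that gap is not in the original data.

```python
import pandas as pd

# Shortened clean list from the question: name, bid, probability repeat in order
flat = ["Stevie", "$101", "(33%)", "Hezazeb", "$3.60", "(85%)"]

keys = ["Playername", "Bids", "probability"]

# One dict per group of three consecutive items
records = [dict(zip(keys, flat[i:i + 3])) for i in range(0, len(flat), 3)]

# Illustrative record with the probability left out on purpose
records.append({"Playername": "Sandhill", "Bids": "$5.00"})

# A list of dicts becomes one row per dict; missing keys become NaN
df = pd.DataFrame(records)
print(df)
```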

How do I convert a single list into a dataframe with multiple columns?
