How do I convert a list of Series into a two-column DataFrame?

Asked: 2018-12-14 22:03:24

Tags: pandas list dataframe series

I have the following list of Series.

[LVH = 0    63 (88.73 %)
 LVH = 1      6 (8.45 %)
 LVH = 2      1 (1.41 %)
 LVH = 3      1 (1.41 %)
 dtype: object, LV diastolic dysfunction (guideline) = 0    60 (84.51 %)
 LV diastolic dysfunction (guideline) = 1     8 (11.27 %)
 LV diastolic dysfunction (guideline) = 4      3 (4.23 %)
 dtype: object, LV diastolic dysfunction grade (formula) = 0.0    60 (84.51 %)
 LV diastolic dysfunction grade (formula) = 1.0      4 (5.63 %)
 LV diastolic dysfunction grade (formula) = 3.0      4 (5.63 %)
 LV diastolic dysfunction grade (formula) = 4.0      3 (4.23 %)
 dtype: object, LV filling pressure(formula) = 0    67 (94.37 %)
 LV filling pressure(formula) = 1      4 (5.63 %)
 dtype: object, cause of hospitalization = 8      2 (2.82 %)
 cause of hospitalization = 1    43 (60.56 %)
 cause of hospitalization = 2    21 (29.58 %)
 cause of hospitalization = 3      1 (1.41 %)
 cause of hospitalization = 6      4 (5.63 %)
 dtype: object, simplfied cause of hospitalization = 1    43 (60.56 %)
 simplfied cause of hospitalization = 2    22 (30.99 %)
 simplfied cause of hospitalization = 3      4 (5.63 %)
 simplfied cause of hospitalization = 5      2 (2.82 %)
 dtype: object, ACC/AHA = A    10 (14.08 %)
 ACC/AHA = 0    56 (78.87 %)
 ACC/AHA = C      2 (2.82 %)
 ACC/AHA = B      3 (4.23 %)
 dtype: object, ACC-AHA -binary = 0    69 (97.18 %)
 ACC-AHA -binary = 1      2 (2.82 %)
 dtype: object, NYHA = I      65 (91.55 %)
 NYHA = II       2 (2.82 %)
 NYHA = III      4 (5.63 %)
 dtype: object, NYHA-binary = 0    66 (92.96 %)
 NYHA-binary = 1      5 (7.04 %)
 dtype: object]

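I need to convert every element of the list (each one is a Series) into a DataFrame with two columns. For example, the result should look like this: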

Column 1                                      Column 2
LVH = 0                                       63 (88.73 %)
LVH = 1                                        6 (8.45 %)
LVH = 2                                        1 (1.41 %)    
LVH = 3                                        1 (1.41 %)
LV diastolic dysfunction (guideline) = 0      60 (84.51 %)
LV diastolic dysfunction (guideline) = 1       8 (11.27 %)
LV diastolic dysfunction (guideline) = 4       3 (4.23 %)
... 

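And so on. The result will then be exported as a CSV file for people to download. So far I have only tried the basic pd.DataFrame and pd.DataFrame.from_items. The first one does produce a DataFrame, but not in the shape I want. The second one raises an error, and I don't think it would help anyway. How can I solve this?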

Update

categorical_vars_multi_class = ['LVH','LV diastolic dysfunction (guideline)','LV diastolic dysfunction grade (formula)','LV filling pressure(formula)','cause of hospitalization','simplfied cause of hospitalization','ACC/AHA','ACC-AHA -binary','NYHA','NYHA-binary']

import pandas as pd

def getMultiClassData(index, table, prop):
    # Format each count with its percentage, e.g. "63 (88.73 %)".
    values = [str(count) + " (" + str(pct) + " %)" for count, pct in zip(table, prop)]
    return pd.Series(values, index=index, dtype=object)


def getMultiClassTable(data, name):
    # Count the occurrences of each category in the column, without sorting by frequency.
    table = data[name].value_counts(sort=False)
    table.index = [name + ' = ' + str(x) for x in table.index]
    prop = (table / table.sum() * 100).round(2)

    return getMultiClassData(table.index, table.values, prop)



m_cluster_1 = [getMultiClassTable(data,x) for x in categorical_vars_multi_class]

data is a DataFrame whose columns are the variables named above, holding their measurements. The dataset is large and sensitive.

1 answer:

Answer 0 (score: 1)

Since data is not available, I can't reproduce your example, so I made up a small one of my own. It should give you the idea:

import pandas as pd

s1 = pd.Series(['1kg', '2kg'], index=['first', 'second'])
s2 = pd.Series(['3kg', '4kg'], index=['third', 'fourth'])
lst = [s1, s2]
lst

# [first     1kg
#  second    2kg
#  dtype: object, third     3kg
#  fourth    4kg
#  dtype: object]

# Align the Series side by side, blank out the gaps, then join each row into a single string.
ndf = pd.concat(lst, axis=1, sort=False).fillna('').apply(''.join, axis=1)
ndf = pd.DataFrame(ndf).reset_index()
ndf.columns = ['Column 1', 'Column 2']
ndf


+---+----------+----------+
|   | Column 1 | Column 2 |
+---+----------+----------+
| 0 | first    | 1kg      |
| 1 | second   | 2kg      |
| 2 | third    | 3kg      |
| 3 | fourth   | 4kg      |
+---+----------+----------+
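
As a possible simplification, the Series could also be stacked vertically instead of side by side, which avoids the fillna/join step. The following is only a minimal sketch under a few assumptions: the two Series are small stand-ins built from values quoted in the question (not the real data), the index labels are unique across the list, and the column names are simply the ones used in the desired output.

```python
import pandas as pd

# Stand-ins for two of the Series in the question's list (values copied from the question).
s1 = pd.Series(['63 (88.73 %)', '6 (8.45 %)'], index=['LVH = 0', 'LVH = 1'])
s2 = pd.Series(['66 (92.96 %)', '5 (7.04 %)'],
               index=['NYHA-binary = 0', 'NYHA-binary = 1'])
lst = [s1, s2]

# Stack the Series end to end (axis=0 is the default), then turn the
# index labels into an ordinary column next to the values.
ndf = (
    pd.concat(lst)
    .rename_axis('Column 1')
    .reset_index(name='Column 2')
)

# With no path given, to_csv returns the CSV text; pass a file name to write to disk instead.
csv_text = ndf.to_csv(index=False)
```

Here ndf has one row per category, and csv_text holds the same table as CSV without the integer row index, which is probably what you want for a download.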