I have the following list of Series.
[LVH = 0 63 (88.73 %)
LVH = 1 6 (8.45 %)
LVH = 2 1 (1.41 %)
LVH = 3 1 (1.41 %)
dtype: object, LV diastolic dysfunction (guideline) = 0 60 (84.51 %)
LV diastolic dysfunction (guideline) = 1 8 (11.27 %)
LV diastolic dysfunction (guideline) = 4 3 (4.23 %)
dtype: object, LV diastolic dysfunction grade (formula) = 0.0 60 (84.51 %)
LV diastolic dysfunction grade (formula) = 1.0 4 (5.63 %)
LV diastolic dysfunction grade (formula) = 3.0 4 (5.63 %)
LV diastolic dysfunction grade (formula) = 4.0 3 (4.23 %)
dtype: object, LV filling pressure(formula) = 0 67 (94.37 %)
LV filling pressure(formula) = 1 4 (5.63 %)
dtype: object, cause of hospitalization = 8 2 (2.82 %)
cause of hospitalization = 1 43 (60.56 %)
cause of hospitalization = 2 21 (29.58 %)
cause of hospitalization = 3 1 (1.41 %)
cause of hospitalization = 6 4 (5.63 %)
dtype: object, simplfied cause of hospitalization = 1 43 (60.56 %)
simplfied cause of hospitalization = 2 22 (30.99 %)
simplfied cause of hospitalization = 3 4 (5.63 %)
simplfied cause of hospitalization = 5 2 (2.82 %)
dtype: object, ACC/AHA = A 10 (14.08 %)
ACC/AHA = 0 56 (78.87 %)
ACC/AHA = C 2 (2.82 %)
ACC/AHA = B 3 (4.23 %)
dtype: object, ACC-AHA -binary = 0 69 (97.18 %)
ACC-AHA -binary = 1 2 (2.82 %)
dtype: object, NYHA = I 65 (91.55 %)
NYHA = II 2 (2.82 %)
NYHA = III 4 (5.63 %)
dtype: object, NYHA-binary = 0 66 (92.96 %)
NYHA-binary = 1 5 (7.04 %)
dtype: object]
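For each element of the list (that is, each Series), I need to convert it into a DataFrame with two columns. For example, the result should look like this: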
Column 1 Column 2
LVH = 0 63 (88.73 %)
LVH = 1 6 (8.45 %)
LVH = 2 1 (1.41 %)
LVH = 3 1 (1.41 %)
LV diastolic dysfunction (guideline) = 0 60 (84.51 %)
LV diastolic dysfunction (guideline) = 1 8 (11.27 %)
LV diastolic dysfunction (guideline) = 4 3 (4.23 %)
...
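and so on. The result will then be converted to CSV for people to download. So far I have only tried the basic pd.DataFrame and pd.DataFrame.from_items. The first one does produce a DataFrame, but not in the shape I want. The second one raises an error, and I don't think it would help anyway. How can I solve this?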
Update
import pandas as pd

categorical_vars_multi_class = ['LVH','LV diastolic dysfunction (guideline)','LV diastolic dysfunction grade (formula)','LV filling pressure(formula)','cause of hospitalization','simplfied cause of hospitalization','ACC/AHA','ACC-AHA -binary','NYHA','NYHA-binary']

def getMultiClassData(index, table, prop):
    # Build one "count (percent %)" string per category level
    tab_str = [str(count) + " (" + str(pct) + " %)" for count, pct in zip(table, prop)]
    return pd.Series(tab_str, index=index, dtype=object)

def getMultiClassTable(data, name):
    # Count each level of the column, without sorting by frequency
    table = data[name].value_counts(sort=False)
    # Label each row as "<column name> = <level>"
    table.index = [name + ' = ' + str(x) for x in table.index]
    # Share of each level in percent, rounded to two decimals
    prop = (table / table.sum() * 100).round(2)
    return getMultiClassData(table.index, table.values, prop)

m_cluster_1 = [getMultiClassTable(data, x) for x in categorical_vars_multi_class]
data is a DataFrame containing the column names and the measured variables. The dataset is large and sensitive.
Answer 0 (score: 1)
Since data is unknown, I can't reproduce your example, so I made up a small example of my own. It should help you get there:
import pandas as pd

s1 = pd.Series(['1kg', '2kg'], index=['first', 'second'])
s2 = pd.Series(['3kg', '4kg'], index=['third', 'fourth'])
lst = [s1, s2]
lst
# [first 1kg
# second 2kg
# dtype: object, third 3kg
# fourth 4kg
# dtype: object]
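# Align the Series side by side (one column each, NaN where a label is absent),
# then collapse each row into its single non-empty string.
ndf = pd.concat(lst, axis=1, sort=False).fillna('').apply(''.join, axis=1)
# Move the index labels into a regular column and name both columns.
ndf = pd.DataFrame(ndf).reset_index()
ndf.columns = ['Column 1', 'Column 2']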
ndf
+---+----------+----------+
| | Column 1 | Column 2 |
+---+----------+----------+
| 0 | first | 1kg |
| 1 | second | 2kg |
| 2 | third | 3kg |
| 3 | fourth | 4kg |
+---+----------+----------+
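
As a possibly simpler variant of the same idea, the Series can be stacked end to end rather than side by side, and the result written out as CSV. This is only a sketch: the two Series below are a small stand-in built from a few rows of the question's printed output, and the names stacked, out, buf and csv_text are illustrative, not part of the original code.

```python
import io
import pandas as pd

# Stand-in for the question's list of Series (a few rows copied from its output).
s1 = pd.Series(['63 (88.73 %)', '6 (8.45 %)'],
               index=['LVH = 0', 'LVH = 1'])
s2 = pd.Series(['66 (92.96 %)', '5 (7.04 %)'],
               index=['NYHA-binary = 0', 'NYHA-binary = 1'])
lst = [s1, s2]

# Stack the Series end to end; the index labels are carried along as row labels.
stacked = pd.concat(lst)

# Turn the index into a regular column and name both columns.
out = stacked.rename_axis('Column 1').reset_index(name='Column 2')

# Write the CSV to an in-memory buffer; pass a file path instead to save to disk.
buf = io.StringIO()
out.to_csv(buf, index=False)
csv_text = buf.getvalue()
print(csv_text)
```

With the real list this would presumably be pd.concat(m_cluster_1) in place of pd.concat(lst). Stacking along the rows avoids the fillna/join step, and rows stay separate even if two Series happen to share a label.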