I have a DataFrame that looks like this:
       tags                                         categories classification
0     label  ['legislative', 'law, govt and politics', 'exe...           None
0  document  ['legislative', 'law, govt and politics', 'exe...            NaN
0      text  ['legislative', 'law, govt and politics', 'exe...            NaN
0     paper  ['legislative', 'law, govt and politics', 'exe...            NaN
0    poster  ['legislative', 'law, govt and politics', 'exe...            NaN
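I want to create a new DataFrame that collapses the one above into a single row, so that the values of each column (for example "tags" and "classification") are gathered into a single list per column, like this: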
                                               tags                                         categories                        classification
0  ['label', 'document', 'text', 'paper', 'poster']  ['legislative', 'law, govt and politics', 'exe...  ['None', 'NaN', 'NaN', 'NaN', 'NaN']
How can I do this? Can I use stack or groupby to get this result? Thanks in advance.
*This is the result of df.to_dict():
{'tags': {0: ' letter',
1: ' head',
2: ' water',
3: ' art',
4: ' indoors',
5: ' flyer',
6: ' poster',
...},
'categories': {0: "['legislative', 'law, govt and politics', 'executive branch', 'work', 'society', 'government']",
1: "['unrest and war', 'society', 'religion and spirituality', 'buddhism']",
2: '[]',
3: '[]',
4: "['unemployment', 'society', 'law, govt and politics', 'foreign policy', 'work', 'politics', 'armed forces']",
5: '[]',
6: "['sports', 'law, govt and politics', 'wrestling']",
...},
'classfication': {0: nan,
1: nan,
2: nan,
3: nan,
4: nan,
5: nan,
6: nan,
...}}
Answer 0 (score: 0)
I'm not sure I fully understood your question, but is this the kind of thing you want?
df:
trial_num subject samples
0 1 1 [-1.74, -0.78, -0.11]
1 2 1 [0.86, 0.21, -0.01]
2 3 1 [2.04, 0.6, -0.79]
3 1 2 [0.52, 0.49, 1.56]
4 2 2 [0.07, 0.84, -1.1]
5 3 2 [0.43, -1.3, 1.99]
Transformed df:
            trial_num             subject                                      samples
0  [1, 2, 3, 1, 2, 3]  [1, 1, 1, 2, 2, 2]  [[-1.74, -0.78, -0.11], [0.86, 0.21, -0.0...
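import numpy as np
import pandas as pd

# Build the example frame; 'samples' holds a list of three random floats per row
df = pd.DataFrame(
    {'trial_num': [1, 2, 3, 1, 2, 3],
     'subject': [1, 1, 1, 2, 2, 2],
     'samples': [np.random.randn(3).round(2).tolist() for _ in range(6)]
    }
)
# Join each column into one string, split it back into a list, then transpose to a single row.
# Note: every value becomes a string, and the nested 'samples' lists are split on their inner commas too.
df = df.astype(str).apply(', '.join).apply(lambda x: x.split(', ')).to_frame().T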
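If you need the original values kept as they are (numbers stay numbers, nested lists stay lists) rather than converted to strings, one possible alternative is to build the one-row frame directly from each column's values. This is only a sketch under some assumptions: the sample data is a hypothetical, shortened version of the `df.to_dict()` output in the question, `collapse_to_row` is a helper name made up for illustration, and the `categories` strings are assumed to be valid Python list literals so that `ast.literal_eval` can parse them.

```python
import ast

import pandas as pd

# Hypothetical sample modeled on the question's df.to_dict() output,
# where 'categories' holds string representations of lists.
df = pd.DataFrame({
    "tags": ["label", "document", "text"],
    "categories": ["['legislative', 'society']", "[]", "['sports']"],
    "classification": [None, float("nan"), float("nan")],
})

# Assumption: the strings are valid Python list literals, so they can be parsed safely.
df["categories"] = df["categories"].apply(ast.literal_eval)


def collapse_to_row(frame: pd.DataFrame) -> pd.DataFrame:
    """Return a one-row frame whose cells hold each column's values as a list."""
    return pd.DataFrame({col: [frame[col].tolist()] for col in frame.columns})


collapsed = collapse_to_row(df)
print(collapsed.loc[0, "tags"])        # ['label', 'document', 'text']
print(collapsed.loc[0, "categories"])  # [['legislative', 'society'], [], ['sports']]

# Per-group variant: one row of lists for each value of a key column.
trials = pd.DataFrame({"subject": [1, 1, 2, 2], "trial_num": [1, 2, 1, 2]})
per_subject = trials.groupby("subject").agg(list)
print(per_subject.loc[1, "trial_num"])  # [1, 2]
```

The groupby form is probably closest to what the question asks about: it produces one list per column for each group, whereas `collapse_to_row` treats the whole frame as a single group.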