Suppose I have a Pandas DataFrame like the following:
category  sentences
Data1     String1
NaN       String2
NaN       String3
Data2     String1
NaN       String4
Data2     String1
NaN       String6
NaN       String7
Data3     String1
NaN       String8
NaN       String9
I want to convert it to:
category  sentences
Data1     String1 String2 String3
Data2     String1 String4
Data2     String1 String6 String7
Data3     String1 String8 String9
As the column titles suggest, the right column holds the sentences of a complete conversation and the left column holds their respective categories. What I am trying to do is simply select the rows with NaN values and append them to the preceding row until the next String1 is reached.
So far I have failed: I tried different things and still have no solution. How can I do this?
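Another question: I took my DataFrame (let's call it df), selected the first 3 rows and summed them with df[0:3].sum(), which returns Series([], dtype: float64). If I add .sum(axis=1) at the end, I get zero for every row. I tried .sum(axis=0) and it returns Series([], dtype: float64). I also tried adding iloc, but the result is the same. So, can anyone tell me what I am doing wrong and what I should do instead?
TL;DR: I want to concatenate the strings from one String1 up to the next String1, without including that last String1. Is that possible, and if so, how?
Just a small note: sorry for the formatting. I still can't get used to it...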
Answer 0 (score: 1)
Not optimal, not Pythonic, and ugly! But it gets the job done:
import pandas as pd

old_table = pd.read_csv('your_table.csv')
new_table = pd.DataFrame([], columns=('category', 'sentences'))
for ID, row in old_table.iterrows():
    if not pd.isnull(row['category']):
        # A non-null category starts a new output row holding a one-element list
        new_table.loc[len(new_table)] = [row['category'], [row['sentences']]]
    else:
        # A NaN category continues the most recent output row
        string = list(new_table.loc[len(new_table)-1]['sentences'])
        string.append(row['sentences'])
        # .at writes to the frame itself; chained [...][...] assignment may only change a copy
        new_table.at[len(new_table)-1, 'sentences'] = string
print(old_table, '\n====\n', new_table)
It gives:
category sentences
0 One hello
1 NaN my
2 NaN little
3 NaN friend
4 Two hello
5 NaN to
6 NaN you
7 NaN too
====
category sentences
0 One [hello, my, little, friend]
1 Two [hello, to, you, too]
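The output above stores each group as a list, while the question asks for a single space-separated string per row. A minimal sketch of one way to convert it, assuming every list element is a string; the new_table built here is a hand-written stand-in for the printed result, not the answer author's code:

```python
import pandas as pd

# Hypothetical stand-in for the new_table printed above
new_table = pd.DataFrame({
    'category': ['One', 'Two'],
    'sentences': [['hello', 'my', 'little', 'friend'],
                  ['hello', 'to', 'you', 'too']],
})

# Series.str.join joins the elements of each list with the given separator
new_table['sentences'] = new_table['sentences'].str.join(' ')
print(new_table)
```

After this step each cell in the sentences column is a plain string rather than a list.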
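Answer 1 (score: 1)
Create a temporary ID column to use as a group key together with the category column, then join the sentences of each group.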
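df = df.copy()
# Rows that start a group get their own index as ID; all other rows get NaN
df['ID'] = df.index.to_series()[df.category.notnull()]
# Forward-fill ID and category, then join the sentences within each group
df.ffill()\
  .groupby(['ID', 'category'])['sentences']\
  .apply(lambda x: ' '.join(x))\
  .reset_index()\
  .drop(columns='ID')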
Out[59]:
category sentences
0 Data1 String1 String2 String3
1 Data2 String1 String4
2 Data2 String1 String6 String7
3 Data3 String1 String8 String9
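A variation on the same idea, offered only as a sketch and not as this answer's method: instead of a forward-filled ID column, the group key can be a running count of the non-null category values. The sample frame below is rebuilt by hand from the question, and the names group_id and result are illustrative.

```python
import numpy as np
import pandas as pd

# Sample data reconstructed from the question
df = pd.DataFrame({
    'category': ['Data1', np.nan, np.nan, 'Data2', np.nan,
                 'Data2', np.nan, np.nan, 'Data3', np.nan, np.nan],
    'sentences': ['String1', 'String2', 'String3', 'String1', 'String4',
                  'String1', 'String6', 'String7', 'String1', 'String8',
                  'String9'],
})

# Every non-null category starts a new group; cumsum numbers the groups 1, 2, 3, ...
group_id = df['category'].notna().cumsum().rename('group')

result = (
    df.groupby(group_id)
      # 'first' skips NaN, so it picks the category that opened the group
      .agg({'category': 'first', 'sentences': ' '.join})
      .reset_index(drop=True)
)
print(result)
```

Because the key is a running count rather than the category itself, the two separate Data2 conversations stay in separate rows.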
Answer 2 (score: 0)