Question

Suppose I have a Pandas DataFrame like this:

category  sentences
Data1     String1
NaN       String2
NaN       String3
Data2     String1
NaN       String4
Data2     String1
NaN       String6
NaN       String7
Data3     String1
NaN       String8
NaN       String9

I would like to convert it to:

category  sentences
Data1     String1 String2 String3
Data2     String1 String4
Data2     String1 String6 String7
Data3     String1 String8 String9

As the column headers show, the right column holds the sentences of a complete conversation and the left column holds the respective category. What I am trying to do is select the rows whose category is NaN and append them to the preceding row, until the next non-NaN category is reached.

So far this has been a failure for me: I have tried different things and still have no solution. How can I do this?

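Another problem: I took my DataFrame (let's call it df), selected the first 3 rows and summed them with df[0:3].sum(), which returns Series([], dtype: float64). If I add .sum(axis=1) at the end, I get zero for every row. I tried .sum(axis=0) and it returns Series([], dtype: float64). I also tried adding iloc, but the result is the same. So, can anyone tell me what I am doing wrong and what I should do instead?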

TL;DR: I want to add up the strings from one String1 up to the next String1, without including that last String1. Is this possible, and if so, how?

Just a small note: sorry for the formatting. I still can't get used to it...

Answer 1

Not optimal, not Pythonic, and ugly! But it gets the job done:

import pandas as pd

old_table = pd.read_csv('your_table.csv')

# Collect one [category, [sentences, ...]] record per non-null category.
rows = []
for _, row in old_table.iterrows():
    if pd.notnull(row['category']):
        # A non-null category starts a new group.
        rows.append([row['category'], [row['sentences']]])
    else:
        # A NaN category continues the current group.
        rows[-1][1].append(row['sentences'])

new_table = pd.DataFrame(rows, columns=['category', 'sentences'])

print(old_table,'\n====\n',new_table)

It gives:

  category sentences
0      One     hello
1      NaN        my
2      NaN    little
3      NaN    friend
4      Two     hello
5      NaN        to
6      NaN       you
7      NaN       too 
====
   category                    sentences
0      One  [hello, my, little, friend]
1      Two        [hello, to, you, too]
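
The result above keeps each group as a list, while the question asks for a single string per row. A minimal sketch of that last step, assuming the sentences column holds lists of strings as in the output shown (the small new_table below is rebuilt by hand purely for illustration):

```python
import pandas as pd

# Hand-built copy of the result shown above (for illustration only).
new_table = pd.DataFrame({
    "category": ["One", "Two"],
    "sentences": [["hello", "my", "little", "friend"],
                  ["hello", "to", "you", "too"]],
})

# Series.str.join concatenates the elements of each list with the separator.
new_table["sentences"] = new_table["sentences"].str.join(" ")

print(new_table)
```

This leaves category untouched and turns each list into one space-separated string.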

Answer 2

Create a temporary ID column to use as a group key together with the category column, then join the sentences of each group.

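df = df.copy()
# Tag each row that has a category with its own index; the other rows get NaN.
df['ID'] = df.index.to_series()[df['category'].notnull()]
# Forward-fill so continuation rows inherit the ID and category of their group,
# then join the sentences of each group with a space.
df.ffill()\
  .groupby(['ID','category'])['sentences']\
  .apply(' '.join)\
  .reset_index()\
  .drop(columns='ID')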
Out[59]: 
  category                sentences
0    Data1  String1 String2 String3
1    Data2          String1 String4
2    Data2  String1 String6 String7
3    Data3  String1 String8 String9
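
For anyone who wants to try this without an existing df, here is a self-contained sketch on the question's sample data. Note that it swaps in a different group key than the ID column above: a running count of the non-null categories (notna().cumsum()), which is a common idiom but not what this answer uses. The names group_key and result are made up for the example.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "category": ["Data1", np.nan, np.nan, "Data2", np.nan,
                 "Data2", np.nan, np.nan, "Data3", np.nan, np.nan],
    "sentences": ["String1", "String2", "String3", "String1", "String4",
                  "String1", "String6", "String7", "String1", "String8",
                  "String9"],
})

# Every non-null category starts a new conversation, so a running count of
# non-null values gives one integer key per conversation.
group_key = df["category"].notna().cumsum().rename("group")

result = (
    df.groupby(group_key)
      .agg(category=("category", "first"),     # first non-null category
           sentences=("sentences", " ".join))  # space-joined sentences
      .reset_index(drop=True)
)

print(result)
```

Because the key is a counter rather than the category itself, the two consecutive Data2 conversations stay separate.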

Answer 3

Create a Series of unique group values: use where with arange to replace the non-null category values, then ffill (fillna with method='ffill'):

s = df['category'].where(df['category'].isnull(), np.arange(len(df.index))).ffill()

0     0
1     0
2     0
3     3
4     3
5     5
6     5
7     5
8     8
9     8
10    8
Name: category, dtype: int64

Then groupby and agg:
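
The code for the groupby and agg step is not shown here. A minimal sketch of what it might look like, assuming the helper Series s is used as the group key, the first category of each group is kept, and the sentences are joined with a space (the agg mapping and the name out are assumptions, not taken from the answer):

```python
import numpy as np
import pandas as pd

# Sample data copied from the question. The category column is created with
# object dtype so integer positions can be mixed in (assumption for this sketch).
df = pd.DataFrame({
    "category": pd.Series(["Data1", np.nan, np.nan, "Data2", np.nan,
                           "Data2", np.nan, np.nan, "Data3", np.nan, np.nan],
                          dtype=object),
    "sentences": ["String1", "String2", "String3", "String1", "String4",
                  "String1", "String6", "String7", "String1", "String8",
                  "String9"],
})

# Helper key from the answer: the row position wherever category is set,
# forward-filled over the NaN rows that follow.
s = df["category"].where(df["category"].isnull(), np.arange(len(df.index))).ffill()

# Assumed final step: group on the helper key, keep the first category and
# join the sentences of each group.
out = (
    df.groupby(s)
      .agg({"category": "first", "sentences": " ".join})
      .reset_index(drop=True)
)

print(out)
```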

Pandas: how to sum specific rows

3 answers: