Question

import pandas as pd
df = pd.DataFrame({
    'id':[1,2,3,4,5,6,7,8,9,10,11],
    'text': ['abc','zxc','qwe','asf','efe','ert','poi','wer','eer','poy','wqr']})

I have a DataFrame with the following columns:

id    text
1      abc
2      zxc
3      qwe
4      asf
5      efe
6      ert
7      poi
8      wer
9      eer
10     poy
11     wqr

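I have a list L = [1,3,6,10] that contains ids.

I am trying to merge the text column using this list. Take 1 and 3 (the first two values in the list): append the text of the rows whose ids fall between them (here id 2) to the text of the row with id = 1, then delete the row with id 2. Next take 3 and 6: append the text of ids 4 and 5 to the text of id 3, then delete the rows with id = 4 and 5. Continue in the same way for every consecutive pair of elements (x, x + 1) in the list.

My final output should look like this: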

id   text
1    abczxc         # joining id 1 and 2
3    qweasfefe      # joining id 3,4 and 5
6    ertpoiwereer   # joining id 6,7,8,9
10   poywqr         # joining id 10 and 11

Answer 1

You can use isin with where and ffill to build a Series of group keys, then pass that Series to groupby and apply join:

s = df.id.where(df.id.isin(L)).ffill().astype(int)
df1 = df.groupby(s)['text'].apply(''.join).reset_index()
print (df1)
   id          text
0   1        abczxc
1   3     qweasfefe
2   6  ertpoiwereer
3  10        poywqr

Why it works:

s = df.id.where(df.id.isin(L)).ffill().astype(int)
print (s)
0      1
1      1
2      3
3      3
4      3
5      6
6      6
7      6
8      6
9     10
10    10
Name: id, dtype: int32
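
The intermediate steps can also be inspected one at a time. The following is only a sketch: it rebuilds df and L from the question so that it runs on its own, and the names in_list, anchors and key are illustrative, not part of the answer.

```python
import pandas as pd

df = pd.DataFrame({
    'id': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11],
    'text': ['abc', 'zxc', 'qwe', 'asf', 'efe', 'ert',
             'poi', 'wer', 'eer', 'poy', 'wqr']})
L = [1, 3, 6, 10]

# Step 1: True on the rows whose id starts a new group.
in_list = df['id'].isin(L)
# Step 2: keep the id on those rows, NaN everywhere else (dtype becomes float).
anchors = df['id'].where(in_list)
# Step 3: forward-fill so every row carries the id of the group it belongs to.
key = anchors.ffill().astype(int)

# Group by the key and concatenate the text of each group.
result = df.groupby(key)['text'].apply(''.join).reset_index()
print(result)
```

Note that this assumes the first row's id is in L. Otherwise ffill leaves NaN at the top and astype(int) fails.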

Answer 2

I replace the ids that are not in the list with np.nan, then ffill and groupby. That said, @Jezrael's approach is much better. I need to remember to use cumsum :)

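l = [1,3,6,10]
# Keep ids that are in l; every other id becomes NaN (the column turns float).
# This avoids chained assignment (df.id[...] = ...), which pandas warns about.
df['id'] = df['id'].where(df['id'].isin(l))
# Forward-fill the group ids; sum concatenates the strings in each group.
df = df.ffill().groupby('id').sum()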

        text
id  
1.0     abczxc
3.0     qweasfefe
6.0     ertpoiwereer
10.0    poywqr
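
Since cumsum comes up here, this is a rough sketch of what a cumsum-based variant might look like. It is not taken from any of the answers; the name group_no and the 'group' label are made up for illustration, and df and L are rebuilt from the question so the block runs on its own.

```python
import pandas as pd

df = pd.DataFrame({
    'id': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11],
    'text': ['abc', 'zxc', 'qwe', 'asf', 'efe', 'ert',
             'poi', 'wer', 'eer', 'poy', 'wqr']})
L = [1, 3, 6, 10]

# isin marks the rows that start a group; cumsum turns those marks into
# a running group number (1, 1, 2, 2, 2, 3, ...).
group_no = df['id'].isin(L).cumsum().rename('group')

# Keep the first id of each group and concatenate its text.
result = (df.groupby(group_no)
            .agg(id=('id', 'first'), text=('text', ''.join))
            .reset_index(drop=True))
print(result)
```

Unlike the NaN-based version, the id column stays integer here, because no missing values are ever introduced.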

Answer 3

Use pd.cut to create bins, then groupby those bins with a lambda function that joins the text within each group.

import numpy as np
df.groupby(pd.cut(df.id,L+[np.inf],right=False, labels=L)).apply(lambda x: ''.join(x.text))

Edit:

(df.groupby(pd.cut(df.id,L+[np.inf],
              right=False, 
              labels=[i for i in L]))
  .apply(lambda x: ''.join(x.text)).reset_index().rename(columns={0:'text'}))

Output:

   id          text
0   1        abczxc
1   3     qweasfefe
2   6  ertpoiwereer
3  10        poywqr
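
To see what pd.cut contributes, it can help to look at the bin labels before grouping. The sketch below is self-contained and only illustrative: the name binned is an assumption, and observed=True is an addition that the answer does not use.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'id': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11],
    'text': ['abc', 'zxc', 'qwe', 'asf', 'efe', 'ert',
             'poi', 'wer', 'eer', 'poy', 'wqr']})
L = [1, 3, 6, 10]

# Each id in L opens a half-open interval [start, next_start); np.inf closes
# the last one. right=False makes the left edge inclusive, and labels=L names
# every interval after its starting id.
binned = pd.cut(df['id'], L + [np.inf], right=False, labels=L)
print(binned.tolist())

# observed=True keeps only the categories that actually occur in the data.
result = (df.groupby(binned, observed=True)['text']
            .apply(''.join)
            .reset_index())
print(result)
```

Here the id column of the result is categorical rather than integer, because it comes from the labels produced by pd.cut.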

Python pandas: append rows of a DataFrame and delete the appended rows

3 answers: