Question

Depending on the query, my DF can have a column that contains strings or a column that contains only NaN.

For example:

  ID     grams   Projects
0  891            4.0      NaN
1  725            9.0      NaN

or

  ID     grams   Projects
0  890            1.0      P1, P2
1  724            1.0      P1
2  880            1.0      P1, P2
3  943            1.0      P1
4  071            1.0      P1

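I can handle either case on its own, but I fail when I try to write a generic function. I need to get rid of the NaN values in the end because I send this DF as a JSON response, and NaN gives me an invalid format.

What I'm doing right now is: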

#When Projects is a string
df['Projects'] = _df.groupby("ID")['External_Id'].apply(lambda x: ",".join(x))

#When Projects is NaN
df['Projects'] = _df.groupby("ID")['External_Id'].apply(lambda x: "")

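I tried using fillna() and checked the type of 'x', but it always comes back as object, so I can't tell whether it is a str or NaN.

Also, the resulting 'Projects' column should not contain duplicates. Some rows grouped by ID carry important information that will be summed ('grams'), but an 'External_Id' should not appear more than once. For example: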

  ID       grams      External_Id
0  890        1.0      P1
1  890        1.0      P2
2  890        1.0      P2
3  724        1.0      P1
4  724        1.0      P1

The result should be

  ID       grams      Projects
0  890        3.0      P1, P2
1  724        2.0      P1

rather than

  ID       grams      Projects
0  890        1.0      P1, P2, P2
1  724        1.0      P1, P1

Answer 1

Suppose you start with

In [37]: df = pd.DataFrame({'a': [1, 1, 2, 2], 'b': [1, None, 2, 4], 'c': ['foo', 'foo', 'sha', 'bar']})

In [43]: df
Out[43]: 
   a    b    c
0  1  1.0  foo
1  1  NaN  foo
2  2  2.0  sha
3  2  4.0  bar

Then you can apply the same function to either b or c, taking care of both NaN and duplicates:

In [44]: df.b.groupby(df.a).apply(lambda x: '' if x.isnull().any() else ','.join(set(x.astype(str).values)))
Out[44]: 
a
1           
2    2.0,4.0
dtype: object

In [45]: df.c.groupby(df.a).apply(lambda x: '' if x.isnull().any() else ','.join(set(x.astype(str).values)))
Out[45]: 
a
1        foo
2    sha,bar
dtype: object
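
Applied to the columns from the question, the same idea can be sketched as follows. This is only one possible shape, not the answerer's exact method: `join_unique` is a hypothetical helper name, the ID 891 row with a missing External_Id is added here purely for illustration, and named aggregation assumes pandas 0.25 or later. Unlike the lambda above, which returns an empty string for any group containing a NaN, this version drops the NaN values and joins whatever remains. It also avoids `set`, whose iteration order is arbitrary, so the first-seen order of the projects is kept.

```python
import pandas as pd


def join_unique(values):
    # Hypothetical helper: drop NaN, drop repeats (keeping first-seen order),
    # then join what is left. An all-NaN group yields an empty string.
    return ", ".join(values.dropna().drop_duplicates().astype(str))


# Sample data modeled on the question; the 891 row is an added illustration
_df = pd.DataFrame({
    "ID": ["890", "890", "890", "724", "724", "891"],
    "grams": [1.0, 1.0, 1.0, 1.0, 1.0, 4.0],
    "External_Id": ["P1", "P2", "P2", "P1", "P1", None],
})

result = (
    _df.groupby("ID", sort=False)  # sort=False keeps IDs in order of appearance
       .agg(grams=("grams", "sum"), Projects=("External_Id", join_unique))
       .reset_index()
)
print(result)
```

Because the sum and the string join happen in a single `agg` call, 'grams' is totaled per ID while 'Projects' lists each External_Id once.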

Answer 2

I think this should help:

import numpy
df_new = df.replace(numpy.nan,' ', regex=True)
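
Since the goal in the question is a valid JSON response, the replacement can also be done at serialization time. The following is a minimal sketch under assumptions: the sample frame mirrors the all-NaN example from the question, and the names `records` and `payload` are illustrative. It relies on the fact that Python's `json.dumps` writes NaN as the bare token `NaN`, which is not valid JSON, unless `allow_nan=False` is passed, in which case it raises instead.

```python
import json

import pandas as pd

# Sample data modeled on the all-NaN case from the question
df = pd.DataFrame({
    "ID": ["891", "725"],
    "grams": [4.0, 9.0],
    "Projects": [float("nan"), float("nan")],
})

# Replace every missing value with an empty string, record by record
records = [
    {key: ("" if pd.isna(value) else value) for key, value in row.items()}
    for row in df.to_dict(orient="records")
]

# allow_nan=False makes json.dumps raise if a NaN slipped through
payload = json.dumps(records, allow_nan=False)
print(payload)
```

Doing the substitution on plain Python records sidesteps any dtype changes inside the DataFrame, at the cost of one extra pass over the rows.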

Edit:

I think this solution might work for you (just as an alternative to @Ami's answer).

Pandas - Concat strings after groupby in column, ignore NaN, ignore duplicates
