Question

My dataframe is built like this:

df = pd.DataFrame([[100, ' tes  t  ', 3], [100, np.nan, 2], [101, ' test1', 3 ], [101,'   ', 4]])

which looks like

         0      1      2
     0  100    tes t   3
     1  100    NaN     2
     2  101   test1    3
     3  101            4

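I want to fill column 1 "forward" with test and test1. I believe one approach is to replace the whitespace with np.nan, but that is difficult because the words themselves contain whitespace. I could also group by column 0 and then use the first element of each group to fill forward. Could you provide code for both alternatives? I have not managed to code it myself.

Additionally, I would like to add a column that contains the group means. The final dataframe should look like this: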

         0      1      2  3
     0  100   tes t    3  2.5
     1  100   tes t    2  2.5
     2  101   test1    3  3.5
     3  101   test1    4  3.5

Could you please advise how to accomplish something like this?

Many thanks, and please let me know if you need further information.

Answer 1

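IIUC, you can use str.strip and then check whether the stripped string is empty. Then perform a groupby on column 0, fill the NaNs with the ffill method, and compute the means with the groupby.transform function, as shown: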

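# strip whitespace, then turn strings that are now empty into NaN
df[1] = df[1].str.strip().dropna().apply(lambda x: np.nan if len(x) == 0 else x)

# forward fill within each group of column 0
df[1] = df.groupby(0)[1].ffill()
# per-group mean of column 2, broadcast back to every row
df[3] = df.groupby(0)[2].transform('mean')
df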
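For reference, the steps above can be put together as a self-contained script. This is a minimal sketch rather than the answer's exact code: it assumes a recent pandas and swaps the dropna/apply step for Series.mask, which marks the empty strings as missing in one call. The variable name stripped is introduced only for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[100, ' tes  t  ', 3], [100, np.nan, 2],
                   [101, ' test1', 3], [101, '   ', 4]])

# Strip surrounding whitespace; existing NaN values stay NaN.
stripped = df[1].str.strip()

# Assumption: mask() is used here in place of dropna().apply(...);
# it sets the entries where the condition is True to NaN.
df[1] = stripped.mask(stripped == '')

# Forward fill within each group of column 0.
df[1] = df.groupby(0)[1].ffill()

# Per-group mean of column 2, aligned with the original rows.
df[3] = df.groupby(0)[2].transform('mean')

print(df)
```

Note that str.strip only removes leading and trailing whitespace, so the inner spaces of 'tes  t' are kept.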

Note: If you must fill the NaN values with the first element of that group, do this instead:

df.groupby(0)[1].apply(lambda x: x.fillna(x.iloc[0]))
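Depending on the pandas version, the result of groupby(...).apply may carry the group key as an extra index level, which makes assigning it back to the column awkward. A possible variant, sketched here under that assumption, uses transform('first') so the result keeps the original index. One difference to be aware of: 'first' picks the first non-null value of each group, whereas x.iloc[0] picks the first row even if it is NaN. The name first_per_group is made up for this example.

```python
import numpy as np
import pandas as pd

# Column 1 is assumed to be already cleaned (empty strings turned into NaN).
df = pd.DataFrame([[100, 'tes  t', 3], [100, np.nan, 2],
                   [101, 'test1', 3], [101, np.nan, 4]])

# First non-null value of column 1 per group, repeated for every row of the group.
first_per_group = df.groupby(0)[1].transform('first')

# Fill only the missing entries; the index lines up with df, so plain assignment works.
df[1] = df[1].fillna(first_per_group)

print(df)
```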

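Breakdown:

Since we want to apply the function only to strings, we first drop all the NaN values. Otherwise we would get a TypeError, because the column holds both float and string elements and len is not defined for a float.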

df[1].str.strip().dropna()

0    tes  t    # operates only on indices where strings are present(empty strings included)
2     test1
3          
Name: 1, dtype: object
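The reason for the dropna call can be checked in isolation. The following sketch, with made-up sample values, shows that calling len on np.nan (a Python float) raises a TypeError, and that dropping the NaN values first avoids it.

```python
import numpy as np
import pandas as pd

values = ['tes  t', np.nan, 'test1', '']

# len() on np.nan fails because np.nan is a float.
message = None
try:
    [len(v) for v in values]
except TypeError as exc:
    message = str(exc)
print(message)

# Dropping NaN first leaves only strings, so len is safe on every element.
lengths = pd.Series(values).dropna().apply(len).tolist()
print(lengths)
```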

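The reindexing part is not a necessary step, since the computation only touches the indices where strings are present.

Also, the reset_index(drop=True) part was indeed unnecessary, since the groupby object returns a Series after fillna, which can be assigned back to column 1.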
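The question also asks about replacing whitespace with np.nan directly. This is not part of the answer above, but one way it might be done is with a regular expression that matches only strings made up entirely of whitespace, so words with inner spaces are left alone. The sketch below assumes Series.replace with regex=True; the names s and cleaned are illustrative.

```python
import numpy as np
import pandas as pd

s = pd.Series([' tes  t  ', np.nan, ' test1', '   '])

# '^\s*$' matches empty or whitespace-only strings; 'tes  t' does not match
# because it contains non-whitespace characters.
cleaned = s.replace(r'^\s*$', np.nan, regex=True).str.strip()

print(cleaned)
```

The cleaned column can then be forward filled per group in the same way as before.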

Python fill string column "forward" and groupby attach groupby to dataframe