My DataFrame looks like this:
import numpy as np
import pandas as pd

d = {'Col_1' : pd.Series(['A', 'A', 'A', 'B']),
'Col_2' : pd.Series(['B', 'C', 'B', 'D']),
'Col_3' : pd.Series([np.nan, 'D', 'C', np.nan]),
'Col_4' : pd.Series([np.nan, np.nan, 'D', np.nan]),
'Col_5' : pd.Series([np.nan, np.nan, 'E', np.nan]),}
df = pd.DataFrame(d)
Col_1 Col_2 Col_3 Col_4 Col_5
A B NaN NaN NaN
A C D NaN NaN
A B C D E
B D NaN NaN NaN
My goal is to end up with the following:
Col_1 Col_2 Col_3 Col_4 Col_5 ConCat
A B NaN NaN NaN A:B
A C D NaN NaN A:C:D
A B C D E A:B:C:D:E
B D NaN NaN NaN B:D
I have managed to create a DataFrame that resembles the desired output:
rows = df.values
df_1 = pd.DataFrame([':'.join(word for word in row if word is not np.nan) for row in rows])
0
0 A:B
1 A:C:D
2 A:B:C:D:E
3 B:D
But now, when I try to put it into the original DataFrame, I get:
df['concatenated'] = df_1
Col_1 Col_2 Col_3 Col_4 Col_5 concatenated
A B NaN NaN NaN NaN
A C D NaN NaN NaN
A B C D E NaN
B D NaN NaN NaN NaN
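Strangely, it worked as expected when I built the simplified example. Below is the full code I am running. The original data comes from a transformation of my original DataFrame above.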
df_caregiver_type = pd.concat([df_caregiver_type[col].order().reset_index(drop=True) for col in df_caregiver_type], axis=1, ignore_index=False).T
df_caregiver_type.rename(columns=lambda x: 'Col_' + str(x), inplace=True)
rows = df_caregiver_type.values
df_caregiver_type1 = pd.DataFrame([':'.join(word for word in rows if word is not np.nan) for rows in rows])
df_caregiver_type['concatenated'] = df_caregiver_type1
df_caregiver_type = df_caregiver_type.T
df_caregiver_type
Update: I think the error is caused by the first line of my full code. That is a separate but related question: pandas: sort each column individually
Answer 0 (score: 9)
For your full dataset, changing the last step from df['concatenated'] = df_1
to df['concatenated'] = df_1.values
will solve the problem. I think this is a bug, and I am fairly sure I have seen it before.
Or simply: df['concatenated'] = [':'.join(word for word in row if word is not np.nan) for row in rows]
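A likely explanation for the all-NaN column, rather than a pandas bug, is index alignment: when a Series or DataFrame is assigned to a column, pandas matches rows by index label, not by position. The sketch below assumes a frame whose index is not the default 0..n-1 (as it would be after a transpose). The labels "first" and "second" and the column names "aligned" and "by_position" are hypothetical, and column 0 of df_1 is selected as a Series to keep the example simple.

```python
import pandas as pd

# Hypothetical frame with string index labels instead of 0, 1.
df = pd.DataFrame(
    {"Col_1": ["A", "B"], "Col_2": ["B", "D"]},
    index=["first", "second"],
)

# Helper frame built from a plain list, so it gets the default index 0, 1.
df_1 = pd.DataFrame(["A:B", "B:D"])

# Assigning a Series aligns on index labels; none match, so every row is NaN.
df["aligned"] = df_1[0]

# .values returns a bare NumPy array with no index, so rows are matched by position.
df["by_position"] = df_1[0].values

print(df)
```

Assigning a plain list, as in the one-line alternative above, behaves like the .values version because a list carries no index either.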
Answer 1 (score: 1)
>>> d = {'Col_1' : pd.Series(['A', 'A', 'A', 'B']),
... 'Col_2' : pd.Series(['B', 'C', 'B', 'D']),
... 'Col_3' : pd.Series([np.nan, 'D', 'C', np.nan]),
... 'Col_4' : pd.Series([np.nan, np.nan, 'D', np.nan]),
... 'Col_5' : pd.Series([np.nan, np.nan, 'E', np.nan]),}
>>> df = pd.DataFrame(d)
>>>
>>> rows = df.values
>>> df_1 = pd.DataFrame([':'.join(word for word in row if word is not np.nan) for row in rows])
>>>
>>> df['concatenated'] = df_1[0]
>>> df
Col_1 Col_2 Col_3 Col_4 Col_5 concatenated
0 A B NaN NaN NaN A:B
1 A C D NaN NaN A:C:D
2 A B C D E A:B:C:D:E
3 B D NaN NaN NaN B:D
>>>
Answer 2 (score: 0)
>>> df = df.join(df_1)
>>> df = df.rename(columns = {0:'concatenated'})
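Note that DataFrame.join also matches rows on index labels by default, so the join above relies on df and df_1 sharing the same index. A minimal self-contained sketch of one way to guarantee that is to build df_1 with index=df.index. The labels "x" and "y" are hypothetical, and pd.notna is a substitution for the identity check `is not np.nan` used elsewhere in this thread (it also catches NaN values that are not the np.nan object itself).

```python
import numpy as np
import pandas as pd

# Hypothetical frame with non-default index labels.
df = pd.DataFrame(
    {"Col_1": ["A", "B"], "Col_2": ["B", "D"], "Col_3": [np.nan, "C"]},
    index=["x", "y"],
)

# Join the non-missing cells of each row with ':'.
joined = [":".join(w for w in row if pd.notna(w)) for row in df.values]

# Reuse df's index so that join can match every row by label.
df_1 = pd.DataFrame(joined, index=df.index)

result = df.join(df_1).rename(columns={0: "concatenated"})
print(result)
```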