Select certain row values and set them as columns in pandas

Time: 2016-09-16 16:49:28

Tags: python pandas

I have a dataset that looks like this:

+-------------------------+-------------+------+--------+-------------+--------+--+
|                         | impressions | name | shares | video_views |  diff  |  |
+-------------------------+-------------+------+--------+-------------+--------+--+
| _ts                     |             |      |        |             |        |  |
| 2016-09-12 23:15:04.120 |           1 | Vidz |      7 |       10318 | 15mins |  |
| 2016-09-12 23:16:45.869 |           2 | Vidz |      7 |       10318 | 16mins |  |
| 2016-09-12 23:30:03.129 |           3 | Vidz |     18 |       29291 | 30mins |  |
| 2016-09-12 23:32:08.317 |           4 | Vidz |     18 |       29291 | 32mins |  |
+-------------------------+-------------+------+--------+-------------+--------+--+
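
For anyone who wants to reproduce this, the table above can be rebuilt roughly as follows. This is only a sketch: it assumes `_ts` is a `DatetimeIndex` and that `diff` holds plain strings such as `'15mins'`, neither of which the table itself confirms.

```python
import pandas as pd

# Sample data copied from the table above.
# Assumption: '_ts' is the index and 'diff' is stored as plain strings.
df = pd.DataFrame(
    {
        'impressions': [1, 2, 3, 4],
        'name': ['Vidz'] * 4,
        'shares': [7, 7, 18, 18],
        'video_views': [10318, 10318, 29291, 29291],
        'diff': ['15mins', '16mins', '30mins', '32mins'],
    },
    index=pd.DatetimeIndex(
        [
            '2016-09-12 23:15:04.120',
            '2016-09-12 23:16:45.869',
            '2016-09-12 23:30:03.129',
            '2016-09-12 23:32:08.317',
        ],
        name='_ts',
    ),
)

print(df)
```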

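I am trying to build a DataFrame to feed into a regression model, and I would like to turn specific rows into features. To do that, I want the DataFrame to look like this: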

+-------------------------+------+--------------+-------------------+-------------------+--------------+-------------------+-------------------+
|                         | name | 15min_shares | 15min_impressions | 15min_video_views | 30min_shares | 30min_impressions | 30min_video_views |
+-------------------------+------+--------------+-------------------+-------------------+--------------+-------------------+-------------------+
| _ts                     |      |              |                   |                   |              |                   |                   |
| 2016-09-12 23:15:04.120 | Vidz |            7 |                 1 |             10318 |           18 |                 3 |             29291 |
+-------------------------+------+--------------+-------------------+-------------------+--------------+-------------------+-------------------+

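What is the best way to do this? I think that if I only wanted a single row (15 minutes), it would be easier: just filter out the rows I don't need and pivot.

However, I need features for both 15 minutes and 30 minutes, and I'm not sure how to go about getting those columns.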

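2 Answers:

Answer 0 (score: 2)

You can subset the DataFrame to the rows for 15 minutes and 30 minutes and concatenate them side by side. Back-fill the NaN values in the first row (15 minutes) from the next row (30 minutes), then drop that next row, as shown: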

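import pandas as pd

prefix_15 = "15mins"
prefix_30 = "30mins"

# Boolean masks for the two snapshots of interest
fifteen_mins = df['diff'] == prefix_15
thirty_mins = df['diff'] == prefix_30

# Prefix each snapshot's columns; 'diff' is no longer needed
df_15 = df[fifteen_mins].drop(columns='diff').add_prefix(prefix_15 + '_')
df_30 = df[thirty_mins].drop(columns='diff').add_prefix(prefix_30 + '_')

# Align side by side, back-fill the 15-minute row from the 30-minute row,
# then drop the 30-minute row, which still contains NaN
df_ = pd.concat([df_15, df_30], axis=1).bfill().dropna(how='any')

# Keep a single 'name' column
df_ = df_.drop(columns='30mins_name').rename(columns={'15mins_name': 'name'})
df_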
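
The same idea extends to more than two snapshots. The sketch below wraps it in a hypothetical helper, `snapshot_features` (the name and signature are illustrative, not part of pandas). It assumes every requested label occurs at least once in `diff` and uses the first matching row for each label. Instead of back-filling, it resets each one-row piece to a common index before concatenating, which keeps the integer dtypes. The sample data is rebuilt from the question's table so the block runs on its own.

```python
import pandas as pd

# Sample data from the question (assumed layout: '_ts' index, string 'diff')
df = pd.DataFrame(
    {
        'impressions': [1, 2, 3, 4],
        'name': ['Vidz'] * 4,
        'shares': [7, 7, 18, 18],
        'video_views': [10318, 10318, 29291, 29291],
        'diff': ['15mins', '16mins', '30mins', '32mins'],
    },
    index=pd.DatetimeIndex(
        [
            '2016-09-12 23:15:04.120',
            '2016-09-12 23:16:45.869',
            '2016-09-12 23:30:03.129',
            '2016-09-12 23:32:08.317',
        ],
        name='_ts',
    ),
)


def snapshot_features(frame, labels):
    """Hypothetical helper: build one wide row from the rows whose
    'diff' value is in `labels` (first match per label)."""
    parts = []
    for label in labels:
        row = frame.loc[frame['diff'] == label].head(1)
        # Drop the non-feature columns and prefix the rest with the label
        row = row.drop(columns=['diff', 'name']).add_prefix(label + '_')
        # Give every piece the same index so concat lines them up
        parts.append(row.reset_index(drop=True))
    wide = pd.concat(parts, axis=1)

    # Reattach 'name' and the timestamp of the first snapshot
    first = frame.loc[frame['diff'] == labels[0]].head(1)
    wide.insert(0, 'name', first['name'].to_numpy())
    wide.index = first.index
    return wide


features = snapshot_features(df, ['15mins', '30mins'])
print(features)
```

Adding another snapshot is then just a matter of extending the list of labels, provided a row with that `diff` value exists.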


Answer 1 (score: 0)

Stack to pivot and collapse the columns:

df1 = df.set_index('diff', append=True).stack().unstack(0).T
df1.columns = df1.columns.map('_'.join)

Look at only the first row, dropping the columns that are NaN for it:

df1.iloc[[0]].dropna(axis=1)
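
Note that the 15-minute and 30-minute values sit on different timestamps, so in `df1` they land on different rows, and the first row carries only the `15mins_*` columns. One way to bring both snapshots onto a single row per `name` is `DataFrame.pivot`. This is a different technique from the stack/unstack chain above, sketched here under the assumption that each (`name`, `diff`) pair occurs at most once (otherwise `pivot` raises an error). The `_ts` index is not carried through. The sample data is rebuilt from the question's table so the block is self-contained.

```python
import pandas as pd

# Sample data from the question (assumed layout: '_ts' index, string 'diff')
df = pd.DataFrame(
    {
        'impressions': [1, 2, 3, 4],
        'name': ['Vidz'] * 4,
        'shares': [7, 7, 18, 18],
        'video_views': [10318, 10318, 29291, 29291],
        'diff': ['15mins', '16mins', '30mins', '32mins'],
    },
    index=pd.DatetimeIndex(
        [
            '2016-09-12 23:15:04.120',
            '2016-09-12 23:16:45.869',
            '2016-09-12 23:30:03.129',
            '2016-09-12 23:32:08.317',
        ],
        name='_ts',
    ),
)

wanted = ['15mins', '30mins']
subset = df[df['diff'].isin(wanted)]

# One row per name, one column per (measure, snapshot) pair
wide = subset.pivot(
    index='name',
    columns='diff',
    values=['impressions', 'shares', 'video_views'],
)

# Flatten the (measure, snapshot) MultiIndex into e.g. '15mins_shares'
wide.columns = [f'{snapshot}_{measure}' for measure, snapshot in wide.columns]
wide = wide.reset_index()

print(wide)
```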
