Pandas apply(axis=1): producing multiple rows

Time: 2015-04-22 02:12:04

Tags: python pandas

I have a function that I want to apply row by row:

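import json
import pandas as pd

def item_split(row):
    # Parse the JSON list stored in this row's 'items' column.
    items = json.loads(row['items'])
    # Repeat the row once per item, then attach the items as a new column.
    out = pd.DataFrame([row for i in range(len(items))])
    out['item'] = items
    return out

tweets = tweets.apply(item_split, axis=1)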

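As you can tell, this function is meant to take a list of items and create one row per item, duplicating the rest of the row's data. Unfortunately, my current approach is not a correct use of the apply method: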

ValueError                                Traceback (most recent call last)
/usr/lib/python3.4/site-packages/pandas/core/common.py in _asarray_tuplesafe(values, dtype)
   2344                 result = np.empty(len(values), dtype=object)
-> 2345                 result[:] = values
   2346             except ValueError:

ValueError: could not broadcast input array from shape (13) into shape (1)

Does anyone know how to do this properly? I'm a bit stumped.

1 Answer:

Answer 0: (score: 1)

This question is similar to "pandas: apply function to DataFrame that can return multiple rows", which Wes McKinney has answered.

Say your data looks like this:

In [36]: tweets = pd.DataFrame({
   ....:     'items': [
   ....:         '[{"text": "user1-msg1"},{"text": "user1-msg2"},{"text": "user1-msg3"}]',
   ....:         '[{"text": "user2-msg1"},{"text": "user2-msg2"}]',
   ....:         '[{"text": "user3-msg1"}]',
   ....:     ],
   ....:     'user': ['user1', 'user2', 'user3'],
   ....: })

You can use .groupby() together with group_keys=False to return multiple rows for each group:

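In [37]: def item_split(group):
   ....:     # Each group holds the row(s) sharing one 'items' string; take the first.
   ....:     row = group.iloc[0]  # written as .irow(0) in old pandas, which was later removed
   ....:     # Build one output row per parsed item, then tag each with the user.
   ....:     result = pd.DataFrame(json.loads(row['items']))
   ....:     result['user'] = row['user']
   ....:     return result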
   ....:

In [38]: tweets.groupby('items', group_keys=False).apply(item_split)
Out[38]:
         text   user
0  user1-msg1  user1
1  user1-msg2  user1
2  user1-msg3  user1
0  user2-msg1  user2
1  user2-msg2  user2
0  user3-msg1  user3
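
On pandas 0.25 or later, DataFrame.explode may be a simpler route to the same result. This is a different technique from the groupby answer above, not a restatement of it. The sketch below reuses the sample tweets frame; the names parsed, exploded and result are illustrative only, and it assumes every items string is a non-empty JSON list of objects with a "text" key.

```python
import json

import pandas as pd

tweets = pd.DataFrame({
    'items': [
        '[{"text": "user1-msg1"},{"text": "user1-msg2"},{"text": "user1-msg3"}]',
        '[{"text": "user2-msg1"},{"text": "user2-msg2"}]',
        '[{"text": "user3-msg1"}]',
    ],
    'user': ['user1', 'user2', 'user3'],
})

# Parse each JSON string into a Python list of dicts.
parsed = tweets.assign(items=tweets['items'].apply(json.loads))

# explode() emits one row per list element and repeats the other columns.
# reset_index gives a fresh 0..n-1 index instead of repeated source labels.
exploded = parsed.explode('items').reset_index(drop=True)

# Pull the "text" field out of each dict into its own column.
# Assumption: every element is a dict with a "text" key.
result = pd.DataFrame({
    'text': exploded['items'].apply(lambda item: item['text']),
    'user': exploded['user'],
})
print(result)
```

Unlike the groupby version, this keeps the rows in their original order and does not merge rows that happen to share the same items string. An empty list would explode to a single NaN row, which this sketch does not handle.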