Question

I am having trouble getting pandas to return multiple columns when using apply.

Example:

import pandas as pd
import numpy as np
np.random.seed(1)

df = pd.DataFrame(index=range(2), columns=['a', 'b'])
df.loc[0] = [np.array((1,2,3))], 1
df.loc[1] = [np.array((4,5,6))], 1
df

             a  b
0  [[1, 2, 3]]  1
1  [[4, 5, 6]]  1

df2 = np.random.randint(1,9, size=(3,2))
df2

array([[4, 6],
       [8, 1],
       [1, 2]])

def example(x):
    return np.transpose(df2) @ x[0]

df3 = df['a'].apply(example)
df3

0    [23, 14]
1    [62, 41]

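I would like df3 to have two columns, with one element per column in each row, rather than a single column that holds both elements in each row.

So I want something like this: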

df3Wanted
         col1  col2
    0    23    14
    1    62    41

Does anyone know how to fix this?

Answer 1

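To achieve this, a few changes are needed.

Update the example function as follows:

def example(x):
    return [np.transpose(df2) @ x[0]]

Then apply the following operation to df3:

wantedDF3 = pd.concat(df3.apply(pd.DataFrame, columns=['col1','col2']).tolist())
print(wantedDF3)

This gives the desired output:

   col1  col2
0    40    12
0    97    33

Edit: another way to avoid the memory error problem is to keep the example function and df2 unchanged (same as in the question). Then, most importantly, generate wantedDF3 from the resulting df3.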
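As a rough sketch of how wantedDF3 might be built while leaving example and df2 exactly as in the question: the Series of arrays can be turned into a list and handed to the DataFrame constructor. This is only one possible approach and not necessarily the code this answer had in mind. The df2 values are hard-coded from the question's printed output, and the dict-based construction of df is an assumption made to keep the snippet self-contained.

```python
import numpy as np
import pandas as pd

# Values copied from the question's printed df2 (not re-drawn at random).
df2 = np.array([[4, 6], [8, 1], [1, 2]])

# Same layout as in the question: each cell of 'a' is a one-element list
# holding an array. Built from a dict here for brevity (an assumption).
df = pd.DataFrame({
    'a': [[np.array((1, 2, 3))], [np.array((4, 5, 6))]],
    'b': [1, 1],
})

def example(x):
    # Unchanged from the question: returns a length-2 array per row.
    return np.transpose(df2) @ x[0]

df3 = df['a'].apply(example)  # Series whose cells are length-2 arrays

# Expand the Series of arrays into one column per array element.
wantedDF3 = pd.DataFrame(df3.tolist(), index=df3.index,
                         columns=['col1', 'col2'])
print(wantedDF3)
```

Unlike the pd.concat variant, this keeps the original row index (0, 1) and avoids creating one small DataFrame per row.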

Answer 2

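This is an answer to the comment on the first answer about the memory error problem. The example below uses data that causes a memory error on my machine with every approach suggested so far (the first answer and the comment on it), but it works with the following code: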

regions.vec <- c("Northeast", "Midwest", "South", "West")
regions <- birth_data[, regions.vec]

# for one row, use the binary vector row to select from regions.vec
# (unlist + as.logical so a one-row data frame of 0/1 values works as a mask)
process.row <- function(row) regions.vec[as.logical(unlist(row))]

# go through entire regions subdataframe and do this row by row
result <- list()
for (i in seq_len(nrow(regions))) {
  result[[i]] <- process.row(regions[i, ])
}

# flatten the result list and add it to the rows of birth_data
birth_data$region <- unlist(result)
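For the pandas setup from the question, one possible way to keep intermediate objects small is to skip the per-row apply altogether: stack the row arrays into a single 2-D matrix and do one matrix product. The sketch below assumes every cell of 'a' holds a one-element list with an array of equal length, as in the question, and hard-codes df2 from the question's printed output. Whether this avoids a memory error on a given dataset is not guaranteed.

```python
import numpy as np
import pandas as pd

# Values copied from the question's printed df2.
df2 = np.array([[4, 6], [8, 1], [1, 2]])

# Same cell layout as in the question (built from a dict for brevity).
df = pd.DataFrame({
    'a': [[np.array((1, 2, 3))], [np.array((4, 5, 6))]],
    'b': [1, 1],
})

# Stack the per-row arrays into one (n_rows, 3) matrix.
stacked = np.vstack([cell[0] for cell in df['a']])

# For a 1-D x, np.transpose(df2) @ x equals x @ df2, so one product
# over the stacked matrix computes every row at once.
result = pd.DataFrame(stacked @ df2, index=df.index,
                      columns=['col1', 'col2'])
print(result)
```

This produces one numeric block instead of a Series of small array objects.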

Pandas apply: return multiple columns per row instead of a list
