Question

I have a small problem manipulating a DataFrame to create a new variable as a function of other columns.

I am able to compute it, but I cannot attach it back to the original DataFrame.

I have the test DataFrame and new_column:

import pandas as pd

test = pd.DataFrame({'name': ["john", "jack", "albert"],
                     'day': ["2018-01-01", "2018-01-02", "2018-01-03"],
                     'result': ['c("7", "6", "")', 'c("3", "6", "10")', 'c("4", "3", "")']})

def update_result(row, x):
    return row[x].replace("c(", "").replace(")","").replace("\"","").replace(" ","").split(",")

new_column=test.apply(lambda row: update_result(row,2),axis=1)

But when I try to add new_column to the DataFrame, I get an error message about operating on a copy. Do you know the correct way to add this column?

test['result2']=new_column

I get:

ValueError: Wrong number of items passed 3, placement implies 1

and

 # check if we are modifying a copy

Thanks for your help.

Answer 1

If you want to apply the function to one specific column, you can do it this way:

test['result2']=test['result'].apply(lambda row: row.replace("c(", "").replace(")","").replace("\"","").replace(" ","").split(","))

Out[5]:
          day    name             result     result2
0  2018-01-01    john    c("7", "6", "")    [7, 6, ]
1  2018-01-02    jack  c("3", "6", "10")  [3, 6, 10]
2  2018-01-03  albert    c("4", "3", "")    [4, 3, ]

If you get a SettingWithCopyWarning, you can try setting or updating the column the way the warning suggests:

new_col=test['result'].apply(lambda row: row.replace("c(", "").replace(")","").replace("\"","").replace(" ","").split(","))
test.loc[:, 'result2'] = new_col

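The loc indexer needs you to specify which rows to select (: means all rows) and which column (result2 is the column to create; an existing column such as result also works if you want to update it in place).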
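
For what it's worth, here is a minimal sketch of that pattern on a filtered slice, which is where the warning usually shows up. The names parse_result and subset are mine, not from the question, and the .copy() call is my assumption about how the slice was made:

```python
import pandas as pd

test = pd.DataFrame({
    "name": ["john", "jack", "albert"],
    "day": ["2018-01-01", "2018-01-02", "2018-01-03"],
    "result": ['c("7", "6", "")', 'c("3", "6", "10")', 'c("4", "3", "")'],
})

def parse_result(text):
    # Hypothetical helper: strip the R-style c(...) wrapper, quotes and
    # spaces, then split on commas.
    return text.replace("c(", "").replace(")", "").replace('"', "").replace(" ", "").split(",")

# A filtered slice may be a view or a copy of `test`; calling .copy()
# makes it an independent DataFrame so the assignment below is unambiguous.
subset = test[test["name"] != "jack"].copy()

# ":" selects all rows; "result2" is the column to create.
subset.loc[:, "result2"] = subset["result"].apply(parse_result)
```

After this, subset has the new result2 column and test is left unchanged.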

You can also take a look at this page, where the topic is explained well: here.

Answer 2

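There is no need for apply, which is a loop under the hood. Consider assigning the column directly with the vectorized Series.str methods. You can also pass a regular expression to str.replace that keeps only digits and commas, which avoids the long chain of calls.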

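# Literal replacements: regex=False so "(" and ")" need no escaping
test['res1'] = test['result'].str.replace('c(', '', regex=False)\
                             .str.replace(')', '', regex=False).str.replace('"', '', regex=False)\
                             .str.replace(' ', '', regex=False).str.split(',')

# Regex replacement: drop everything except digits and commas
# (regex=True must be explicit, since the default is False from pandas 2.0 on)
test['res2'] = test['result'].str.replace(r'[^0-9,]', '', regex=True).str.split(',')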
print(test)

#           day    name             result        res1        res2
# 0  2018-01-01    john    c("7", "6", "")    [7, 6, ]    [7, 6, ]
# 1  2018-01-02    jack  c("3", "6", "10")  [3, 6, 10]  [3, 6, 10]
# 2  2018-01-03  albert    c("4", "3", "")    [4, 3, ]    [4, 3, ]
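
If the trailing empty strings in those lists are unwanted, one possible variation (my own suggestion, not part of the original answer) is Series.str.findall, which collects each run of digits and so never produces an empty entry. The column name digits is just a placeholder:

```python
import pandas as pd

test = pd.DataFrame({
    "result": ['c("7", "6", "")', 'c("3", "6", "10")', 'c("4", "3", "")'],
})

# Same idea as res2 above: keep only digits and commas, then split.
# The empty "" element survives as an empty string in each list.
test["res2"] = test["result"].str.replace(r"[^0-9,]", "", regex=True).str.split(",")

# Alternative sketch: extract every run of digits directly.
# Empty "" elements simply do not match and are skipped.
test["digits"] = test["result"].str.findall(r"\d+")
```

Whether dropping the empty entries is right depends on whether their position carries meaning in your data.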

Answer 3

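The assignment does not work because you are assigning a DataFrame to a single column, which expects a Series.

Try applying the function to the specific column instead: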

new_column=test['result'].apply(lambda row: row.replace("c(", "").replace(")","").replace("\"","").replace(" ","").split(","))    
test['result2']=new_column
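
To see the difference in return types, here is a rough sketch. It assumes pandas 0.23 or later, where row-wise apply returns a Series of lists by default and result_type="expand" reproduces the DataFrame shape that triggers the error; parse_result is a name I made up for the question's cleaning logic:

```python
import pandas as pd

test = pd.DataFrame({
    "name": ["john", "jack", "albert"],
    "day": ["2018-01-01", "2018-01-02", "2018-01-03"],
    "result": ['c("7", "6", "")', 'c("3", "6", "10")', 'c("4", "3", "")'],
})

def parse_result(text):
    # Hypothetical helper wrapping the chain of replace calls from the question.
    return text.replace("c(", "").replace(")", "").replace('"', "").replace(" ", "").split(",")

# Row-wise apply over the whole frame: one list per row.
row_wise = test.apply(lambda row: parse_result(row["result"]), axis=1)

# result_type="expand" spreads each list over columns, giving a 3x3 DataFrame.
# Assigning something of this shape to one column is what fails.
expanded = test.apply(lambda row: parse_result(row["result"]), axis=1,
                      result_type="expand")

# Applying to the single column always yields a Series, which assigns cleanly.
col_wise = test["result"].apply(parse_result)
test["result2"] = col_wise

print(type(row_wise).__name__, type(expanded).__name__, type(col_wise).__name__)
```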

P.S. I just saw that someone posted the same answer before me, but I'll leave mine here anyway.

Adding a column to a DataFrame from .apply