Question

My pandas DataFrame contains a column "file" that holds strings with file paths. I am trying to use dfply to mutate this column like this:

resultstatsDF.reset_index() >> mutate(dirfile = os.path.join(os.path.basename(os.path.dirname(X.file)),os.path.basename(X.file)))

but I get the error

TypeError: __index__ returned non-int (type Call)

What am I doing wrong, and how should I do this?

Answer 1

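Since my question has received upvotes, I guess some people are still interested. I have learned a lot about Python since then, so let me answer it myself; maybe it will help other users.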

First, let's import the required packages:

import pandas as pd
from dfply import *
from os.path import basename, dirname, join

and create the example pandas DataFrame

resultstatsDF = pd.DataFrame({'file': ['/home/user/this/file1.png', '/home/user/that/file2.png']})

which looks like this:

                        file
0  /home/user/this/file1.png
1  /home/user/that/file2.png

We still get an error (although its message has changed as dfply has evolved):

resultstatsDF.reset_index() >> \
mutate(dirfile = join(basename(dirname(X.file)), basename(X.file)))

TypeError: __index__ returned non-int (type Intention)
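The core problem can be reproduced without dfply: the os.path functions take one path (a str, bytes or os.PathLike object), not a whole column. The sketch below, using plain pandas only, hands os.path.basename an entire Series and then applies the same function element by element. The error text in plain pandas differs from the dfply one, since dfply passes its own placeholder object rather than a Series, but the underlying mismatch is the same.

```python
import os.path

import pandas as pd

paths = pd.Series(['/home/user/this/file1.png', '/home/user/that/file2.png'])

# os.path functions expect a single path, so passing the whole Series
# raises TypeError ("expected str, bytes or os.PathLike object, not Series").
try:
    os.path.basename(paths)
    whole_column_ok = True
except TypeError as exc:
    whole_column_ok = False
    print(exc)

# Applied element by element via Series.apply, the same function works.
per_element = paths.apply(os.path.basename)
print(per_element.tolist())
```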

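The reason is that mutate works on whole Series (columns), while the os.path functions expect a single path string, so we need a function that operates on individual elements. pandas provides pandas.Series.apply for this: it calls a given function on each element of a Series. We still need a custom function to apply to each element of the Series file. Putting it all together, we end up with the following code: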

def extract_last_dir_plus_filename(series_element):
    return join(basename(dirname(series_element)), basename(series_element))

resultstatsDF.reset_index() >> \
mutate(dirfile = X.file.apply(extract_last_dir_plus_filename))

Output:

   index                       file         dirfile
0      0  /home/user/this/file1.png  this/file1.png
1      1  /home/user/that/file2.png  that/file2.png

To do the same without dfply's mutate, we can instead write

resultstatsDF['dirfile'] = resultstatsDF.file.apply(extract_last_dir_plus_filename)
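Below is a self-contained sketch of the plain-pandas approach that runs without dfply. As an assumption, it uses posixpath instead of os.path so the joined result always uses "/" regardless of the operating system. It also shows a vectorized alternative built on pandas' .str accessor (split, slice, join), which additionally assumes every path uses "/" as its separator; the column name dirfile_str is hypothetical and only used for comparison.

```python
import posixpath

import pandas as pd


def extract_last_dir_plus_filename(path):
    # Build 'parent_dir/filename' from a single path string.
    return posixpath.join(posixpath.basename(posixpath.dirname(path)),
                          posixpath.basename(path))


resultstatsDF = pd.DataFrame({'file': ['/home/user/this/file1.png',
                                       '/home/user/that/file2.png']})

# Element-wise: Series.apply calls the function once per string.
resultstatsDF['dirfile'] = resultstatsDF['file'].apply(extract_last_dir_plus_filename)

# Vectorized alternative: split on '/', keep the last two parts, rejoin them.
resultstatsDF['dirfile_str'] = (
    resultstatsDF['file'].str.split('/').str[-2:].str.join('/')
)

print(resultstatsDF[['dirfile', 'dirfile_str']])
```

Both columns contain the same values for these inputs; the apply version goes through the path functions, while the .str version only manipulates the text around "/".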

dfply: Mutating string column: TypeError
