Using Pandas with listdir() either fails or inserts values into the DataFrame transposed

Posted: 2018-09-07 16:23:49

Tags: pandas-map

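In short, I have a DataFrame with a single column of folder paths. For each folder I want to check whether it exists (a "status") and how many files it contains (a "count"), and then add the "status" and "count" fields to the DataFrame.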

I tried two approaches (see the code comments marked with ***).

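  1. The code below, as written, does not use the "map" method and raises a TypeError.
  2. Running the same code with the "map" line uncommented (and the other line commented out) runs without error, but inserts the status/count values transposed.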

Below is complete example code that reproduces the error:

from pandas import DataFrame
import os

#create a couple folders/files, if not already existing
try:
    os.mkdir(r'testfolder1')
    with open(r'testfolder1\testfile1.txt','w') as f1:
        f1.write('test text.')
    os.mkdir(r'testfolder2')
    with open(r'testfolder2\testfile2.txt','w') as f1:
        f1.write('test text.')
    with open(r'testfolder2\testfile3.txt','w') as f1:
        f1.write('test text.')
except FileExistsError:
    print('Folders and/or files already exist.')



class myclass:

    def __init__(self):
        self.df = DataFrame(['testfolder1','testfolder2'],columns=['path'])

    def repo_check(self):
        """gather status and size of each repo path"""

        def dir_check(path):
            '''checks for existence of directory and count of items within'''
            try:
                itemcount = len(os.listdir(path))
                result = ['found', itemcount]
            except FileNotFoundError:
                result = ['not found', 0]
            return result

        #Add status of each repo and count of contents to df
        #self.df['status'], self.df['count'] = self.df.path.map(dir_check)     #***ALTERNATE APPROACH - RESULTS IN "TRANSPOSED" INSERTION INTO DATAFRAME***
        self.df['status'], self.df['count'] = dir_check(self.df.path)     #***RESULTS IN TYPEERROR***
        print(self.df)


data = myclass()                #instantiate class
print('Created dataframe:')
print(data.df)                  #print original dataframe

print()
print('New dataframe:')
data.repo_check()               #calculates and adds calculated columns to df
print(data.df)                  #print new dataframe with calculated columns

Approach 1 produces the following output:

Created dataframe:
          path
0  testfolder1
1  testfolder2

New dataframe:
Traceback (most recent call last):
  File "C:/Users/B1457080/Documents/Python/_misc/pandas_map_test/pandas_map_test2.py", line 56, in <module>
    data.repo_check()               #calculates and adds calculated columns to df
  File "C:/Users/B1457080/Documents/Python/_misc/pandas_map_test/pandas_map_test2.py", line 46, in repo_check
    self.df['status'], self.df['count'] = dir_check(self.df.path)
  File "C:/Users/B1457080/Documents/Python/_misc/pandas_map_test/pandas_map_test2.py", line 38, in dir_check
    itemcount = len(os.listdir(path))
TypeError: listdir: path should be string, bytes, os.PathLike or None, not Series

Process finished with exit code 1
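The traceback points at the cause: dir_check(self.df.path) hands the entire path Series to os.listdir(), which accepts a single str, bytes or os.PathLike, not a Series. A minimal sketch, assuming the goal is one listdir() call per row, is to call the function once per element (here via Series.map) so each call receives a plain string. The temporary folder, the 'missing' path and the variable names are hypothetical, chosen only so the example is self-contained.

```python
import os
import tempfile

import pandas as pd


def dir_check(path):
    """Return [status, item count] for a single directory path (a str)."""
    try:
        return ['found', len(os.listdir(path))]
    except FileNotFoundError:
        return ['not found', 0]


# Hypothetical setup: a temporary folder holding two files, plus a path that does not exist.
with tempfile.TemporaryDirectory() as tmp:
    existing = os.path.join(tmp, 'testfolder2')
    os.mkdir(existing)
    for name in ('testfile2.txt', 'testfile3.txt'):
        with open(os.path.join(existing, name), 'w') as f:
            f.write('test text.')
    paths = pd.Series([existing, os.path.join(tmp, 'missing')])

    # Passing the whole Series reproduces the TypeError from approach 1.
    error = None
    try:
        dir_check(paths)
    except TypeError as exc:
        error = exc
    print('TypeError:', error)

    # map() calls dir_check once per element, so each call receives a str.
    results = paths.map(dir_check)
    print(results.tolist())  # [['found', 2], ['not found', 0]]
```

Each element of results is still a [status, count] list, one per row, so how that Series is assigned back into the DataFrame is a separate question.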

Approach 2 produces the following output (status/count values transposed):

Created dataframe:
          path
0  testfolder1
1  testfolder2

New dataframe:
          path status  count
0  testfolder1  found  found
1  testfolder2      1      2
          path status  count
0  testfolder1  found  found
1  testfolder2      1      2

Process finished with exit code 0
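The transposition appears to come from tuple unpacking rather than from map() itself: self.df.path.map(dir_check) returns a Series holding one [status, count] list per row, and `a, b = series` iterates over that Series row by row. With exactly two rows, the first row's list lands in 'status' and the second row's list in 'count'. The sketch below reproduces this without touching the filesystem (the hard-coded lists stand in for dir_check results) and shows two common ways to regroup per-row lists into columns: zip(*...) and building a DataFrame from .tolist().

```python
import pandas as pd

df = pd.DataFrame({'path': ['testfolder1', 'testfolder2']})

# Stand-in for df.path.map(dir_check): one [status, count] list per row.
per_row = pd.Series([['found', 1], ['found', 2]], index=df.index)

# Tuple unpacking iterates over rows, so each target receives a whole row.
status, count = per_row
print(status, count)  # ['found', 1] ['found', 2]

# Fix 1: zip(*...) regroups the per-row lists into one tuple per field.
df['status'], df['count'] = zip(*per_row)
print(df)

# Fix 2: expand the lists into a two-column frame aligned on df's index, then join.
expanded = pd.DataFrame(per_row.tolist(), index=df.index, columns=['status', 'count'])
df2 = df[['path']].join(expanded)
print(df2)
```

The zip(*...) form is compact; the .tolist() form scales more naturally if dir_check later returns more than two fields.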

Oddly, if I do something structurally similar but use a math-based function instead of os.listdir(), the inserted values are not transposed:

from pandas import DataFrame

class myclass:

    def __init__(self):
        self.df = DataFrame([[1,2,3],[4,5,6]],columns=['alpha','beta','gamma'])

    def test_classmethod(self):
        def test_function(x):
            return [x**2, x**.5]        #returns a two-element list to be inserted in the df row

        #Add two columns for computing square and sqrt (both methods below work)
        #self.df['gamma^2'], self.df['gamma^0.5'] = self.df['gamma'].map(test_function)
        self.df['gamma^2'], self.df['gamma^0.5'] = test_function(self.df['gamma'])


data = myclass()                #instantiate class
print('Created dataframe:')
print(data.df)                  #print original dataframe

print()
print('New dataframe:')
data.test_classmethod()         #calculates and adds calculated columns to df
print(data.df)                  #print new dataframe with calculated columns

This produces:

Created dataframe:
   alpha  beta  gamma
0      1     2      3
1      4     5      6

New dataframe:
   alpha  beta  gamma  gamma^2  gamma^0.5
0      1     2      3        9   1.732051
1      4     5      6       36   2.449490

Any suggestions on how to get approach 1 or approach 2 to work as intended would be greatly appreciated!
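For reference, one possible shape of a working repo_check, sketched under a few assumptions: dir_check is applied per path with Series.map, the per-row lists are regrouped with zip(*...), the constructor takes a hypothetical paths argument so the example can use temporary folders, and os.path.join replaces the Windows-only backslash paths from the question. This is a sketch, not a confirmed answer to the question.

```python
import os
import tempfile

import pandas as pd


class myclass:
    def __init__(self, paths):
        # 'paths' is a hypothetical parameter so the sketch can point at temp folders.
        self.df = pd.DataFrame(paths, columns=['path'])

    def repo_check(self):
        """Gather status and item count of each repo path."""

        def dir_check(path):
            """Check one directory (a str) for existence and item count."""
            try:
                return ['found', len(os.listdir(path))]
            except FileNotFoundError:
                return ['not found', 0]

        # map() calls dir_check once per path; zip(*...) regroups the per-row
        # [status, count] lists into one tuple per new column.
        self.df['status'], self.df['count'] = zip(*self.df['path'].map(dir_check))


with tempfile.TemporaryDirectory() as tmp:
    folder1 = os.path.join(tmp, 'testfolder1')
    folder2 = os.path.join(tmp, 'testfolder2')
    os.mkdir(folder1)
    os.mkdir(folder2)
    # os.path.join builds portable paths instead of Windows-only backslashes.
    files = [(folder1, 'testfile1.txt'), (folder2, 'testfile2.txt'), (folder2, 'testfile3.txt')]
    for folder, name in files:
        with open(os.path.join(folder, name), 'w') as f:
            f.write('test text.')

    # testfolder3 is never created, so it should be reported as 'not found'.
    data = myclass([folder1, folder2, os.path.join(tmp, 'testfolder3')])
    data.repo_check()
    print(data.df[['status', 'count']])
```

One caveat with the zip(*...) form: on an empty DataFrame, zip produces nothing and the two-target unpacking raises ValueError, so an emptiness check may be worth adding.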
