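In short, I have a DataFrame with a single column of folder paths. For each folder I want to check whether it exists (returning a "status") and count the files it contains (a "count"), then add the "status" and "count" fields to the DataFrame.
I have tried two approaches (see the code comments marked with ***).
Below is complete sample code that reproduces the error: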
from pandas import DataFrame
import os

#create a couple folders/files, if not already existing
try:
    os.mkdir(r'testfolder1')
    with open(r'testfolder1\testfile1.txt','w') as f1:
        f1.write('test text.')
    os.mkdir(r'testfolder2')
    with open(r'testfolder2\testfile2.txt','w') as f1:
        f1.write('test text.')
    with open(r'testfolder2\testfile3.txt','w') as f1:
        f1.write('test text.')
except FileExistsError:
    print('Folders and/or files already exist.')

class myclass:
    def __init__(self):
        self.df = DataFrame(['testfolder1','testfolder2'],columns=['path'])

    def repo_check(self):
        """gather status and size of each repo path"""
        def dir_check(path):
            '''checks for existence of directory and count of items within'''
            try:
                itemcount = len(os.listdir(path))
                result = ['found', itemcount]
            except FileNotFoundError:
                result = ['not found', 0]
            return result

        #Add status of each repo and count of contents to df
        #self.df['status'], self.df['count'] = self.df.path.map(dir_check) #***ALTERNATE APPROACH - RESULTS IN "TRANSPOSED" INSERTION INTO DATAFRAME***
        self.df['status'], self.df['count'] = dir_check(self.df.path)
        print(self.df) #***RESULTS IN TYPEERROR***

data = myclass() #instantiate class
print('Created dataframe:')
print(data.df) #print original dataframe
print()
print('New dataframe:')
data.repo_check() #calculates and adds calculated columns to df
print(data.df) #print new dataframe with calculated columns
Approach 1 produces the following output:
Created dataframe:
          path
0  testfolder1
1  testfolder2

New dataframe:
Traceback (most recent call last):
  File "C:/Users/B1457080/Documents/Python/_misc/pandas_map_test/pandas_map_test2.py", line 56, in <module>
    data.repo_check() #calculates and adds calculated columns to df
  File "C:/Users/B1457080/Documents/Python/_misc/pandas_map_test/pandas_map_test2.py", line 46, in repo_check
    self.df['status'], self.df['count'] = dir_check(self.df.path)
  File "C:/Users/B1457080/Documents/Python/_misc/pandas_map_test/pandas_map_test2.py", line 38, in dir_check
    itemcount = len(os.listdir(path))
TypeError: listdir: path should be string, bytes, os.PathLike or None, not Series

Process finished with exit code 1
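Reading the traceback, the TypeError seems to come from passing the whole path column to dir_check in a single call, while os.listdir() accepts only one path at a time. Below is a minimal sketch of one way around that, calling dir_check once per row and joining the results back on the index. The temporary folder layout and the names root, existing and missing are made up for illustration and are not part of my real data.

```python
import os
import tempfile

import pandas as pd


def dir_check(path):
    """Return [status, item_count] for a single directory path."""
    try:
        return ['found', len(os.listdir(path))]
    except FileNotFoundError:
        return ['not found', 0]


# Hypothetical test data: one existing folder holding one file, one missing folder.
root = tempfile.mkdtemp()
existing = os.path.join(root, 'testfolder1')
os.mkdir(existing)
with open(os.path.join(existing, 'testfile1.txt'), 'w') as f:
    f.write('test text.')
missing = os.path.join(root, 'does_not_exist')

df = pd.DataFrame({'path': [existing, missing]})

# Call dir_check once per path (not once on the whole Series), then turn the
# list of [status, count] pairs into a two-column frame aligned on df's index.
results = [dir_check(p) for p in df['path']]
extra = pd.DataFrame(results, index=df.index, columns=['status', 'count'])
df = df.join(extra)

print(df[['status', 'count']])
```

Building the two columns as their own DataFrame keeps each [status, count] pair on its own row, so nothing depends on how many rows the frame has.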
Approach 2 produces the following output (the status/count values are transposed):
Created dataframe:
          path
0  testfolder1
1  testfolder2

New dataframe:
          path status  count
0  testfolder1  found  found
1  testfolder2      1      2
          path status  count
0  testfolder1  found  found
1  testfolder2      1      2

Process finished with exit code 0
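A possible explanation for the transposed result, sketched under the assumption that .map(dir_check) returns a Series holding one [status, count] list per row: tuple unpacking iterates over the rows of that Series, so with exactly two rows the first target receives row 0's list and the second receives row 1's list. The Series named mapped below is a hand-written stand-in for that return value, not output captured from my script.

```python
import pandas as pd

# Stand-in for what self.df.path.map(dir_check) would return:
# one [status, count] list per row.
mapped = pd.Series([['found', 1], ['found', 2]])

# Unpacking iterates over rows, so `a` is row 0's list and `b` is row 1's list.
# Assigning them as columns is what produces the "transposed" frame.
a, b = mapped
print(a, b)

# zip(*...) regroups the pairs by position: all statuses, then all counts.
statuses, counts = zip(*mapped)

df = pd.DataFrame({'path': ['testfolder1', 'testfolder2']})
df['status'] = list(statuses)
df['count'] = list(counts)
print(df)
```

If this reading is right, the original unpacking only ran without an exception because the frame happens to have exactly two rows; any other row count would fail to unpack into two targets.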
Oddly, if I perform a structurally similar operation but use a math-based function instead of os.listdir(), the inserted values are not transposed:
from pandas import DataFrame

class myclass:
    def __init__(self):
        self.df = DataFrame([[1,2,3],[4,5,6]],columns=['alpha','beta','gamma'])

    def test_classmethod(self):
        def test_function(x):
            return [x**2, x**.5] #returns a two-element list to be inserted in the df row

        #Add two columns for computing square and sqrt (both methods below work)
        #self.df['gamma^2'], self.df['gamma^0.5'] = self.df['gamma'].map(test_function)
        self.df['gamma^2'], self.df['gamma^0.5'] = test_function(self.df['gamma'])

data = myclass() #instantiate class
print('Created dataframe:')
print(data.df) #print original dataframe
print()
print('New dataframe:')
data.test_classmethod() #calculates and adds calculated columns to df
print(data.df) #print new dataframe with calculated columns
This produces:
Created dataframe:
   alpha  beta  gamma
0      1     2      3
1      4     5      6

New dataframe:
   alpha  beta  gamma  gamma^2  gamma^0.5
0      1     2      3        9   1.732051
1      4     5      6       36   2.449490
Any suggestions on how to get approach 1 or approach 2 working as expected would be greatly appreciated!