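I have an unsup folder at the path C:/Users/admin/Downloads/aclImdb/train/unsup. The train folder consists of neg, pos and unsup, and each folder contains 12500 .txt files. The problem is that I want to exclude the unsup folder and store only the negative and positive data in a dataset. Here is my code: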
train=[]
exclude =("C:/Users/admin/Downloads/aclImdb/train/unsup")
dirs[:] = [d for d in dirs if d not in exclude]
for root, dirs, files in os.walk(directory, topdown=True):
for subdir, dirs, files in os.walk(directory):
    for dirs[:] in dirs:
        for file in files:
            if file.endswith("txt"):
                with open(os.path.join(subdir,file),'r+',encoding="utf8") as data2:
                    train.append(data2.read())
The result is as follows:
TypeError Traceback (most recent call last)
<ipython-input-85-4dce3931638b> in <module>()
7 if file.endswith("txt"):
8 with open(os.path.join(subdir,file),'r+',encoding="utf8") as data2:
----> 9 train.append(data2.read())
10
11
c:\users\varavoorgp\anaconda3\lib\site-packages\pandas\core\frame.py in append(self, other, ignore_index, verify_integrity)
4433 to_concat = [self, other]
4434 return concat(to_concat, ignore_index=ignore_index,
-> 4435 verify_integrity=verify_integrity)
4436
4437 def join(self, other, on=None, how='left', lsuffix='', rsuffix='',
c:\users\varavoorgp\anaconda3\lib\site-packages\pandas\tools\merge.py in concat(objs, axis, join, join_axes, ignore_index, keys, levels, names, verify_integrity, copy)
1449 keys=keys, levels=levels, names=names,
1450 verify_integrity=verify_integrity,
-> 1451 copy=copy)
1452 return op.get_result()
1453
c:\users\varavoorgp\anaconda3\lib\site-packages\pandas\tools\merge.py in __init__(self, objs, axis, join, join_axes, keys, levels, names, ignore_index, verify_integrity, copy)
1506 for obj in objs:
1507 if not isinstance(obj, NDFrame):
-> 1508 raise TypeError("cannot concatenate a non-NDFrame object")
1509
1510 # consolidate
TypeError: cannot concatenate a non-NDFrame object
By the way, I am new to Python. I also want to apply stemming and lemmatization to this data.
Answer 0 (score: 0)
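I don't fully understand what you're trying to accomplish, and the many problems in your code don't help clarify the situation; I'm not even going to try to explain, much less fix, all of them.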
Instead, what follows is an outline of how to process all the .txt files in the subdirectories of a given directory while excluding one or more of those subdirectories. Maybe it will help.
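As a rough illustration of that outline, here is a minimal sketch, not necessarily the approach the answer goes on to use. It relies on the documented behavior of os.walk with topdown=True, where editing the dirs list in place prevents the walk from descending into the removed subdirectories. The function name read_text_files and its excluded parameter are hypothetical names chosen for this example. Note also that the traceback in the question goes through pandas' frame.py, which means train was a pandas DataFrame rather than a list when append was called, so the sketch builds and returns a fresh list.

```python
import os


def read_text_files(directory, excluded=("unsup",)):
    """Return the contents of every .txt file under `directory`,
    skipping any subdirectory whose name appears in `excluded`.

    Hypothetical helper written for illustration only.
    """
    texts = []
    for root, dirs, files in os.walk(directory, topdown=True):
        # Editing dirs in place tells os.walk not to descend into the
        # excluded subdirectories (this only works with topdown=True).
        # Sorting just makes the traversal order deterministic.
        dirs[:] = sorted(d for d in dirs if d not in excluded)
        for name in sorted(files):
            if name.endswith(".txt"):
                path = os.path.join(root, name)
                with open(path, "r", encoding="utf8") as handle:
                    texts.append(handle.read())
    return texts
```

Assuming the layout described in the question, calling read_text_files("C:/Users/admin/Downloads/aclImdb/train") would collect the neg and pos reviews into one list and leave unsup out. The exclusion matches on folder names, not full paths, because os.walk puts bare names in dirs.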