How do I ignore a subfolder inside a folder and read the text data?

Time: 2017-09-30 11:47:23

Tags: python

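I have an unsup folder at the path C:/Users/admin/Downloads/aclImdb/train/unsup. The train folder consists of neg, pos and unsup, and each of these folders contains 12500 .txt files. The problem is that I want to exclude the unsup folder and store the negative and positive data in a dataset. Here is my code: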

train=[]

exclude =("C:/Users/admin/Downloads/aclImdb/train/unsup")

dirs[:] = [d for d in dirs if d not in exclude]
for root, dirs, files in os.walk(directory, topdown=True):

    for subdir, dirs, files in os.walk(directory):
        for dirs[:] in dirs:
            for file in files:
                if file.endswith("txt"):
                    with open(os.path.join(subdir,file),'r+',encoding="utf8") as data2:
                        train.append(data2.read())

The result is as follows:

TypeError                                 Traceback (most recent call last)
<ipython-input-85-4dce3931638b> in <module>()
      7                 if file.endswith("txt"):
      8                     with open(os.path.join(subdir,file),'r+',encoding="utf8") as data2:
----> 9                         train.append(data2.read())
     10
     11

c:\users\varavoorgp\anaconda3\lib\site-packages\pandas\core\frame.py in append(self, other, ignore_index, verify_integrity)
   4433             to_concat = [self, other]
   4434         return concat(to_concat, ignore_index=ignore_index,
-> 4435                       verify_integrity=verify_integrity)
   4436
   4437     def join(self, other, on=None, how='left', lsuffix='', rsuffix='',

c:\users\varavoorgp\anaconda3\lib\site-packages\pandas\tools\merge.py in concat(objs, axis, join, join_axes, ignore_index, keys, levels, names, verify_integrity, copy)
   1449                        keys=keys, levels=levels, names=names,
   1450                        verify_integrity=verify_integrity,
-> 1451                        copy=copy)
   1452     return op.get_result()
   1453

c:\users\varavoorgp\anaconda3\lib\site-packages\pandas\tools\merge.py in __init__(self, objs, axis, join, join_axes, keys, levels, names, ignore_index, verify_integrity, copy)
   1506         for obj in objs:
   1507             if not isinstance(obj, NDFrame):
-> 1508                 raise TypeError("cannot concatenate a non-NDFrame object")
   1509
   1510             # consolidate

TypeError: cannot concatenate a non-NDFrame object

By the way, I am new to Python. I also want to perform stemming and lemmatization on this data.

1 answer:

Answer 0: (score: 0)

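I don't fully understand what you're trying to accomplish, and your code has many problems that do nothing to clarify the situation. I'm not even going to try to explain all of them, much less fix them.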

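Instead, the following shows an outline of how to process all the .txt files in the subdirectories of a given directory while excluding one or more of those subdirectories. Perhaps it will help.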

directory
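As an illustration of the approach described above, here is a minimal sketch, not necessarily the code the answer originally contained. The function name `read_txt_files` and its `exclude` parameter are hypothetical names chosen for this example. It assumes the excluded folders are matched by bare folder name (such as `unsup`) rather than by full path, because `os.walk` yields bare names in `dirs`.

```python
import os


def read_txt_files(directory, exclude=("unsup",)):
    """Return the contents of every .txt file under `directory`,
    skipping any subfolder whose name appears in `exclude`."""
    texts = []
    for root, dirs, files in os.walk(directory, topdown=True):
        # Prune in place: with topdown=True, os.walk will not descend
        # into folders removed from `dirs` here.
        dirs[:] = [d for d in dirs if d not in exclude]
        for name in sorted(files):
            if name.endswith(".txt"):
                path = os.path.join(root, name)
                # Read-only mode is enough; 'r+' is not needed.
                with open(path, "r", encoding="utf8") as handle:
                    texts.append(handle.read())
    return texts
```

It could be called as `train = read_txt_files("C:/Users/admin/Downloads/aclImdb/train")`. The traceback in the question passes through pandas' `frame.py`, which suggests that `train` was bound to a DataFrame when the cell ran; collecting the texts into a freshly created list, as this sketch does, avoids that error.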