Question

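I have about 50 text files that I want to open, perform some operations on, and then save the output to new files. For one of these text files, this code does what I need: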

#open file
df=pd.read_csv(r'F:\Sheyenne\Statistics\NDVI_allotment\Text\A_Annex.txt', sep='\t', nrows=80, skiprows=2)

#replace value names in 'Basic Stats'
df=df.replace({'Band 80$': 'LT50300281984137PAC00',
'Band 79$': 'LT50300281984185XXX15',
'Band 78$': 'LT50300821984249XXX03',
'Band 77$': 'LT50300281985139PAC12',
'Band 76$': 'LT50300281985171PAC04',
'Band 75$': 'LT50300281986206XXX03',
'Band 74$': 'LT50300281986238XXX03',
'Band 73$': 'LT50300281987241XXX04',
'Band 72$': 'LT50300281987257XXX03',
'Band 71$': 'LT50300281987273XXX05',
'Band 70$': 'LT50300281988212XXX03'}, regex=True)

#take a slice of the data
df['Basic Stats']=df['Basic Stats'].str.slice(13,20)

#sort the data
df=df.sort(columns='Basic Stats', axis=0, ascending=True)

I need to perform exactly the same operations on all 50 files. Is there a way to do this in pandas? Even a non-pandas answer would help.

Edit:

A snippet of the first 1000 characters of the file:

'Filename: F:\\Sheyenne\\Atmospherically Corrected Landsat\\Indices\\Main\\NDVI\\NDVI_stack\nROI: EVF: Layer: Main_allotments.shp (allotment1=A. Annex) [White] 3984 points\n\nBasic Stats\t      Min\t     Max\t    Mean\t   Stdev\t  Num\tEigenvalue\n     Band 1\t 0.428944\t0.843916\t0.689923\t0.052534\t    1\t  0.229509\n     Band 2\t-0.000000\t0.689320\t0.513170\t0.048885\t    2\t  0.119217\n     Band 3\t 0.336438\t0.743478\t0.592622\t0.052544\t    3\t  0.059111\n     Band 4\t 0.313259\t0.678561\t0.525667\t0.048047\t    4\t  0.051338\n     Band 5\t 0.374522\t0.746828\t0.583513\t0.055989\t    5\t  0.027913\n     Band 6\t-0.000000\t0.749325\t0.330068\t0.314351\t    6\t  0.022561\n     Band 7\t-0.000000\t0.819288\t0.600136\t0.170060\t    7\t  0.018126\n     Band 8\t-0.000000\t0.687823\t0.450559\t0.084678\t    8\t  0.012942\n     Band 9\t 0.332637\t0.776398\t0.549870\t0.085212\t    9\t  0.009261\n    Band 10\t 0.386589\t0.848977\t0.635024\t0.087712\t   10\t  0.006628\n    Band 11\t 0.265165\t0.822361\t0.594286\t0.075730\t   11\t  0.004517\n    Band 12\t 0.191882\t0.539559\t0.343836\t0.0'

Edit:

This code:

d={'Band 80$': 'LT50300281984137PAC00',
'Band 79$': 'LT50300281984185XXX15',
'Band 78$': 'LT50300821984249XXX03',
'Band 77$': 'LT50300281985139PAC12',
'Band 76$': 'LT50300281985171PAC04',
'Band 75$': 'LT50300281986206XXX03',
'Band 74$': 'LT50300281986238XXX03',
'Band 73$': 'LT50300281987241XXX04',
'Band 72$': 'LT50300281987257XXX03',
'Band 71$': 'LT50300281987273XXX05',
'Band 70$': 'LT50300281988212XXX03'}

pth = r'F:\Sheyenne\Statistics\NDVI_allotment\Text' # path to files
new = os.path.join(pth,"new") 
os.mkdir(new)  # create new dir for new files
os.chdir(new) # change to that directory
# loop over each file and update
for f in os.listdir(pth):
    df = pd.read_csv(os.path.join(pth, f), sep='\t', nrows=80, skiprows=2)
    df = df.replace(d)
    df['Basic Stats'] = df['Basic Stats'].str.slice(13,20)
    df.sort(columns='Basic Stats', axis=0, ascending=True, inplace=True)
    # save data to csv
    df.to_csv(os.path.join(new, "new_{}".format(f)), index=False, sep="\t")
print 'Done Processing'

returns:

IOError: Initializing from file failed

Answer 1

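I would wrap what you have in a function that takes the filename as a parameter. You can then call that function in a loop to process each file. This is not pandas-specific, but it should work.

If all the files you want to process are in one directory, you can use this answer to get the list of files.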

from os import listdir
from os.path import isfile, join

import pandas as pd

mypath = 'the directory name here'
# keep only regular files; listdir also returns subdirectories
filenames = [f for f in listdir(mypath) if isfile(join(mypath, f))]

def process_file(filename):
    df = pd.read_csv(filename, sep='\t', nrows=80, skiprows=2)
    # Rest of code goes here...

for filename in filenames:
    # listdir returns bare names, so prepend the directory
    process_file(join(mypath, filename))
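
As a rough illustration of how the placeholder in process_file might be filled in, here is a minimal sketch that applies the steps from the question to every .txt file in a directory. The names BAND_NAMES, process_dir, in_dir and out_dir are made up for this example, the mapping is shortened to two entries, and sort_values is used on the assumption of a recent pandas where DataFrame.sort no longer exists.

```python
import os

import pandas as pd

# Hypothetical, shortened mapping; the full one is in the question.
BAND_NAMES = {r'Band 80$': 'LT50300281984137PAC00',
              r'Band 79$': 'LT50300281984185XXX15'}


def process_file(in_path, out_dir):
    """Apply the question's steps to one file and write the result to out_dir."""
    df = pd.read_csv(in_path, sep='\t', nrows=80, skiprows=2)
    # Replace the band labels (regex match at the end of the value).
    df['Basic Stats'] = df['Basic Stats'].replace(BAND_NAMES, regex=True)
    # Keep characters 13..19, as in the question.
    df['Basic Stats'] = df['Basic Stats'].str.slice(13, 20)
    df = df.sort_values(by='Basic Stats')
    out_path = os.path.join(out_dir, 'new_' + os.path.basename(in_path))
    df.to_csv(out_path, sep='\t', index=False)
    return out_path


def process_dir(in_dir, out_dir):
    """Run process_file on every regular .txt file found in in_dir."""
    os.makedirs(out_dir, exist_ok=True)
    written = []
    for name in sorted(os.listdir(in_dir)):
        path = os.path.join(in_dir, name)
        # Skip subdirectories and anything that is not a text file.
        if os.path.isfile(path) and name.endswith('.txt'):
            written.append(process_file(path, out_dir))
    return written
```

Calling process_dir with the input directory and a separate output directory would then write one new_<name>.txt per input file. The isfile check matters when the output directory lives inside the input directory, since listdir would otherwise hand that directory to read_csv.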

Answer 2

import os

import pandas as pd

# regex patterns for the 'Basic Stats' column -> Landsat scene IDs
d = {'Basic Stats': {'Band 80$': 'LT50300281984137PAC00',
                     'Band 79$': 'LT50300281984185XXX15',
                     'Band 78$': 'LT50300821984249XXX03',
                     'Band 77$': 'LT50300281985139PAC12',
                     'Band 76$': 'LT50300281985171PAC04',
                     'Band 75$': 'LT50300281986206XXX03',
                     'Band 74$': 'LT50300281986238XXX03',
                     'Band 73$': 'LT50300281987241XXX04',
                     'Band 72$': 'LT50300281987257XXX03',
                     'Band 71$': 'LT50300281987273XXX05',
                     'Band 70$': 'LT50300281988212XXX03'}}


pth = r'F:\Sheyenne\Statistics\NDVI_allotment\Text'  # path to files
new = os.path.join(pth, "new")
os.mkdir(new)  # create new dir for new files
# loop over each file and update
for f in os.listdir(pth):
    src = os.path.join(pth, f)
    if not os.path.isfile(src):
        continue  # skip subdirectories such as "new", which read_csv cannot open
    df = pd.read_csv(src, sep='\t', nrows=80, skiprows=2)
    df = df.replace(d, regex=True)  # the keys are regular expressions
    df['Basic Stats'] = df['Basic Stats'].str.slice(13, 20)
    df = df.sort_values(by='Basic Stats')  # DataFrame.sort was removed in pandas 0.20
    # save data to csv
    df.to_csv(os.path.join(new, "new_{}".format(f)), index=False, sep="\t")

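One part that doesn't make sense is replacing the values via the dict and then slicing off part of the string; it would make more sense to use the correct values to begin with. Another issue is that if df['Basic Stats'] = df['Basic Stats'].str.slice(13,20) hits a row that did not match, slicing 13:20 will leave you with an empty string, so you should make sure every row definitely matches, otherwise you will end up losing data.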
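
To make the slicing pitfall concrete, here is a small sketch with two made-up values: one band label that was never replaced and one replaced scene ID, both with the leading spaces seen in the file snippet. The regex guard is only one possible approach, and it assumes the scene IDs follow the layout 'LT5' + six path/row digits + a seven-digit year and day-of-year, which is how the IDs in the question appear to be built.

```python
import pandas as pd

# Two illustrative values: an unreplaced band label and a replaced scene ID.
s = pd.Series(['     Band 1', '    LT50300281984137PAC00'])

# Slicing never raises: the short, unreplaced value silently becomes ''.
sliced = s.str.slice(13, 20)

# Possible guard: extract the YYYYDDD field by pattern instead of position.
# Rows without a scene ID come back as NaN, so they are easy to spot.
dates = s.str.extract(r'LT5\d{6}(\d{7})', expand=False)
unmatched = s[dates.isna()]

print(sliced.tolist())
print(unmatched.tolist())
```

Checking unmatched before sorting and saving gives a chance to notice rows that would otherwise be written out as empty strings.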

Open multiple txt files and perform the same operations on each
