I have five Excel (.csv) files under a specific path, matched by this pattern: 'Z:\\Ruchika\\Citymax_Dec06\\SVCDs\\**\\*Claypot*.csv'.
The five files and their paths are listed below:
['Z:\\Ruchika\\Citymax_Dec06\\SVCDs\\December - SVCD\\UAE _ Citymax _Claypot_ Burdubai_fullcampaignfile.csv',
'Z:\\Ruchika\\Citymax_Dec06\\SVCDs\\January2019 - SVCD\\UAE _ Citymax _Claypot_ Burdubai_fullcampaignfile.csv',
'Z:\\Ruchika\\Citymax_Dec06\\SVCDs\\November - SVCD\\UAE _ Citymax _ Claypot_BD_fullcampaignfile.csv',
'Z:\\Ruchika\\Citymax_Dec06\\SVCDs\\October - SVCD\\UAE _ Citymax _Claypot_ Burdubai_fullcampaignfile.csv',
'Z:\\Ruchika\\Citymax_Dec06\\SVCDs\\sept - svcd\\UAE _ Claypot _ Burdubai_fullcampaignfile.csv']
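Now I am trying to retrieve the month name from each file's path and add it to my DataFrame using the code below, but I am stuck: every row comes back labelled November, which is incorrect. Please help.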
import glob
import pandas as pd

m=['November','December','October','September','August']
def extract(folderpath):
    final=glob.glob(folderpath)
    frames = []
    for file in final:
        j=0
        df = pd.read_csv(file, error_bad_lines=False)
        df['Month']=m[j]
        frames.append(df)
        j=j+1
    mergedfile = pd.concat(frames)
    return mergedfile
a=extract('Z:\\Ruchika\\Citymax_Dec06\\SVCDs\\**\\*Claypot*.csv')
Input : a.shape
Output : (3232487, 31)
Input : a['Month'].value_counts()
Output : November 3232487
Name: Month, dtype: int64
Answer 0 (score: 1)
I'm guessing there are only a handful of possible months, so why not simply check the path for each of them?
filename = r'Z:\Ruchika\Citymax_Dec06\SVCDs\December - SVCD\UAE _ Citymax Claypot Burdubai_fullcampaignfile.csv'
for month in ['October', 'November', 'December']:  # List of months
    if month in filename:
        print('Month is:', month)
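A plain substring check would miss the 'sept - svcd' folder from the question, since it is lower-case and abbreviated. Below is a minimal sketch of one way around that, matching the first three letters of each month against the parent folder name only. The helper name `month_from_path` and the `MONTHS` list are illustrative, not from the original post, and `PureWindowsPath` is used on the assumption that the paths are Windows-style as in the question.

```python
from pathlib import PureWindowsPath

MONTHS = ['January', 'February', 'March', 'April', 'May', 'June', 'July',
          'August', 'September', 'October', 'November', 'December']

def month_from_path(path):
    """Return the full month name whose 3-letter prefix starts the parent folder name."""
    # Look only at the folder that directly contains the file, so that
    # 'Citymax_Dec06' higher up in the path cannot produce a false match.
    folder = PureWindowsPath(path).parent.name.lower()
    for month in MONTHS:
        if folder.startswith(month[:3].lower()):
            return month
    return None  # no month recognised

print(month_from_path(r'Z:\Ruchika\Citymax_Dec06\SVCDs\sept - svcd\UAE _ Claypot _ Burdubai_fullcampaignfile.csv'))
```

Restricting the match to the parent folder is a design choice: the month in these paths lives in the folder name, not in the file name.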
Answer 1 (score: 1)
month = [x for x in month_list if x in my_filename][0]
my_df['month'] = month
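Note that indexing with `[0]` raises an `IndexError` when none of the months appears in the file name. A small sketch of a variant that returns `None` instead, using `next` with a default; the function name `first_month` and the sample `month_list` are assumptions made for illustration.

```python
month_list = ['October', 'November', 'December']  # sample list, adjust as needed

def first_month(filename, months=month_list):
    """Return the first month found in `filename`, or None if there is no match."""
    return next((m for m in months if m in filename), None)

my_filename = r'Z:\Ruchika\Citymax_Dec06\SVCDs\December - SVCD\UAE _ Citymax Claypot Burdubai_fullcampaignfile.csv'
print(first_month(my_filename))
```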
Answer 2 (score: 1)
You can combine str.split (here, str.rsplit) with pd.DataFrame.assign:
file_path = r'Z:\Ruchika\Citymax_Dec06\SVCDs\December - SVCD\UAE _ Citymax Claypot Burdubai_fullcampaignfile.csv'
file_month = file_path.rsplit('\\', 2)[1].split(' - ')[0] # December
df = pd.read_csv(file_path).assign(Month=file_month)
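For completeness: in the question's loop, `j=0` sits inside the `for` body, so `m[j]` is always `m[0]`, which is why every row is labelled November. Deriving the month from each file's own folder removes the counter altogether. The following is a rough sketch of the question's `extract` function rewritten that way. It assumes every folder is named like 'December - SVCD' and keeps whatever precedes ' - ' unchanged, so 'January2019' and 'sept' would still need normalising.

```python
import glob
from pathlib import Path

import pandas as pd


def extract(pattern):
    """Read every CSV matching `pattern` and tag its rows with the month taken from its folder."""
    frames = []
    for file in sorted(glob.glob(pattern)):
        # Folder names are assumed to look like 'December - SVCD';
        # keep the part before ' - ' as the month label.
        month = Path(file).parent.name.split(' - ')[0]
        frames.append(pd.read_csv(file).assign(Month=month))
    # pd.concat raises ValueError if no file matched the pattern.
    return pd.concat(frames, ignore_index=True)
```

Calling `extract` with the pattern from the question and then `a['Month'].value_counts()` should show one entry per folder instead of a single November row.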