Dynamically name an edited CSV so it includes part of the old CSV's name

Time: 2013-10-28 01:26:36

Tags: python python-2.7 csv pandas

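I use some code to merge two CSVs and sort the result by two columns, which outputs a new CSV. The input CSVs have the same name apart from being numbered 1 and 2, and I repeat this code for multiple sets of data. I would like to know how to make the code output a file whose name contains the first part of the original file names.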

My current code:

import pandas as pd

df1 = pd.read_csv("data csv 1\September 2013 1 UUedit1.csv", delimiter = ",")
df2 = pd.read_csv("data csv 1\September 2013 2 UUedit2.csv", delimiter = ",")
merged = df1.merge(df2, on="Unique Element")
delcols = "Element_y", "number_y", "date_y", "title_y", "name_y"

for delcol in delcols:
    del merged[delcol]

merged.rename(columns={"name_x": "name", "rdate_x": "date", "title_x": "title", "number_x": "number", "Element_x": "Element"}, inplace = True)
merged = merged.sort("Element").reset_index(drop=True)
merged = merged.sort("date").reset_index(drop=True)
merged.to_csv("MRG.csv", index=False, sep = ",")

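So in this example, both input files are named September 2013 <number> UUedit<number>.csv, and I want my code to output a file named simply September 2013 MRG.csv. How can I code this? To clarify: if the two original files were October 2013, the output would be October 2013 MRG.csv. Many thanks, GTPE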

Edit

Running the code provided by Christian Ternus, I got the following printout and traceback:

Usage: C:/Test.py <month> <year>
Traceback (most recent call last):
  File "C:/Test.py", line 7, in <module>
    month, year = sys.argv[1:]
ValueError: need more than 0 values to unpack

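I'm not sure what the second variable should be set to. Many thanks, GTPE

Edit 2

I managed to get the code to work by calling it from CMD, but my attempts to call the script from Python don't seem to work. I tried the following: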

import subprocess
p = subprocess.Popen(['python', 'RawDataSheetMergerPandasTest.py September 2013'], stdout=subprocess.PIPE, stderr=subprocess.PIPE)
out, err = p.communicate()
print out
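
A possible reason the call above appears to do nothing: each element of the list passed to Popen becomes one argument, so the script name and "September 2013" reach Python as a single file name, and the resulting error goes to err, which is never printed. The sketch below passes each argument as its own list element. To stay self-contained it runs a one-line stand-in child program (child_code, a hypothetical placeholder) rather than RawDataSheetMergerPandasTest.py, and it is written to run on Python 3 as well.

```python
import subprocess
import sys

# Hypothetical stand-in for the real script: it just echoes its arguments.
child_code = "import sys; print(' '.join(sys.argv[1:]))"

# Every argument is a separate list element; sys.executable is the
# interpreter that is running this code, so no PATH lookup is needed.
cmd = [sys.executable, "-c", child_code, "September", "2013"]

p = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
out, err = p.communicate()

received = out.decode().strip()
print(received)
if err:
    # Printing stderr as well makes failures in the child visible.
    print(err.decode())
```

With the real script, the list would presumably be [sys.executable, 'RawDataSheetMergerPandasTest.py', 'September', '2013'].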

2 Answers:

Answer 0: (score: 5)

Here is how to get the next month's name, given the current month's name:

import calendar
nextmonth = calendar.month_name[1:][(calendar.month_name[1:].index(month) + 1) % 12]
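
The one-liner above assumes that month is already defined. A minimal self-contained sketch of the same idea follows, wrapped in two helper functions whose names (next_month_name, next_month_and_year) are made up for illustration. It also handles the year rollover after December, and it assumes the month names are the English ones of the default locale.

```python
import calendar

# calendar.month_name[0] is an empty string, so the real names start at index 1.
MONTHS = calendar.month_name[1:]


def next_month_name(month):
    """Return the name of the month after `month`; December wraps to January."""
    return MONTHS[(MONTHS.index(month) + 1) % 12]


def next_month_and_year(month, year):
    """Return (next month name, year as int), bumping the year after December."""
    following = next_month_name(month)
    year = int(year)
    if month == MONTHS[-1]:
        year += 1
    return following, year


print(next_month_name("September"))
print(next_month_and_year("December", "2013"))
```

An unknown month name makes MONTHS.index raise ValueError, which a caller may want to catch and report.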

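Here is the same logic applied to your script, with a few other improvements :) Run this script as "./myscript.py somemonth someyear". It will output a CSV file named nextmonth year MRG.csv, and it even takes localization into account and wraps the year correctly.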

import pandas as pd
import calendar
import sys

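if len(sys.argv) != 3:
    print "Usage: {0} <month> <year>".format(sys.argv[0])
    sys.exit(1)  # stop here; the unpacking below needs exactly two arguments
month, year = sys.argv[1:]

# month_name[0] is an empty string, so check against the real month names only
if month not in calendar.month_name[1:]:
    print "Invalid month! Month must be one of: {0}".format(", ".join(calendar.month_name[1:]))
    sys.exit(1)
if not year.isdigit():
    print "Invalid year! Year must be a number."
    sys.exit(1)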

nextmonth = calendar.month_name[1:][(calendar.month_name[1:].index(month) + 1) % 12]

df1 = pd.read_csv("data csv 1\{0} {1} 1 UUedit1.csv".format(month, year), delimiter = ",")
df2 = pd.read_csv("data csv 1\{0} {1} 2 UUedit2.csv".format(month, year), delimiter = ",")
merged = df1.merge(df2, on="Unique Element")
delcols = "Element_y", "number_y", "date_y", "title_y", "name_y"

for delcol in delcols:
    del merged[delcol]

merged.rename(columns={"name_x": "name", "rdate_x": "date", "title_x": "title", "number_x": "number", "Element_x": "Element"}, inplace = True)
merged = merged.sort("Element").reset_index(drop=True)
merged = merged.sort("date").reset_index(drop=True)

# December rolls over to January of the following year
if month == calendar.month_name[-1]: year = str(int(year) + 1)

merged.to_csv("{0} {1} MRG.csv".format(nextmonth, year), index=False, sep = ",")

If you don't need the next-month feature (and it sounds like you actually don't), take out the following two lines:

nextmonth = calendar.month_name[1:][(calendar.month_name[1:].index(month) + 1) % 12]
[...]
if month == calendar.month_name[-1]: year = str(int(year) + 1)

and replace the last line with:

merged.to_csv("{0} {1} MRG.csv".format(month, year), index=False, sep = ",")
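
As a rough sketch of the argument handling and file naming only (no pandas involved), the month and year could also be read with argparse, which prints a usage message and exits by itself when an argument is missing or invalid. The helper names parse_args and build_file_names and the folder default are assumptions made for illustration; the file name pattern is the one from the question.

```python
import argparse
import calendar


def parse_args(argv):
    """Parse <month> <year>; argparse exits with a usage message on bad input."""
    parser = argparse.ArgumentParser(description="Merge two monthly CSV files.")
    parser.add_argument("month", choices=calendar.month_name[1:])
    parser.add_argument("year", type=int)
    return parser.parse_args(argv)


def build_file_names(month, year, folder="data csv 1"):
    """Return ([input1, input2], output) following the naming in the question."""
    pattern = folder + "\\{0} {1} {2} UUedit{2}.csv"
    inputs = [pattern.format(month, year, n) for n in (1, 2)]
    output = "{0} {1} MRG.csv".format(month, year)
    return inputs, output


# In a real script this would be parse_args(sys.argv[1:]).
args = parse_args(["September", "2013"])
inputs, output = build_file_names(args.month, args.year)
print(inputs)
print(output)
```

The two input paths would then go to pd.read_csv and the output name to merged.to_csv, as in the script above.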

Answer 1: (score: 0)

You can use the built-in os.path.commonprefix function, which accepts any number of input file names:

import os

filenames = ['data csv 1\September 2013 1 UUedit1.csv',
             'data csv 1\September 2013 2 UUedit2.csv',]

merged_filename = os.path.commonprefix(filenames).rstrip(' ') + ' MRG.csv'
print repr(merged_filename)  # --> 'data csv 1\\September 2013 MRG.csv' (repr doubles the backslash)
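
One caveat: os.path.commonprefix compares character by character, so files numbered 1 and 12 would share the prefix "September 2013 1". A hedged variation that compares whole space-separated words instead is sketched below. This is a swapped-in technique, not what the answer above does; the function name merged_filename is hypothetical, and ntpath is used so that the backslash paths split the same way on any operating system.

```python
import ntpath  # Windows path rules, importable on every platform


def merged_filename(paths, suffix="MRG.csv"):
    """Build '<common leading words> MRG.csv' in the folder of the first path."""
    folder = ntpath.dirname(paths[0])
    # Split each base name into words and keep the leading words shared by all.
    word_lists = [ntpath.basename(p).split() for p in paths]
    common = []
    for words in zip(*word_lists):
        if len(set(words)) != 1:
            break
        common.append(words[0])
    return ntpath.join(folder, " ".join(common + [suffix]))


filenames = [r"data csv 1\September 2013 1 UUedit1.csv",
             r"data csv 1\September 2013 2 UUedit2.csv"]
print(merged_filename(filenames))
```

Like the commonprefix version, this keeps the input folder in the result; use only the base name part if the merged file should be written to the current directory.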