Python: append the first filename match to a row in a CSV file

Time: 2016-04-15 17:54:27

Tags: python csv python-3.x

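Updated for clarity: I am trying to append the first matching filename to a CSV file. I want to use the first file_label2 match in fname so that the found value is applied to the Suggested Label row. The information is retrieved from GitHub using github3.py.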

The code below runs without errors, but I don't think it is the right way to find the first filename match.

Sample output returned from GitHub:

PR Number: 123
Login: dbs
Files:
files/file-folder/media/figure01
file_label2 = figure01
files/file-folder/jsfile-to-checkin
file_label2 = jsfile
Suggested Label:  Value1
PR Number: 567
Login: dba
Files:
files/file-folder/media/figure01
file_label2 = figure01
files/file-folder/csfile-to-checkin
file_label2 = csfile
Suggested Label:  Value2

Desired CSV output:

PR Number, Login, First File Found, Suggested Label
123,dbs,files/file-folder/jsfile-to-checkin, Value1
567,dba,files/file-folder/csfile-to-checkin, Value2

Lists used to match the fname prefix after the filename is split:

list1=["jsfile","csfile"]
list2=["css","html"]

Code:

with open(inputFile,'w') as f:
    for prs in repo.pull_requests():
        getlabels = repo.issue(prs.number).as_dict()

        labels = [labels['name'] for labels in getlabels['labels']]
        tags = ["Bug", "Blocked", "Investigate"]
        enterprisetag = [tagsvalue for tagsvalue in labels if tagsvalue in tags]
        found = "No file match"
        if enterprisetag:
            pass
        else:
            f.write("PR Number:  %s" %getlabels['number'] + '\n' + "Login: %s" %getlabels['user']['login'] + '\n' + "Files: \n")
            for data in repo.pull_request(prs.number).files():
                fname, extname = os.path.splitext(data.filename)
                f.write(fname+'\n')
                file_label = fname.rsplit('/',1)[-1]
                if file_label.count("-") == 1:
                    file_label2 = file_label.split("-")[0]
                    f.write("file_label2: %s" %file_label2 + '\n')
                else:
                    file_label2 = "-".join(file_label.split("-",2)[:2])
                    f.write("file_label2: %s" %file_label2 + '\n')

                if [emlabel for emlabel in list1 if emlabel in file_label2]:
                    found = "Value1"
                    break
                elif [mk_label for mk_label in list2 if mk_label in file_label2]:
                    found = "Value2"
                    break
                else:
                    found = (str(None))

            f.write("Suggested Label: %s" %found + '\n')

prNum, login, firstFileFound, label = None,None,None,None
multiLineFlag = False

with open(outputFile, 'w') as w:
    w.write("PR Number, Login, First File Found, Suggested Label\n")
    for line in open(inputFile):
        line = line.strip()
        if multiLineFlag and not(firstFileFound):
            if line.startswith('file_label') and any(fileType in line for fileType in enterprise_mobility + marketplace + modern_apps + pnp + tdc + tdc_abr + unlock_insights):
                firstFileFound = prevLine
                multiLineFlag = False
            else:
                prevLine = line

        if not multiLineFlag:
            if line.startswith('PR Number: '):
                prNum = line[len('PR Number: '):]
            elif line.startswith('Login: '):
                login = line[len('Login: '):]
            elif line.startswith('Suggested Label: '):
                label = line[len('Suggested Label: '):]

            elif line.startswith('Files:'):
                multiLineFlag = True

        if all([prNum, login, firstFileFound, label]):
            w.write("%s,%s,%s,%s\n" %(prNum, login, firstFileFound, label))
            prNum, login, firstFileFound, label = None,None,None,None 

3 Answers:

Answer 0 (score: 3)

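The general idea is to separate the data that spans multiple lines from the data that sits on a single line, and to scan for each attribute. Once all of them have been found, you start over with the next record.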

prNum, login, firstFileFound, label = None,None,None,None
multiLineFlag = False
list1 = ["jsfile","csfile"]
inputFile = '' # Provide your input filename here
outputFile = '' # Provide your output filename here
labelFound = False
with open(outputFile, 'w') as w:
    w.write("PR Number, Login, First File Found, Suggested Label\n")
    for line in open(inputFile):
        line = line.strip()
        if multiLineFlag and not(firstFileFound):
            if line.startswith('file_label') and any(fileType in line for  fileType in list1):
                firstFileFound = prevLine
                multiLineFlag = False
            else:
                prevLine = line

        if not multiLineFlag:
            if line.startswith('PR Number:'):
                prNum = line[len('PR Number: '):]
            elif line.startswith('Login:'):
                login = line[len('Login: '):]
            elif line.startswith('Suggested Label:'):
                labelFound = True
                label = line[len('Suggested Label: '):]
                print("label is %s " % label)  # print() call, since the question is tagged python-3.x

            elif line.startswith('Files:'):
                multiLineFlag = True

        if all([prNum, login, firstFileFound, labelFound]):
            w.write("%s,%s,%s,%s\n" %(prNum, login, firstFileFound, label))
            prNum, login, firstFileFound, label = None,None,None,None
            labelFound=False

The following will work if a few assumptions about your data hold.

So, with an input file that looks like:

  

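PR Number: 123
Login: dbs
Files:
files/file-folder/media/figure01
file_label2 = figure01
files/file-folder/jsfile-to-checkin
file_label2 = jsfile
Suggested Label: Value1
PR Number: 423
Login: ddo
Files:
files/file-folder/media/figure01
file_label2 = figure01
files/file-folder/csfile2-to-checkin
file_label2 = csfile
Suggested Label:
PR Number: 567
Login: dba
Files:
files/file-folder/media/figure01
file_label2 = figure01
files/file-folder/csfile-to-checkin
file_label2 = csfile
Suggested Label: Value2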

This returns:

  

PR Number, Login, First File Found, Suggested Label
123,dbs,files/file-folder/jsfile-to-checkin,Value1
423,ddo,files/file-folder/csfile2-to-checkin,
567,dba,files/file-folder/csfile-to-checkin,Value2

It may need some adjustment to cover edge cases.
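One way such edge cases could be handled is sketched below. It is a variant of the scan described in this answer, not the answer's own code: it starts a new record at every "PR Number:" line and emits the row at "Suggested Label:", so an empty label or a record with no matching file still produces a row. The names parse_records, FILE_TYPES and SAMPLE are introduced here for illustration only, and the sample text is assumed to follow the layout shown in the question.

```python
import csv
import io

FILE_TYPES = ["jsfile", "csfile"]  # assumed to be the same values as list1

SAMPLE = """\
PR Number: 123
Login: dbs
Files:
files/file-folder/media/figure01
file_label2 = figure01
files/file-folder/jsfile-to-checkin
file_label2 = jsfile
Suggested Label:  Value1
PR Number: 423
Login: ddo
Files:
files/file-folder/media/figure01
file_label2 = figure01
files/file-folder/csfile2-to-checkin
file_label2 = csfile
Suggested Label:
"""


def parse_records(lines, file_types=FILE_TYPES):
    """Yield (pr_number, login, first_file, label) for each record."""
    record = None
    prev_line = None
    for raw in lines:
        line = raw.strip()
        if line.startswith("PR Number:"):
            # A new record begins; any unfinished one is discarded.
            record = {"pr": line[len("PR Number:"):].strip(),
                      "login": "", "first_file": ""}
        elif record is None:
            continue  # ignore text before the first record
        elif line.startswith("Login:"):
            record["login"] = line[len("Login:"):].strip()
        elif line.startswith("file_label2"):
            # The path is the line just before its file_label2 line.
            if not record["first_file"] and any(t in line for t in file_types):
                record["first_file"] = prev_line
        elif line.startswith("Suggested Label:"):
            label = line[len("Suggested Label:"):].strip()
            yield (record["pr"], record["login"], record["first_file"], label)
            record = None
        prev_line = line


buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")
writer.writerow(["PR Number", "Login", "First File Found", "Suggested Label"])
writer.writerows(parse_records(SAMPLE.splitlines()))
print(buffer.getvalue())
```

Using csv.writer instead of manual string formatting also means a filename or label containing a comma would be quoted rather than silently breaking the columns.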

Answer 1 (score: 1)

You didn't mention what error your script produces. I noticed two possible errors in the code you posted:

1

Inside the for loop, for data in repo.pull_request(prs.number).files():

if [emlabel for emlabel in list1 if emlabel in file_label2]:
                found = "Value1"

Here file_label2 should be a string, and emlabel is also a string, so I think what you need here is '==':

if [emlabel for emlabel in list1 if emlabel == file_label2]:
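To make the difference concrete, here is a small sketch comparing the two tests on strings. The helper names matches_substring and matches_exact, and the label "myjsfile-old", are made up for illustration; list1 is taken from the question. Which behaviour is wanted depends on whether a prefix such as "jsfile-to" should still count as a match.

```python
list1 = ["jsfile", "csfile"]


def matches_substring(file_label2, labels):
    # 'in' on two strings is a substring test
    return [label for label in labels if label in file_label2]


def matches_exact(file_label2, labels):
    # '==' only matches when the whole string is identical
    return [label for label in labels if label == file_label2]


print(matches_substring("jsfile", list1))        # ['jsfile']
print(matches_exact("jsfile", list1))            # ['jsfile']
print(matches_substring("myjsfile-old", list1))  # ['jsfile']
print(matches_exact("myjsfile-old", list1))      # []
```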

2

When you try to append the filename:

      str_to_list = [x.split(" ") for x in fname.split(" ")]
      row.append(str_to_list[0])

Here you may end up with a nested list, str_to_list=[['your/file/name']]. Is that what you expect?
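The nesting can be checked with a short sketch. The fname value below is taken from the sample output in the question, and flat_row is a name introduced here only to show the alternative of appending the string itself.

```python
fname = "files/file-folder/jsfile-to-checkin"

# Splitting on spaces twice wraps the name in two levels of list
str_to_list = [x.split(" ") for x in fname.split(" ")]
print(str_to_list)  # [['files/file-folder/jsfile-to-checkin']]

row = []
row.append(str_to_list[0])  # appends a one-element list, not a string
print(row)  # [['files/file-folder/jsfile-to-checkin']]

# Appending the string directly keeps the row flat
flat_row = []
flat_row.append(fname)
print(flat_row)  # ['files/file-folder/jsfile-to-checkin']
```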

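Another thing your code doesn't explain is the repo variable. Where does it come from? Is it obtained from another script, or do you need to parse a text file to get it?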

Please explain your problem more clearly and concisely so that people can actually help.

Answer 2 (score: -1)

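I think this does most of what you need. I made some assumptions, such as pull_request(number).files() being the same as pr.files() in the outer loop. I also removed some computations that I don't think did anything (for example, splitting ...).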

#!python3
import csv
import os.path

class C:
    @property
    def number(self):
        return '12345'

    def as_dict(self):
        return {'labels':[{'name':'Foo'}],
                'login':'xyzzy',
                }

    @property
    def filename(self):
        return 'path/to/jsfile-to-checkin.js'

    def files(self):
        return [C()]

    def issue(self, num):
        return C()

    def pull_requests(self):
        return [C()]

repo = C()

INFO = 'info.csv'
INFO_LABELS = 'info-with-labels.csv'

SKIP_TAGS = set(["Bug", "Blocked", "Investigate"])

FILENAME_LABELS = {
    'csfile':'Value1',
    'jsfile':'Value1',

    'css':'Value2',
    'html':'Value2',
}

with open(INFO, 'w+', newline='') as info_file, \
        open(INFO_LABELS, 'w') as info_labels_file:

    info = csv.writer(info_file)
    info_labels = csv.writer(info_labels_file, lineterminator='\n')

    headers = 'PR Number|Login|First file found'
    info.writerow(headers.split('|'))

    label_headers = headers + '|Suggested Labels'
    info_labels.writerow(label_headers.split('|'))

    for pr in repo.pull_requests():
        pr_issue = repo.issue(pr.number).as_dict()

        labels = [labels['name'] for labels in pr_issue['labels']]

        if any(tag in SKIP_TAGS for tag in labels):
            continue

        first_file = "No file match"
        use_label = ''

        for pr_file in pr.files():
            filename = pr_file.filename.rsplit('/', 1)[-1]
            basename, ext = os.path.splitext(filename)

            name_parts = basename.split('-')
            if len(name_parts) < 3:
                file_tag = name_parts[0]
            else:
                file_tag = '-'.join(name_parts[0:2])

            for text,label in FILENAME_LABELS.items():
                if text in file_tag:
                    first_file = pr_file.filename
                    use_label = label
                    break

            if use_label:
                break

        row = [pr.number, pr_issue['login'], first_file, use_label]
        info_labels.writerow(row)
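The lookup in the inner loops could also be pulled out into small functions, which makes the "first match" rule easy to test without GitHub access. This is only a sketch under the same assumptions as the code above: label_for and first_match are hypothetical helper names, the filenames in the usage lines are invented, and the tag rule (first part of the name, or the first two parts when there are three or more) is copied from this answer.

```python
import os.path

FILENAME_LABELS = {
    'csfile': 'Value1',
    'jsfile': 'Value1',
    'css': 'Value2',
    'html': 'Value2',
}


def label_for(file_tag, labels=FILENAME_LABELS):
    """Return the label of the first key contained in file_tag, or ''."""
    return next((label for text, label in labels.items() if text in file_tag), '')


def first_match(filenames):
    """Return (filename, label) for the first filename that has a label."""
    for name in filenames:
        basename = os.path.splitext(name.rsplit('/', 1)[-1])[0]
        parts = basename.split('-')
        file_tag = parts[0] if len(parts) < 3 else '-'.join(parts[:2])
        label = label_for(file_tag)
        if label:
            return name, label
    return 'No file match', ''


print(first_match(['files/file-folder/media/figure01.png',
                   'files/file-folder/jsfile-to-checkin.js']))
print(first_match(['files/file-folder/media/figure01.png']))
```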