Question

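I found plenty of other posts related to this, but none of them helped.

I have a master CSV file in which I need to find a specific "string" in the third column. It looks like this: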

Name,ID,Title,Date,Prj1_Assigned,Prj1_closed,Prj2_assigned,Prj2_solved
Joshua Morales,MF6B9X,Tech_Rep, 08-Nov-2016,948,740,8,8
Betty García,ERTW77,SME, 08-Nov-2016,965,854,15,12
Kathleen Marrero,KTD684,Probation, 08-Nov-2016,946,948,na,na
Mark León,GSL89D,Tech_Rep, 08-Nov-2016,951,844,6,4

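The ID column is unique, so I am trying to find "KTD684" (for example). Once it is found, I need to export the values of "Date", "Prj1_Assigned", "Prj1_closed", "Prj2_assigned" and "Prj2_solved".

The export should go to a file named 'KTD684.csv' (same as the ID), which already contains the header 'Date,Prj1_Assigned,Prj1_closed,Prj2_assigned,Prj2_solved'.

So far (since I am not a programmer) I have not been able to put this together, but could you please guide me on how to:

Find the row containing the element "KTD684".
Select the following values from that row: ['Date,Prj1_Assigned,Prj1_closed,Prj2_assigned,Prj2_solved']
Append them to the file named after the ID itself ('KTD684.csv')

I need to do this for 45 user IDs, and the company now has 195 employees. I tried writing an Excel macro (which did not work either), but I feel Python is the most reliable option.

I know I should at least show some basic progress, but after 2 months of trying to learn from someone, I still cannot find the element in this CSV.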

Answer 1

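If I understand your question correctly, you need to read 2 input files:

1. contains the user IDs you want to look up
2. contains the project data related to the users

That way, every user you specify in file 1 is looked up in file 2 and written to result.csv.

Specify the IDs to search for in search_for.csv. Keep in mind that result.csv is rewritten every time you run it.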

For example, search_for.csv looks like this
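The sample file and the script for this answer are not shown here. As a rough sketch of the approach described above, assuming search_for.csv holds one ID per line with no header and the project data lives in a file called master.csv (both the layout and the names load_ids and filter_users are assumptions, not the answerer's code), it could look something like this:

```python
import csv


def load_ids(search_path):
    """Read one user ID per line (assumed layout: single column, no header)."""
    with open(search_path, newline="", encoding="utf-8") as f:
        return {row[0].strip() for row in csv.reader(f) if row and row[0].strip()}


def filter_users(master_path, search_path, result_path, key="ID"):
    """Copy every master row whose ID is listed in search_path to result_path."""
    wanted = load_ids(search_path)
    matched = 0
    # Mode "w" rewrites result_path on every run, as the answer warns
    with open(master_path, newline="", encoding="utf-8") as src, \
         open(result_path, "w", newline="", encoding="utf-8") as dst:
        reader = csv.DictReader(src)
        writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
        writer.writeheader()
        for row in reader:
            if row[key].strip() in wanted:
                writer.writerow(row)
                matched += 1
    return matched
```

Calling filter_users("master.csv", "search_for.csv", "result.csv") returns the number of rows that matched, which is a quick way to notice an ID that was mistyped in search_for.csv.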

Answer 2

This is an ideal use case for pandas:

import pandas as pd

id_list = ['KTD684']

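# skipinitialspace drops the blank after each comma (e.g. ' 08-Nov-2016')
df = pd.read_csv('input.csv', skipinitialspace=True)
# Only keep rows whose ID is in 'id_list'
df = df[df['ID'].isin(id_list)]

gb = df.groupby('ID')
for name, group in gb:
    # Append to '<ID>.csv', which is assumed to already contain the header row
    with open('{}.csv'.format(name), 'a', newline='') as f: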
        group.to_csv(f, header=False, index=False,
                     columns=["Date", "Prj1_Assigned", "Prj1_closed",
                             "Prj2_assigned", "Prj2_solved"])

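This opens the CSV, keeps only the rows whose ID is in the list (id_list), groups them by the value in the ID column, and appends the rows for each unique ID to its own CSV file. You only need to extend id_list with the IDs you are interested in.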
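The loop above appends with header=False, so it relies on each per-ID file already containing its header row, as stated in the question. If some of those files might not exist yet, one option is to write the header only when the file is first created. A minimal sketch, where the helper name export_ids and its parameters are hypothetical, and skipinitialspace, dtype=str and keep_default_na=False are my own choices for keeping the sample values unchanged:

```python
import os

import pandas as pd

COLUMNS = ["Date", "Prj1_Assigned", "Prj1_closed", "Prj2_assigned", "Prj2_solved"]


def export_ids(master_csv, id_list, out_dir="."):
    """Append the selected columns for each ID to <out_dir>/<ID>.csv."""
    # Read everything as text so values such as 'na' and '948' are not converted;
    # skipinitialspace drops the blank that follows each comma in the sample data
    df = pd.read_csv(master_csv, dtype=str, skipinitialspace=True,
                     keep_default_na=False)
    df = df[df["ID"].isin(id_list)]

    written = []
    for user_id, group in df.groupby("ID"):
        path = os.path.join(out_dir, f"{user_id}.csv")
        # Emit the header only if the file does not exist yet
        group.to_csv(path, mode="a", index=False, columns=COLUMNS,
                     header=not os.path.isfile(path))
        written.append(path)
    return written
```

Running export_ids("input.csv", ["KTD684", "GSL89D"]) a second time appends the rows again without repeating the header, so it is meant for a master file that changes between runs.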

Extended example:

Reading the CSV gives you a DataFrame object that looks like this:

df = pd.read_csv('input.csv')
               Name      ID      Title          Date  Prj1_Assigned  \
0    Joshua Morales  MF6B9X   Tech_Rep   08-Nov-2016            948
1      Betty García  ERTW77        SME   08-Nov-2016            965
2  Kathleen Marrero  KTD684  Probation   08-Nov-2016            946
3         Mark León  GSL89D   Tech_Rep   08-Nov-2016            951

   Prj1_closed Prj2_assigned Prj2_solved
0          740             8           8
1          854            15          12
2          948            na          na
3          844             6           4

If you select only KTD684 and GSL89D:

id_list = ['KTD684', 'GSL89D']
df = df[df['ID'].isin(id_list)]
               Name      ID      Title          Date  Prj1_Assigned  \
2  Kathleen Marrero  KTD684  Probation   08-Nov-2016            946
3         Mark León  GSL89D   Tech_Rep   08-Nov-2016            951

   Prj1_closed Prj2_assigned Prj2_solved
2          948            na          na
3          844             6           4

groupby then groups on ID, and each unique ID is exported to its own CSV file, which results in:

KTD684.csv
Date,Prj1_Assigned,Prj1_closed,Prj2_assigned,Prj2_solved
08-Nov-2016,946,948,na,na

GSL89D.csv
Date,Prj1_Assigned,Prj1_closed,Prj2_assigned,Prj2_solved
08-Nov-2016,951,844,6,4

Answer 3

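Here is a pure Python approach that reads the master .csv file with csv.DictReader, matches the IDs, and then appends the file data to a new or existing .csv file with csv.DictWriter: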

from csv import DictReader
from csv import DictWriter

from os.path import isfile

def export_csv(user_id, master_csv, fieldnames, key_id, extension=".csv"):
    filename = user_id + extension
    file_exists = isfile(filename)

    with open(file=master_csv) as in_file, open(
        file=filename, mode="a", newline=""
    ) as out_file:

        # Create reading and writing objects
        csv_reader = DictReader(in_file)
        csv_writer = DictWriter(out_file, fieldnames=fieldnames)

        # Only write header once
        if not file_exists:
            csv_writer.writeheader()

        # Go through lines and match ids
        for line in csv_reader:
            if line[key_id] == user_id:

                # Keep only the requested fields, strip whitespace, and append
                line = {k: v.strip() for k, v in line.items() if k in fieldnames}
                csv_writer.writerow(line)

It can be called like this:

export_csv(
    user_id="KTD684",
    master_csv="master.csv",
    fieldnames=["Date", "Prj1_Assigned", "Prj1_closed", "Prj2_assigned", "Prj2_solved"],
    key_id="ID",
)

which produces the following KTD684.csv:

Date,Prj1_Assigned,Prj1_closed,Prj2_assigned,Prj2_solved
08-Nov-2016,946,948,na,na
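
export_csv re-reads the master file on every call, which is fine for a handful of IDs. For the 45 IDs mentioned in the question, a single pass over the master file could be used instead. A minimal sketch along the same lines (the function name export_many, the out_dir parameter and the UTF-8 encoding are assumptions of mine, not part of the answer above):

```python
import csv
import os


def export_many(user_ids, master_csv, fieldnames, key_id="ID", out_dir="."):
    """Read master_csv once and append each matching row to <out_dir>/<ID>.csv."""
    wanted = set(user_ids)
    counts = {}
    with open(master_csv, newline="", encoding="utf-8") as in_file:
        for row in csv.DictReader(in_file):
            user_id = row[key_id].strip()
            if user_id not in wanted:
                continue
            path = os.path.join(out_dir, user_id + ".csv")
            is_new = not os.path.isfile(path)
            with open(path, mode="a", newline="", encoding="utf-8") as out_file:
                writer = csv.DictWriter(out_file, fieldnames=fieldnames)
                # Only write the header when the file is created
                if is_new:
                    writer.writeheader()
                # Keep the requested fields only and strip stray whitespace
                writer.writerow({k: row[k].strip() for k in fieldnames})
            counts[user_id] = counts.get(user_id, 0) + 1
    return counts
```

The returned dictionary maps each ID that was found to the number of rows written, so any requested ID missing from it did not appear in the master file.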

Python / Numpy (CSV): find a value, append to another CSV
