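I've found plenty of other posts related to this, but none of them helped.
I have a master CSV file, and I need to find a specific string in the ID column (the second column). The file looks like this: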
Name,ID,Title,Date,Prj1_Assigned,Prj1_closed,Prj2_assigned,Prj2_solved
Joshua Morales,MF6B9X,Tech_Rep, 08-Nov-2016,948,740,8,8
Betty García,ERTW77,SME, 08-Nov-2016,965,854,15,12
Kathleen Marrero,KTD684,Probation, 08-Nov-2016,946,948,na,na
Mark León,GSL89D,Tech_Rep, 08-Nov-2016,951,844,6,4
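The ID column is unique, so I am trying to find, for example, "KTD684". Once it is found, I need to export the values of "Date", "Prj1_Assigned", "Prj1_closed", "Prj2_assigned" and "Prj2_solved".
The export should go to the file 'KTD684.csv' (named after the ID), which already contains the header 'Date,Prj1_Assigned,Prj1_closed,Prj2_assigned,Prj2_solved'.
Since I am not a programmer, I have not been able to draft this so far. Could you please guide me?
I need to do this for 45 user IDs, out of the 195 people the company currently employs. I tried writing an Excel macro (which did not work either), but I feel Python is the most reliable option.
I know I should at least show some basic progress, but after two months of trying to learn from someone, I still cannot find this element in the CSV.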
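Answer 0 (score: 1)
If I understand your question correctly, you need to read two input files:
1. a file containing the user IDs you want to look up
2. a file containing the project data for the users
That way, every user you list in file 1 is looked up in file 2 and written to result.csv.
Specify your search IDs in search_for.csv. Keep in mind that result.csv is rewritten on every run.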
For example, search_for.csv looks like this:
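The two-file approach described in this answer could be sketched roughly as follows. This is only an illustration under assumptions: search_for.csv is taken to hold one ID per line with no header (for example a line `KTD684` followed by a line `GSL89D`), the files are assumed to be UTF-8, and the names `load_search_ids`, `export_matches` and `EXPORT_COLUMNS` are hypothetical, not part of the answer.

```python
import csv

# Columns copied into the result; taken from the header line in the question.
EXPORT_COLUMNS = ["Date", "Prj1_Assigned", "Prj1_closed",
                  "Prj2_assigned", "Prj2_solved"]


def load_search_ids(path):
    """Return the set of IDs listed in the search file.

    Assumption: one ID per line, no header row.
    """
    with open(path, newline="", encoding="utf-8") as f:
        return {row[0].strip() for row in csv.reader(f)
                if row and row[0].strip()}


def export_matches(search_path, master_path, result_path):
    """Copy the master rows whose ID appears in the search file.

    result_path is opened in "w" mode, so it is rewritten on every run.
    Returns the number of rows written.
    """
    wanted = load_search_ids(search_path)
    written = 0
    with open(master_path, newline="", encoding="utf-8") as src, \
            open(result_path, "w", newline="", encoding="utf-8") as dst:
        # skipinitialspace drops the blank that follows a comma (" 08-Nov-2016").
        reader = csv.DictReader(src, skipinitialspace=True)
        # extrasaction="ignore" silently skips columns that are not exported.
        writer = csv.DictWriter(dst, fieldnames=["ID"] + EXPORT_COLUMNS,
                                extrasaction="ignore")
        writer.writeheader()
        for row in reader:
            if row["ID"] in wanted:
                writer.writerow(row)
                written += 1
    return written
```

A call such as `export_matches("search_for.csv", "master.csv", "result.csv")` would then produce one result file with an ID column, so rows for different users stay distinguishable; `master.csv` is an assumed name for the project data file.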
Answer 1 (score: 0)
This is an ideal use case for pandas:
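import pandas as pd

id_list = ['KTD684']
# skipinitialspace drops the blank after each comma (e.g. " 08-Nov-2016")
df = pd.read_csv('input.csv', skipinitialspace=True)
# Only keep values that are in 'id_list'
df = df[df['ID'].isin(id_list)]
gb = df.groupby('ID')
for name, group in gb:
    # Append to '<ID>.csv'; header=False because the file already has a header
    with open('{}.csv'.format(name), 'a', newline='') as f:
        group.to_csv(f, header=False, index=False,
                     columns=["Date", "Prj1_Assigned", "Prj1_closed",
                              "Prj2_assigned", "Prj2_solved"])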
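This opens the CSV, keeps only the rows whose ID is in the list (id_list), groups them by the values in the ID column, and appends the rows for each unique ID to its own CSV file. You only need to extend id_list with the IDs you are interested in.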
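If some of the per-ID files do not exist yet, the same idea can be wrapped in a function that writes the header only for new files. The sketch below is an illustration, not the answer's code: `export_by_id`, `out_dir` and `EXPORT_COLUMNS` are hypothetical names, and reading every column as text with `dtype=str` is an assumption made so that values such as "na" pass through unchanged.

```python
import os

import pandas as pd

EXPORT_COLUMNS = ["Date", "Prj1_Assigned", "Prj1_closed",
                  "Prj2_assigned", "Prj2_solved"]


def export_by_id(master_csv, id_list, out_dir="."):
    """Append the rows for each ID in id_list to '<out_dir>/<ID>.csv'.

    Returns the list of files that were written to.
    """
    # Read everything as text and strip the blank that follows each comma.
    df = pd.read_csv(master_csv, skipinitialspace=True, dtype=str,
                     keep_default_na=False)
    selected = df[df["ID"].isin(id_list)]
    written = []
    for user_id, group in selected.groupby("ID"):
        path = os.path.join(out_dir, "{}.csv".format(user_id))
        # Emit the header only when the file is missing or empty.
        need_header = not os.path.exists(path) or os.path.getsize(path) == 0
        group.to_csv(path, mode="a", header=need_header, index=False,
                     columns=EXPORT_COLUMNS)
        written.append(path)
    return written
```

Calling `export_by_id("input.csv", ["KTD684", "GSL89D"])` twice would leave one header line and two data rows in each file, since the second run finds the files already populated.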
Reading the CSV gives you a DataFrame object that looks like this:
df = pd.read_csv('input.csv', skipinitialspace=True)
               Name      ID      Title         Date  Prj1_Assigned  \
0    Joshua Morales  MF6B9X   Tech_Rep  08-Nov-2016            948
1      Betty García  ERTW77        SME  08-Nov-2016            965
2  Kathleen Marrero  KTD684  Probation  08-Nov-2016            946
3         Mark León  GSL89D   Tech_Rep  08-Nov-2016            951

   Prj1_closed Prj2_assigned Prj2_solved
0          740             8           8
1          854            15          12
2          948            na          na
3          844             6           4
If you select only KTD684 and GSL89D:
id_list = ['KTD684', 'GSL89D']
df = df[df['ID'].isin(id_list)]
               Name      ID      Title         Date  Prj1_Assigned  \
2  Kathleen Marrero  KTD684  Probation  08-Nov-2016            946
3         Mark León  GSL89D   Tech_Rep  08-Nov-2016            951

   Prj1_closed Prj2_assigned Prj2_solved
2          948            na          na
3          844             6           4
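The groupby operation on ID groups the rows, and each unique ID is exported to its own CSV file. Because the rows are appended with header=False, the header line shown below is the one already present in each file. The result is:
KTD684.csv
Date,Prj1_Assigned,Prj1_closed,Prj2_assigned,Prj2_solved
08-Nov-2016,946,948,na,na
GSL89D.csv
Date,Prj1_Assigned,Prj1_closed,Prj2_assigned,Prj2_solved
08-Nov-2016,951,844,6,4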
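Answer 2 (score: 0)
This is a pure-Python approach that uses csv.DictReader to read the master .csv file, matches the ID, and then uses csv.DictWriter to append the matching data to a new or existing .csv file: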
from csv import DictReader
from csv import DictWriter
from os.path import isfile


def export_csv(user_id, master_csv, fieldnames, key_id, extension=".csv"):
    # The output file is named after the user ID, e.g. "KTD684.csv"
    filename = user_id + extension
    file_exists = isfile(filename)
    with open(file=master_csv, newline="") as in_file, open(
        file=filename, mode="a", newline=""
    ) as out_file:
        # Create reading and writing objects
        csv_reader = DictReader(in_file)
        csv_writer = DictWriter(out_file, fieldnames=fieldnames)
        # Only write header once
        if not file_exists:
            csv_writer.writeheader()
        # Go through lines and match ids
        for line in csv_reader:
            if line[key_id] == user_id:
                # Keep only the wanted columns, strip stray whitespace, append
                line = {k: v.strip() for k, v in line.items() if k in fieldnames}
                csv_writer.writerow(line)
It can be called like this:
export_csv(
    user_id="KTD684",
    master_csv="master.csv",
    fieldnames=["Date", "Prj1_Assigned", "Prj1_closed", "Prj2_assigned", "Prj2_solved"],
    key_id="ID",
)
which produces the following KTD684.csv:
Date,Prj1_Assigned,Prj1_closed,Prj2_assigned,Prj2_solved
08-Nov-2016,946,948,na,na
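Since the question mentions 45 user IDs, calling a per-ID function in a loop would re-read the master file 45 times. A possible single-pass variant is sketched below. It is a different arrangement of the same csv.DictReader / csv.DictWriter idea, not the answer's code: `export_many` and `out_dir` are hypothetical names, the files are assumed to be UTF-8, and every row is assumed to have all columns.

```python
import csv
import os


def export_many(user_ids, master_csv, fieldnames, key_id="ID", out_dir="."):
    """Append the matching row(s) for every requested ID, reading master_csv once.

    Returns the set of requested IDs that were not found in the master file.
    """
    wanted = set(user_ids)
    found = set()
    with open(master_csv, newline="", encoding="utf-8") as in_file:
        for row in csv.DictReader(in_file):
            user_id = row[key_id].strip()
            if user_id not in wanted:
                continue
            found.add(user_id)
            path = os.path.join(out_dir, user_id + ".csv")
            # Write the header only when the file does not exist yet.
            is_new = not os.path.isfile(path)
            with open(path, mode="a", newline="", encoding="utf-8") as out_file:
                writer = csv.DictWriter(out_file, fieldnames=fieldnames)
                if is_new:
                    writer.writeheader()
                # Keep only the exported columns and strip stray whitespace.
                writer.writerow({k: row[k].strip() for k in fieldnames})
    return wanted - found
```

The returned set makes it easy to spot mistyped IDs: anything left in it had no row in the master file, and no output file is touched for those IDs.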