I have two CSV files.
File 1 contains file path, group, and permissions, as follows:
/path/eds/aws/file1.dat,dp_card,640
/path/eds/aws/file2.dat,dp_card,600
/path/edh/vs/file1.dat,dp_card,640
/pth/edw/de/file1.dat,pdp_card,640
/pth/edn/de/file1.dat,pdp_card,640
File 2 contains directory path, group, and batch owner, as follows:
/path/eds/aws/,dp_card,dp_batchown
/path/edh/vs/,dp_card,dp_batchown
/path/edw/de/,pdp_card,dp_batchown
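I want to compare the two files based on the paths in file 2. If a directory path from file 2 exists in file 1, I want to write the file name, group, and permissions from file 1, followed by the directory path and group from file 2, to another file.
Sample output:
File 3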
/path/eds/aws/file1.dat,dp_card,640,/path/eds/aws/,dp_card
/path/eds/aws/file2.dat,dp_card,600,/path/eds/aws/,dp_card
/path/edh/vs/file1.dat,dp_card,640,/path/edh/vs/,dp_card
/pth/edw/de/file1.dat,pdp_card,640,/path/edw/de/,dp_card
Can someone help me write the code for the above? I have been trying since yesterday.
The code I have written so far:
#!/usr/bin/python
import csv
import os.path

csv_dialect = dict(delimiter=',', quotechar='|')
path = set()

with open('hdfs','rb') as file_a :
    reader1 = csv.reader(file_a, **csv_dialect)
    next(reader1)
    for row in reader1:
        dirpath = os.path.dirname(row[0])
        #absp = abspath[:-1]
        path.add(dirpath)
        #print(abspath)

with open('file2', 'ab') as file_c:
    writer = csv.writer(file_c, **csv_dialect)
    with open('lake.csv', 'rb') as file_b:
        reader2 = csv.reader(file_b, **csv_dialect)
        next(reader2)
        for row in reader2:
            dirpath1 = os.path.dirname(row[0])
            #print(dirpath1)
            if (dirpath1) in path:
                writer.writerow(row)
                #print(row)
Answer 0 (score: 0)
Pandas makes this very simple and highly readable:
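import pandas as pd
import os.path

# Neither file has a header row, so column names are supplied explicitly.
df1 = pd.read_csv('file1.txt', header=None, names=['fpath', 'group', 'permission'])
df2 = pd.read_csv('file2.txt', header=None, names=['dpath', 'group', 'owner'])

# Derive the directory of each file, and strip the trailing slash from
# the directory paths so both sides use the same key.
df1['dpath'] = df1['fpath'].apply(os.path.dirname)
df2['dpath'] = df2['dpath'].apply(os.path.dirname)

# Inner join keeps only files whose directory appears in file 2.
# The shared 'group' column becomes group_x (file 1) and group_y (file 2).
df3 = pd.merge(df1, df2, on="dpath", how="inner")
df3[['fpath', 'group_x', 'permission', 'dpath', 'group_y']].to_csv(
    'file3.txt', index=False, header=False)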
Output file3.txt:
/path/eds/aws/file1.dat,dp_card,640,/path/eds/aws,dp_card
/path/eds/aws/file2.dat,dp_card,600,/path/eds/aws,dp_card
/path/edh/vs/file1.dat,dp_card,640,/path/edh/vs,dp_card
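If pandas is not available, the same join can be sketched with only the standard library. This is a minimal sketch, not a definitive implementation: the function name join_by_directory is hypothetical, the sample data is copied from the question and held in memory instead of being read from disk, and it assumes neither file has a header row. Unlike the pandas version, it keeps the directory path exactly as written in file 2 (with the trailing slash). The two /pth/... rows produce no output because no directory in file 2 starts with /pth.

```python
import csv
import io
import posixpath  # the paths in the question are POSIX-style


def join_by_directory(files_csv, dirs_csv):
    """Return file rows whose directory is listed in dirs_csv.

    Each result row is:
    file path, file group, permission, directory path, directory group.
    """
    # Map the directory without its trailing slash to the original
    # (directory path, group) pair from file 2.
    dirs = {}
    for row in csv.reader(dirs_csv):
        if not row:
            continue  # skip blank lines
        dpath, dgroup = row[0], row[1]
        key = dpath.rstrip('/') or '/'
        dirs[key] = (dpath, dgroup)

    matched = []
    for row in csv.reader(files_csv):
        if not row:
            continue
        fpath, fgroup, perm = row[0], row[1], row[2]
        hit = dirs.get(posixpath.dirname(fpath))
        if hit is not None:
            matched.append([fpath, fgroup, perm, hit[0], hit[1]])
    return matched


# Sample data from the question (assumption: no header rows).
FILE1 = """\
/path/eds/aws/file1.dat,dp_card,640
/path/eds/aws/file2.dat,dp_card,600
/path/edh/vs/file1.dat,dp_card,640
/pth/edw/de/file1.dat,pdp_card,640
/pth/edn/de/file1.dat,pdp_card,640
"""
FILE2 = """\
/path/eds/aws/,dp_card,dp_batchown
/path/edh/vs/,dp_card,dp_batchown
/path/edw/de/,pdp_card,dp_batchown
"""

rows = join_by_directory(io.StringIO(FILE1), io.StringIO(FILE2))

# Write the result as CSV; swap the StringIO for
# open('file3.txt', 'w', newline='') to write a real file.
out = io.StringIO()
csv.writer(out, lineterminator='\n').writerows(rows)
print(out.getvalue(), end='')
```

With real files, pass open('file1.txt', newline='') and open('file2.txt', newline='') in place of the StringIO objects. The dictionary lookup makes each file row a constant-time check, so the whole join is linear in the size of the two inputs.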