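I am trying to compare two Excel files: one is a user matrix, and the other is one I generated from the host. I want to check whether the users in the matrix are set up correctly.
This is the result I got from the host, imported into pandas. Note that the user groups are the column names here: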
Name Users Domain Admins Administrators Schema Admins
0 xxx NaN Yes Yes NaN
The problem is this:
The Excel matrix looks like
user: groups
xxx administrators
schema admins
domain admins
Here is what I tried:
I replaced every "Yes" with the column name:
for i in df.columns:
    df[i] = df[i].replace('Yes', i)
Then I dropped the columns that are entirely empty.
group=df.dropna(axis='columns',how='all')
Now it looks like this:
Name Users Domain Admins Administrators Schema Admins
0 xxx Domain admins Administrators Schema Admins
The other one looks like:
User Account Name Group
0 xxx Domain Admins, Local admin,Administrators
I don't know what to do next. Please guide me on how to compare the values for every index in a loop.
The two original Excel files look like this:
user: groups
xxx administrators
schema admins
domain admins
yyy administrators
domain admins
zzz administrators
schema admins
The other file looks like:
username administrators schema admins domain admins
xxx yes yes NaN
yyy yes NaN yes
Answer 0: (score: 0)
This can be done as follows:
Step 1: Transform the host df
cols = ['administrators', 'schema admins', 'domain admins']
df1['merged'] = df1[cols].apply(lambda x: ', '.join(x.dropna().index), axis=1)  ## df1 is the host df; joins the names of the non-empty columns
Step 2: Transform the matrix df
df.user = df.user.ffill() ## Fill the empty rows with same user name
grouped_df = df.groupby("user")['groups'].apply(','.join).reset_index() ## merge same user name to 1 row
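Step 3: Compare the dfs
result_df = pd.merge(df1, grouped_df, how='inner', left_on="username", right_on="user")  ## join on the user name; adjust left_on/right_on to the column names you have
result_df['match'] = result_df['merged'] == result_df['groups']  ## exact string match, so separator and group order must agree in both frames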
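The three steps above can be sketched end to end on the sample data from the question. This is only a minimal sketch, not the answerer's exact code: the frames host_df and matrix_df, the helper normalize, and the column names are assumptions made for illustration. Groups are turned into a sorted, lowercase string so that order, case, and spacing do not affect the comparison.

```python
import pandas as pd

# Hypothetical sample data mirroring the question (all names are assumptions).
host_df = pd.DataFrame({
    "username": ["xxx", "yyy"],
    "administrators": ["yes", "yes"],
    "schema admins": ["yes", None],
    "domain admins": [None, "yes"],
})
matrix_df = pd.DataFrame({
    "user": ["xxx", None, None, "yyy", None],
    "groups": ["administrators", "schema admins", "domain admins",
               "administrators", "domain admins"],
})


def normalize(names):
    """Join group names into one sorted, lowercase, comma-separated string."""
    return ", ".join(sorted(n.strip().lower() for n in names))


# Step 1: per host user, collect the names of the columns that hold a value.
cols = ["administrators", "schema admins", "domain admins"]
host_groups = host_df.set_index("username")[cols].apply(
    lambda row: normalize(row.dropna().index), axis=1
)

# Step 2: forward-fill the user name, then collect the groups per user.
matrix_df["user"] = matrix_df["user"].ffill()
matrix_groups = matrix_df.groupby("user")["groups"].apply(normalize)

# Step 3: align both Series on the user name and compare.
result = pd.concat({"host": host_groups, "matrix": matrix_groups}, axis=1)
result["match"] = result["host"] == result["matrix"]
print(result)
```

With this sample data, yyy matches while xxx does not, because the host frame has no "domain admins" entry for xxx.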
Answer 1: (score: 0)
You can load the data into dictionaries to make things easier. Suppose the data file looks like this:
user: groups
xxx administrators
schema admins
domain admins
user: groups
yyy administrators
domain admins
user: groups
zzz administrators
schema admins
The following code builds a dictionary:
with open('userdata.txt', 'r') as f:
    # read the data file, split it into lines, and trim each line
    datalist = [line.strip() for line in f]
userdict = {}  # maps user name -> list of groups
username = ""; grplist = []; newuser = True  # state for the user currently being read
for line in datalist:
    if not line:
        continue  # ignore blank lines
    if line.startswith('user:'):
        if not (username == "" and len(grplist) == 0):  # nothing to store at the first header
            userdict[username] = grplist  # store the previous user
        username = ""; grplist = []; newuser = True  # reset for the new user
    elif newuser:
        # first line of a block is "<user> <group>"; split once so group names may contain spaces
        username, grpname = line.split(None, 1)
        grplist.append(grpname)
        newuser = False
    else:
        grplist.append(line)  # further groups of the same user
userdict[username] = grplist  # store the last user
print(userdict)
Output:
{'xxx': ['administrators', 'schema admins', 'domain admins'], 'yyy': ['administrators', 'domain admins'], 'zzz': ['administrators', 'schema admins']}
If the data in the second file looks like this:
Account Name Group
xxx administrators , schema admins, domain admins
yyy administrators , domain admins
zzz administrators , schema admins
The following code builds a dictionary from it:
with open('userdata2.txt', 'r') as f:
    # read the data file, split it into lines, and trim each line
    datalines = [line.strip() for line in f]
userdict2 = {}
for line in datalines[1:]:  # skip the first line, which is only the header
    username, grpstring = line.split(" ", 1)  # user name, then the comma-separated groups
    userdict2[username] = [g.strip() for g in grpstring.split(",")]
print(userdict2)
Output:
{'xxx': ['administrators', 'schema admins', 'domain admins'], 'yyy': ['administrators', 'domain admins'], 'zzz': ['administrators', 'schema admins']}
To compare the two dictionaries, just use ==:
print(userdict == userdict2)
Output:
True
To compare the groups of a specific user:
print(userdict['xxx'] == userdict2['xxx'])
Output:
True
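Note that == on lists is order-sensitive, so a user whose groups are the same but listed in a different order would compare as unequal. Here is a small sketch of an order-insensitive comparison that also reports what differs. The function name diff_groups and the sample dictionaries are assumptions for illustration; the dictionaries have the same shape as those built by the code above.

```python
def diff_groups(expected, actual):
    """Return {user: (missing, extra)} for every user whose groups differ.

    missing = groups listed in `expected` but absent from `actual`
    extra   = groups present in `actual` but not listed in `expected`
    Group names are compared case-insensitively and without regard to order.
    """
    report = {}
    for user in sorted(set(expected) | set(actual)):
        want = {g.strip().lower() for g in expected.get(user, [])}
        have = {g.strip().lower() for g in actual.get(user, [])}
        if want != have:
            report[user] = (sorted(want - have), sorted(have - want))
    return report


# Hypothetical sample data (assumed, not taken from real files).
matrix = {"xxx": ["administrators", "schema admins", "domain admins"],
          "yyy": ["administrators", "domain admins"]}
host = {"xxx": ["Schema Admins", "administrators"],
        "yyy": ["domain admins", "administrators"]}

print(diff_groups(matrix, host))
```

For this sample the report contains only xxx, with 'domain admins' listed as missing on the host side. yyy is treated as equal even though its groups appear in a different order.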
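Answer 2: (score: 0)
I would leave the DataFrame imported from the host (let's call it df_host) unchanged and build one boolean column per group for the DataFrame imported from the matrix (call it df_matrix):
groups = ['Users', 'Domain Admins', 'Administrators', 'Schema Admins']
for g in groups:
    df_matrix[g] = df_matrix.Group.str.contains(g)
Next, I would use the user name as the index in both DataFrames:
df_matrix.set_index('Account Name', inplace=True)
df_host.set_index('Name', inplace=True)
You can now easily join the DataFrames on that index.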
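Finally, you should have a single DataFrame with one row per user and, for each group, one column from the host and one from the Excel matrix, which should make the comparison easy.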
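As a minimal sketch of the join and the final comparison, assuming the host frame marks membership with 'Yes' and the matrix frame holds one comma-separated Group string per user: the sample frames, the intermediate names host_flags and matrix_flags, and the suffixes _host and _matrix are assumptions for illustration, not part of the answer.

```python
import pandas as pd

groups = ["Users", "Domain Admins", "Administrators", "Schema Admins"]

# Hypothetical host frame: 'Yes' marks membership, a missing value marks none.
df_host = pd.DataFrame(
    {"Name": ["xxx"], "Users": [None], "Domain Admins": ["Yes"],
     "Administrators": ["Yes"], "Schema Admins": [None]}
).set_index("Name")

# Hypothetical matrix frame: one comma-separated Group string per user.
df_matrix = pd.DataFrame(
    {"Account Name": ["xxx"],
     "Group": ["Domain Admins, Local admin,Administrators"]}
).set_index("Account Name")

# Turn both sides into boolean membership columns.
host_flags = df_host[groups].notna()
matrix_flags = pd.DataFrame(
    {g: df_matrix["Group"].str.contains(g, regex=False) for g in groups}
)

# Join on the user name; the suffixes keep the two sources apart.
joined = host_flags.join(matrix_flags, lsuffix="_host", rsuffix="_matrix")

# A user is consistent when every group flag agrees between the two sources.
joined["match"] = [
    all(row[f"{g}_host"] == row[f"{g}_matrix"] for g in groups)
    for _, row in joined.iterrows()
]
print(joined["match"])
```

Groups that are not in the groups list, such as "Local admin" here, are ignored by this comparison, so extend the list if they should count.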