Question

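I have an Excel worksheet that contains only "Employee ID" values in column A, as shown below.

I also have another Excel worksheet containing the "Employee details" of 10000+ employees. The details worksheet holds data for many employees; here is a sample row for one of the employee IDs: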

Empid   Name    Location    JobTitle    Email-id     Department
1677    Umesh     Gadag      ASE      abc@gmail.com    Civil

Here is the working code:

import pandas as pd

# Excel sheet containing only the employee IDs (no header row)
df1 = pd.read_excel(r'C:\Users\Kiran\Desktop\Employee id.xlsx', header=None)
# Excel sheet containing the full details of 10000+ employees
df2 = pd.read_excel(r'C:\Users\Kiran\Desktop\Employee details.xlsx')
# Keep only the rows whose Empid appears in the ID sheet
df3 = df2[df2['Empid'].isin(df1[0])]
df3.to_excel("Output1.xlsx", index=False)  # final output

The code runs fine, but the output rows come out in what looks like a random order:

Empid   Name    Location    JobTitle    Email-id       Department
1677    Umesh     Gadag      ASE      abc@gmail.com      Civil
5623    Kiran     Hubli      SE       123@gmail.com      Civil
5618    Rudra     Bidar      ASE      xyz@gmail.com     Electrical
5597    Suresh    Udupi      ASE       ppp@gmail.com    Mechanical

But I need the output in the following order, because the employee IDs in the ID sheet are arranged in a specific order:

Empid   Name    Location    JobTitle    Email-id      Department
1677    Umesh     Gadag      ASE      abc@gmail.com     Civil
5597    Suresh    Udupi      ASE      ppp@gmail.com     Mechanical 
5623    Kiran     Hubli      SE       123@gmail.com     Civil
5618    Rudra     Bidar      ASE      xyz@gmail.com     Electrical

Answer 1

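Assume df_small is the DataFrame holding the employee IDs whose details need to be fetched from df_big, which holds the data for more than 10000 employees.

The details can then be fetched like this: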

df_emp_details = df_big[df_big['Empid'].isin(df_small['Employee id'])]

Edit: to read an Excel file that has no header/column names, use:

# This will create a default column 0 in the dataframe.
df_small = pd.read_excel('path/to/excel.xlsx', header=None)

# Use below code to fetch the details.
df_emp_details = df_big[df_big['Empid'].isin(df_small[0])]

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_excel.html
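To make the behaviour of the isin filter concrete, here is a minimal sketch that uses small in-memory frames instead of the Excel files. The sample values come from the question; the three-ID list and the reduced column set are assumptions made for illustration only. It suggests why the asker's output looks "random": the filtered rows keep the order they have in df_big, not the order of the ID sheet.

```python
import pandas as pd

# Stand-in for the big details sheet (sample rows taken from the question).
df_big = pd.DataFrame({
    "Empid": [1677, 5623, 5618, 5597],
    "Name": ["Umesh", "Kiran", "Rudra", "Suresh"],
})

# Stand-in for the header-less ID sheet; reading with header=None
# names the single column 0. The chosen IDs are illustrative.
df_small = pd.DataFrame({0: [1677, 5597, 5623]})

# Boolean mask: True where the Empid appears anywhere in the ID column.
mask = df_big["Empid"].isin(df_small[0])
df_emp_details = df_big[mask]

# Rows follow df_big's order, not the order of df_small.
print(df_emp_details["Empid"].tolist())  # [1677, 5623, 5597]
```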

EDIT2: I believe you want the extracted rows ordered by employee ID. For that, use sort_values:

# ...
# Sorts based on column `Empid`.
df_emp_details = df_emp_details.sort_values(by='Empid')

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.sort_values.html
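Note that sort_values gives ascending Empid order, which is not the same as the order shown in the question's desired output (1677, 5597, 5623, 5618). If the intent is to follow the order of the ID sheet itself, one possible approach is to index the details by Empid and reindex with the ID list. This is only a sketch: the in-memory frames stand in for the Excel files, and it assumes Empid values are unique in df_big (reindex raises an error on duplicate index labels).

```python
import pandas as pd

# Stand-in for the big details sheet (sample rows taken from the question).
df_big = pd.DataFrame({
    "Empid": [1677, 5623, 5618, 5597],
    "Name": ["Umesh", "Kiran", "Rudra", "Suresh"],
    "Department": ["Civil", "Civil", "Electrical", "Mechanical"],
})

# Stand-in for the header-less ID sheet, in the order the asker wants.
df_small = pd.DataFrame({0: [1677, 5597, 5623, 5618]})
ids = df_small[0].tolist()

# Index by Empid, then pull rows in the order of the ID list.
# IDs missing from df_big would show up as rows filled with NaN.
df_ordered = (
    df_big.set_index("Empid")
          .reindex(ids)
          .rename_axis("Empid")
          .reset_index()
)

print(df_ordered)
```

Unlike the isin filter, this keeps one output row per entry in the ID list, so IDs that have no match stay visible as empty rows instead of silently disappearing.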

Answer 2

from pandas import read_excel

# Load the employee details sheet and name its columns.
excel_data_df = read_excel('data.xlsx', sheet_name='Sheet1')
excel_data_df.columns = ["Empid", "Name", "Location", "JobTitle", "Email-id", "Department"]

# Ask for an ID and print every field of each matching row.
emp_id = int(input("Enter Employee id: "))
for row in excel_data_df[excel_data_df.Empid == emp_id].values:
    for item in row:
        print(item)
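The same single-ID lookup can be wrapped in a small function so that it can be reused or tested without typing input. The sketch below is only an illustration: lookup_employee is a hypothetical helper name, and the in-memory frame stands in for the Excel sheet with a reduced set of columns.

```python
import pandas as pd


def lookup_employee(details: pd.DataFrame, emp_id: int) -> list:
    """Return every row whose Empid equals emp_id, as a list of dicts.

    Hypothetical helper for illustration; an unknown ID gives an empty list.
    """
    matches = details[details["Empid"] == emp_id]
    return matches.to_dict(orient="records")


# Stand-in for the details sheet (sample rows taken from the question).
details = pd.DataFrame({
    "Empid": [1677, 5597],
    "Name": ["Umesh", "Suresh"],
    "Location": ["Gadag", "Udupi"],
})

print(lookup_employee(details, 5597))
print(lookup_employee(details, 9999))
```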

Answer 3

You want a left join.

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.join.html

Because join matches against the index of the other frame, you need to make sure the Empid column is set as the index:

df_small = df_small.join(df_big.set_index('Empid'), on = 'Employee ID', how = 'left')

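Hopefully this method will be improved in the future so that it becomes easier to specify the columns to join on, or to join on several columns without getting into complicated multi-indexes.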
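For comparison, DataFrame.merge accepts left_on and right_on, so the same left join can be written without setting an index on either frame. The following is a minimal sketch with in-memory frames standing in for the two Excel sheets; the column name 'Employee ID' follows the join call above, and the reduced column set is an assumption for illustration. A left merge keeps the key order of the left frame, so the result follows the order of the ID sheet.

```python
import pandas as pd

# Stand-in for the big details sheet (sample rows taken from the question).
df_big = pd.DataFrame({
    "Empid": [1677, 5623, 5618, 5597],
    "Name": ["Umesh", "Kiran", "Rudra", "Suresh"],
    "Location": ["Gadag", "Hubli", "Bidar", "Udupi"],
})

# Stand-in for the ID sheet, in the order the asker wants.
df_small = pd.DataFrame({"Employee ID": [1677, 5597, 5623, 5618]})

# Left merge: one output row per ID in df_small, in df_small's order.
df_out = df_small.merge(
    df_big, left_on="Employee ID", right_on="Empid", how="left"
)

# Both key columns are kept after the merge; drop the redundant one.
df_out = df_out.drop(columns="Employee ID")

print(df_out)
```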

Get employee details from an Excel worksheet
