How to merge two DataFrames based on an "or" comparison?

Time: 2019-10-18 21:20:13

Tags: python pandas

Suppose I have:

import pandas as pd

edu_data = [['school', 1, 2], ['college', 3, 4], ['grad-school', 5, 6]]
edu = pd.DataFrame(edu_data, columns=['Education', 'StudentID1', 'StudentID2'])
print(edu)
     Education  StudentID1  StudentID2
0       school         1         2
1      college         3         4
2  grad-school         5         6

Then I have another table with student IDs:

data = [['tom', 3], ['nick', 5], ['juli', 6], ['jack', 10]] 
df = pd.DataFrame(data, columns = ['Name', 'StudentID']) 
print(df)
   Name  StudentID
0   tom        3
1  nick        5
2  juli        6
3  jack       10

How can I get a table that matches df['StudentID'] against either edu['StudentID1'] or edu['StudentID2']? If df['StudentID'] equals either one, I want to add the corresponding edu['Education'] value to df.

So I want my output to be:

   Name  StudentID  Education
0   tom        3     college
1  nick        5     grad-school
2  juli        6     grad-school
3  jack       10     NaN

4 answers:

Answer 0: (score: 2)

You can use DataFrame.melt:

mapper=edu.melt(id_vars='Education',var_name = 'StudentID',value_name='ID').set_index('ID')
df['Education']=df['StudentID'].map(mapper['Education'])
print(df)

   Name  StudentID    Education
0   tom          3      college
1  nick          5  grad-school
2  juli          6  grad-school
3  jack         10          NaN

Details:

print(mapper)

         Education   StudentID
ID                         
1           school  StudentID1
3          college  StudentID1
5      grad-school  StudentID1
2           school  StudentID2
4          college  StudentID2
6      grad-school  StudentID2
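
Because the melted frame is indexed by ID, a repeated ID would make the lookup ambiguous. The following is a minimal, self-contained sketch of the same melt + map idea that drops repeated IDs before mapping. The names `lookup` and `result` are introduced here for illustration, and keeping the first occurrence is an assumption, not something the question specifies.

```python
import pandas as pd

edu = pd.DataFrame(
    [["school", 1, 2], ["college", 3, 4], ["grad-school", 5, 6]],
    columns=["Education", "StudentID1", "StudentID2"],
)
df = pd.DataFrame(
    [["tom", 3], ["nick", 5], ["juli", 6], ["jack", 10]],
    columns=["Name", "StudentID"],
)

# Long format: one row per (Education, ID) pair, then a Series indexed by ID.
lookup = (
    edu.melt(id_vars="Education", value_name="ID")
       .set_index("ID")["Education"]
)

# Assumption: if an ID appears more than once, keep only its first occurrence
# so that the lookup index is unique before mapping.
lookup = lookup[~lookup.index.duplicated(keep="first")]

# IDs that are missing from the lookup (such as 10) become NaN.
result = df.assign(Education=df["StudentID"].map(lookup))
print(result)
```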

You can also use Series.map + Series.combine_first:

eduID1=edu.set_index('StudentID1')
eduID2=edu.set_index('StudentID2')
df['Education']=df['StudentID'].map(eduID1['Education']).combine_first(df['StudentID'].map(eduID2['Education']))
print(df)

   Name  StudentID    Education
0   tom          3      college
1  nick          5  grad-school
2  juli          6  grad-school
3  jack         10          NaN
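
If edu had more than two ID columns, the same map + combine_first idea could be folded over a list of columns. This is only a sketch under that assumption; `id_cols`, `candidates` and `education` are hypothetical names, and earlier columns take precedence when several match.

```python
from functools import reduce

import pandas as pd

edu = pd.DataFrame(
    [["school", 1, 2], ["college", 3, 4], ["grad-school", 5, 6]],
    columns=["Education", "StudentID1", "StudentID2"],
)
df = pd.DataFrame(
    [["tom", 3], ["nick", 5], ["juli", 6], ["jack", 10]],
    columns=["Name", "StudentID"],
)

# Hypothetical list of ID columns to try, in order of precedence.
id_cols = ["StudentID1", "StudentID2"]

# One mapped Series per ID column; non-matching rows are NaN.
candidates = [
    df["StudentID"].map(edu.set_index(col)["Education"]) for col in id_cols
]

# combine_first keeps the left value and fills its NaN slots from the right.
education = reduce(lambda left, right: left.combine_first(right), candidates)

result = df.assign(Education=education)
print(result)
```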

Answer 1: (score: 2)

Use stack with map:

s = edu.set_index('Education').stack().reset_index(level=1, drop=True)

df['Education'] = df.StudentID.map(pd.Series(s.index, s.values))
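
To see what gets mapped, it can help to build the intermediate objects step by step. The sketch below reproduces the stack approach on the sample data; `id_to_education` is a name introduced here for illustration.

```python
import pandas as pd

edu = pd.DataFrame(
    [["school", 1, 2], ["college", 3, 4], ["grad-school", 5, 6]],
    columns=["Education", "StudentID1", "StudentID2"],
)
df = pd.DataFrame(
    [["tom", 3], ["nick", 5], ["juli", 6], ["jack", 10]],
    columns=["Name", "StudentID"],
)

# After stack(), each ID is one row; dropping the inner index level leaves
# a Series whose index is the Education label and whose values are the IDs.
s = edu.set_index("Education").stack().reset_index(level=1, drop=True)

# Swap index and values to get an ID -> Education lookup.
id_to_education = pd.Series(s.index, index=s.values)

df["Education"] = df["StudentID"].map(id_to_education)
print(df)
```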

Answer 2: (score: 1)

Here is a solution using melt and merge:

(df.merge(edu.melt(id_vars='Education', value_name='StudentID'),
          on='StudentID',
          how='left')
   .drop_duplicates(['Name', 'StudentID'])  # if a StudentID matches more than one row, keep the first match
   .drop('variable', axis=1)
)

Output:

   Name  StudentID    Education
0   tom          3      college
1  nick          5  grad-school
2  juli          6  grad-school
3  jack         10          NaN
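
A variation on the melt + merge approach is to deduplicate the lookup table before merging and let merge check that assumption through its `validate` argument. This is a sketch, not the answer's method; `long_edu` and `result` are hypothetical names, and keeping the first row per ID is an assumption.

```python
import pandas as pd

edu = pd.DataFrame(
    [["school", 1, 2], ["college", 3, 4], ["grad-school", 5, 6]],
    columns=["Education", "StudentID1", "StudentID2"],
)
df = pd.DataFrame(
    [["tom", 3], ["nick", 5], ["juli", 6], ["jack", 10]],
    columns=["Name", "StudentID"],
)

# Long lookup table with one row per ID; the helper 'variable' column
# (which records the source column name) is dropped right away.
long_edu = (
    edu.melt(id_vars="Education", value_name="StudentID")
       .drop(columns="variable")
       .drop_duplicates("StudentID")  # assumption: first match wins
)

# validate="many_to_one" makes merge raise if the right-hand keys are not unique.
result = df.merge(long_edu, on="StudentID", how="left", validate="many_to_one")
print(result)
```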

Answer 3: (score: 1)

Use map, similar to my answer from earlier today:

mapper = edu.set_index('StudentID1')['Education'].to_dict()
mapper.update(edu.set_index('StudentID2')['Education'].to_dict())

df['Education'] = df['StudentID'].map(mapper)


    Name    StudentID   Education
0   tom     3           college
1   nick    5           grad-school
2   juli    6           grad-school
3   jack    10          NaN
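
One thing to keep in mind with the dictionary approach is that dict.update overwrites existing keys, so the order of the two steps decides which column wins when the same ID appears in both. The sketch below uses a hypothetical table `conflict`, not part of the question, in which ID 3 appears in both columns.

```python
import pandas as pd

# Hypothetical data: ID 3 is listed under StudentID1 for 'college'
# and under StudentID2 for 'school'.
conflict = pd.DataFrame(
    {
        "Education": ["school", "college"],
        "StudentID1": [1, 3],
        "StudentID2": [3, 4],
    }
)

# StudentID1 first, then StudentID2: later entries overwrite earlier ones.
id2_wins = conflict.set_index("StudentID1")["Education"].to_dict()
id2_wins.update(conflict.set_index("StudentID2")["Education"].to_dict())

# Reversed order gives StudentID1 the final say.
id1_wins = conflict.set_index("StudentID2")["Education"].to_dict()
id1_wins.update(conflict.set_index("StudentID1")["Education"].to_dict())

print(id2_wins[3], id1_wins[3])
```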