Finding equivalent rows in pandas?

Time: 2016-05-12 13:26:56

Tags: python pandas

I have a dataframe that looks like this:

df = pd.DataFrame([
  {'code': '0101010C0AAAAAA', 'chemical': '0101010C0', 'is_generic': True, 'format': 'AAAA'},
  {'code': '0101010C0BBAAAA', 'chemical': '0101010C0', 'is_generic': False, 'format': 'AAAA'},
  {'code': '0101010F0AAAUAU', 'chemical': '0101010F0', 'is_generic': True, 'format': 'AUAU'},
  {'code': '0101010F0BCAAAU', 'chemical': '0101010F0', 'is_generic': False, 'format': 'AAAU'},
  {'code': '0101010G0AAABAB', 'chemical': '0101010G0', 'is_generic': False, 'format': 'ABAB'}
])

              code   chemical is_generic format
0  0101010C0AAAAAA  0101010C0       True   AAAA
1  0101010C0BBAAAA  0101010C0      False   AAAA
2  0101010F0AAAUAU  0101010F0       True   AUAU
3  0101010F0BCAAAU  0101010F0      False   AAAU
4  0101010G0AAABAB  0101010G0      False   ABAB

I want to create a new dataframe with one row for every code where is_generic is False. Then I want to add a column that gives, for each code, the code with the same chemical and format but where is_generic is True:

code             generic_equiv
0101010C0BBAAAA  0101010C0AAAAAA
0101010F0BCAAAU  0101010F0AAAUAU
0101010G0AAABAB  None

I already know how to build a dataframe with one row for each code where is_generic is False.

I think I want to do a conditional merge with df, but how?

2 answers:

Answer 0: (score: 3)

The following...

import pandas as pd

df = pd.DataFrame([
  {'code': '0101010C0AAAAAA', 'chemical': '0101010C0', 'is_generic': True, 'format': 'AAAA'},
  {'code': '0101010C0BBAAAA', 'chemical': '0101010C0', 'is_generic': False, 'format': 'AAAA'},
  {'code': '0101010F0AAAUAU', 'chemical': '0101010F0', 'is_generic': True, 'format': 'AUAU'},
  {'code': '0101010F0BCAAAU', 'chemical': '0101010F0', 'is_generic': False, 'format': 'AAAU'},
  {'code': '0101010G0AAABAB', 'chemical': '0101010G0', 'is_generic': False, 'format': 'ABAB'}
])

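# Split the rows by is_generic, then left-merge the non-generic rows
# with the generic rows that share the same chemical.
groups = df.groupby('is_generic')
pd.merge(groups.get_group(False), groups.get_group(True), on='chemical', how='left')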

...outputs

    chemical           code_x format_x is_generic_x           code_y format_y  \
0  0101010C0  0101010C0BBAAAA     AAAA        False  0101010C0AAAAAA     AAAA   
1  0101010F0  0101010F0BCAAAU     AAAU        False  0101010F0AAAUAU     AUAU   
2  0101010G0  0101010G0AAABAB     ABAB        False              NaN      NaN   

  is_generic_y  
0         True  
1         True  
2          NaN  

Select and rename the columns as needed.
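As a minimal sketch of that last step, the merged result can be trimmed to the two columns the question asks for. The variable names `non_generic`, `generic`, `merged` and `result` are illustrative and not part of the original answer, and boolean masks are used here in place of `groupby`, which is an assumption about style rather than a required change.

```python
import pandas as pd

df = pd.DataFrame([
    {'code': '0101010C0AAAAAA', 'chemical': '0101010C0', 'is_generic': True, 'format': 'AAAA'},
    {'code': '0101010C0BBAAAA', 'chemical': '0101010C0', 'is_generic': False, 'format': 'AAAA'},
    {'code': '0101010F0AAAUAU', 'chemical': '0101010F0', 'is_generic': True, 'format': 'AUAU'},
    {'code': '0101010F0BCAAAU', 'chemical': '0101010F0', 'is_generic': False, 'format': 'AAAU'},
    {'code': '0101010G0AAABAB', 'chemical': '0101010G0', 'is_generic': False, 'format': 'ABAB'},
])

# Split the rows with boolean masks (illustrative alternative to groupby).
non_generic = df[~df['is_generic']]
generic = df[df['is_generic']]

# A left merge keeps every non-generic code, matched or not.
merged = pd.merge(non_generic, generic, on='chemical', how='left')

# code_x comes from the non-generic side, code_y from the generic side.
result = merged[['code_x', 'code_y']].rename(
    columns={'code_x': 'code', 'code_y': 'generic_equiv'}
)
print(result)
```

Codes without a generic counterpart end up with a missing value (NaN) in `generic_equiv` rather than the literal `None` shown in the question.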

Answer 1: (score: 0)

Create one dataframe holding only the generic rows and another holding only the non-generic rows, then merge the two:


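# Split into generic (df1) and non-generic (df2) rows.
df1 = df[df['is_generic']]
df2 = df[~df['is_generic']]
# A right merge keeps every non-generic code, even without a generic match.
df3 = pd.merge(df1[['chemical', 'code']], df2[['chemical', 'code']], on='chemical', how='right')
# Drop the join key; code_x is the generic code, code_y the non-generic one.
df3 = df3.drop(columns='chemical').rename(columns={'code_x': 'generic_equiv', 'code_y': 'code'})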
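Both answers join on chemical alone, which reproduces the expected table in the question. The question's wording also mentions matching on format, so here is a hedged sketch of that stricter reading. It is an assumption about what the asker may have meant, and the names `generic`, `non_generic` and `strict` are illustrative.

```python
import pandas as pd

df = pd.DataFrame([
    {'code': '0101010C0AAAAAA', 'chemical': '0101010C0', 'is_generic': True, 'format': 'AAAA'},
    {'code': '0101010C0BBAAAA', 'chemical': '0101010C0', 'is_generic': False, 'format': 'AAAA'},
    {'code': '0101010F0AAAUAU', 'chemical': '0101010F0', 'is_generic': True, 'format': 'AUAU'},
    {'code': '0101010F0BCAAAU', 'chemical': '0101010F0', 'is_generic': False, 'format': 'AAAU'},
    {'code': '0101010G0AAABAB', 'chemical': '0101010G0', 'is_generic': False, 'format': 'ABAB'},
])

cols = ['chemical', 'format', 'code']

# Generic rows, with their code renamed up front to avoid _x/_y suffixes.
generic = df.loc[df['is_generic'], cols].rename(columns={'code': 'generic_equiv'})
non_generic = df.loc[~df['is_generic'], cols]

# Assumption: a match requires the same chemical AND the same format.
strict = non_generic.merge(generic, on=['chemical', 'format'], how='left')
strict = strict[['code', 'generic_equiv']]
print(strict)
```

Under this stricter join, 0101010F0BCAAAU no longer finds a match, because its format (AAAU) differs from that of the generic row (AUAU). That differs from the expected output in the question, so joining on chemical alone may be what was actually wanted.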