I have a DataFrame that looks like this:
import pandas as pd
df = pd.DataFrame([
{'code': '0101010C0AAAAAA', 'chemical': '0101010C0', 'is_generic': True, 'format': 'AAAA'},
{'code': '0101010C0BBAAAA', 'chemical': '0101010C0', 'is_generic': False, 'format': 'AAAA'},
{'code': '0101010F0AAAUAU', 'chemical': '0101010F0', 'is_generic': True, 'format': 'AUAU'},
{'code': '0101010F0BCAAAU', 'chemical': '0101010F0', 'is_generic': False, 'format': 'AAAU'},
{'code': '0101010G0AAABAB', 'chemical': '0101010G0', 'is_generic': False, 'format': 'ABAB'}
])
              code   chemical  is_generic format
0  0101010C0AAAAAA  0101010C0        True   AAAA
1  0101010C0BBAAAA  0101010C0       False   AAAA
2  0101010F0AAAUAU  0101010F0        True   AUAU
3  0101010F0BCAAAU  0101010F0       False   AAAU
4  0101010G0AAABAB  0101010G0       False   ABAB
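I want to create a new DataFrame with one row for each code where is_generic is False. For each of those codes I then want to add a column holding the code that has the same chemical and format but for which is_generic is True:
           code    generic_equiv
0101010C0BBAAAA  0101010C0AAAAAA
0101010F0BCAAAU  0101010F0AAAUAU
0101010G0AAABAB             None
I know how to build a DataFrame with a row for each code where is_generic is False. I think I need a conditional merge with df, but how do I do that?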
Answer 0 (score: 3)
The following...
import pandas as pd
df = pd.DataFrame([
{'code': '0101010C0AAAAAA', 'chemical': '0101010C0', 'is_generic': True, 'format': 'AAAA'},
{'code': '0101010C0BBAAAA', 'chemical': '0101010C0', 'is_generic': False, 'format': 'AAAA'},
{'code': '0101010F0AAAUAU', 'chemical': '0101010F0', 'is_generic': True, 'format': 'AUAU'},
{'code': '0101010F0BCAAAU', 'chemical': '0101010F0', 'is_generic': False, 'format': 'AAAU'},
{'code': '0101010G0AAABAB', 'chemical': '0101010G0', 'is_generic': False, 'format': 'ABAB'}
])
groups = df.groupby('is_generic')
pd.merge(groups.get_group(False), groups.get_group(True), on='chemical', how='left')
...outputs
chemical code_x format_x is_generic_x code_y format_y \
0 0101010C0 0101010C0BBAAAA AAAA False 0101010C0AAAAAA AAAA
1 0101010F0 0101010F0BCAAAU AAAU False 0101010F0AAAUAU AUAU
2 0101010G0 0101010G0AAABAB ABAB False NaN NaN
is_generic_y
0 True
1 True
2 NaN
Set or rename the columns as needed.
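As a rough sketch of that last step, the merged frame can be trimmed to the two columns the question asks for. This assumes the default `_x`/`_y` suffixes that `pd.merge` applies to overlapping column names; the names `merged` and `result` are only illustrative and not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame([
    {'code': '0101010C0AAAAAA', 'chemical': '0101010C0', 'is_generic': True, 'format': 'AAAA'},
    {'code': '0101010C0BBAAAA', 'chemical': '0101010C0', 'is_generic': False, 'format': 'AAAA'},
    {'code': '0101010F0AAAUAU', 'chemical': '0101010F0', 'is_generic': True, 'format': 'AUAU'},
    {'code': '0101010F0BCAAAU', 'chemical': '0101010F0', 'is_generic': False, 'format': 'AAAU'},
    {'code': '0101010G0AAABAB', 'chemical': '0101010G0', 'is_generic': False, 'format': 'ABAB'},
])

# Same merge as in the answer: non-generic rows on the left, generic rows on the right.
groups = df.groupby('is_generic')
merged = pd.merge(groups.get_group(False), groups.get_group(True),
                  on='chemical', how='left')

# Keep only the two code columns and give them the names used in the question.
# 'code_x' comes from the left (non-generic) frame, 'code_y' from the right (generic) frame.
result = merged[['code_x', 'code_y']].rename(
    columns={'code_x': 'code', 'code_y': 'generic_equiv'})
print(result)
```

Because the merge is a left join on chemical only, a non-generic code with no generic counterpart (0101010G0AAABAB here) keeps its row and gets NaN in generic_equiv.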
Answer 1 (score: 0)
Create one new DataFrame containing only the True rows and another containing only the False rows, then merge the two:
# Split into generic rows (df1) and non-generic rows (df2).
df1 = df[df['is_generic']]
df2 = df[~df['is_generic']]
# A right merge keeps every non-generic code, even when no generic match exists.
df3 = pd.merge(df1[['chemical', 'code']], df2[['chemical', 'code']], on='chemical', how='right')
df3 = df3.drop(columns='chemical').rename(columns={'code_x': 'generic_equiv', 'code_y': 'code'})
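Both answers match on chemical alone. The question's wording asks for the same chemical and format, so a variant that merges on both keys might look like the sketch below. This is an assumption about the intent, not something either answer proposes, and the names `generic`, `non_generic` and `strict` are illustrative. With the sample data it pairs fewer rows than the question's expected output, because 0101010F0BCAAAU has format AAAU while the generic 0101010F0AAAUAU has format AUAU.

```python
import pandas as pd

df = pd.DataFrame([
    {'code': '0101010C0AAAAAA', 'chemical': '0101010C0', 'is_generic': True, 'format': 'AAAA'},
    {'code': '0101010C0BBAAAA', 'chemical': '0101010C0', 'is_generic': False, 'format': 'AAAA'},
    {'code': '0101010F0AAAUAU', 'chemical': '0101010F0', 'is_generic': True, 'format': 'AUAU'},
    {'code': '0101010F0BCAAAU', 'chemical': '0101010F0', 'is_generic': False, 'format': 'AAAU'},
    {'code': '0101010G0AAABAB', 'chemical': '0101010G0', 'is_generic': False, 'format': 'ABAB'},
])

# Generic rows, with the code column already renamed to the target column name.
generic = df.loc[df['is_generic'], ['chemical', 'format', 'code']].rename(
    columns={'code': 'generic_equiv'})
# Non-generic rows: one output row per code.
non_generic = df.loc[~df['is_generic'], ['chemical', 'format', 'code']]

# Left merge on BOTH keys: a generic code is attached only when chemical and format agree.
strict = non_generic.merge(generic, on=['chemical', 'format'], how='left')
strict = strict[['code', 'generic_equiv']]
print(strict)
```

If the pairing shown in the question's expected output is what is wanted, merging on chemical only, as in the answers, is the version that reproduces it.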