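I have two DataFrames with the same column names; some of their rows match and some are unique to one frame.
I want to exclude the middle and keep only the rows that are unique to each DataFrame.
How would I concat, merge, or join these two DataFrames to do that?
For instance, in this image I don't want the middle part; I want both sides but not the middle: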
Here is my code right now:
def query_to_df(query):
    ...
    df_a = pd.DataFrame(data_a)
    df_b = pd.DataFrame(data_b)
    outer_results = pd.concat([df_a, df_b], axis=1, join='outer')
    return df
Let me give an example of what I need:
df_a =
col_a  col_b  col_c
a1     b1     c1
a2     b2     c2
df_b =
col_a  col_b  col_c
a2     b2     c2
a3     b3     c3
# they only share the 2nd row: a2 b2 c2
# so the outer result should be:
col_a  col_b  col_c   col_a  col_b  col_c
a1     b1     c1      NA     NA     NA
NA     NA     NA      a3     b3     c3
Or I'd be just as happy with 2 DataFrames:
result_1 =
col_a  col_b  col_c
a1     b1     c1
result_2 =
col_a  col_b  col_c
a3     b3     c3
Lastly, you'll notice that a2 b2 c2 was excluded because all of its columns match. How do I specify that I want to join on all the columns, not just one? If df_a had contained a2 foo c2, I would want that row to appear in result_1 as well.
Answer 0 (score: 4)
Use merge with the indicator parameter and an outer join first, then filter with query or boolean indexing:
df = df_a.merge(df_b, how='outer', indicator=True)
print(df)
  col_a col_b col_c      _merge
0    a1    b1    c1   left_only
1    a2    b2    c2        both
2    a3    b3    c3  right_only
# keep the rows found only in df_a, then drop the helper column
a = df.query('_merge == "left_only"').drop(columns='_merge')
print(a)
  col_a col_b col_c
0    a1    b1    c1
# keep the rows found only in df_b, then drop the helper column
b = df.query('_merge == "right_only"').drop(columns='_merge')
print(b)
  col_a col_b col_c
2    a3    b3    c3
Or, with boolean indexing:
a = df[df['_merge'] == "left_only"].drop(columns='_merge')
print(a)
  col_a col_b col_c
0    a1    b1    c1
b = df[df['_merge'] == "right_only"].drop(columns='_merge')
print(b)
  col_a col_b col_c
2    a3    b3    c3
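As a self-contained sketch of the approach above, the following rebuilds the question's sample frames and wraps the filtering in a small helper. The helper name `split_unique` and the extra `a2 foo c2` row are made up for illustration and are not part of pandas or of the original answer. Because no `on` argument is passed, `merge` joins on every column the two frames have in common, which is what the question asks for.

```python
import pandas as pd

cols = ["col_a", "col_b", "col_c"]
df_a = pd.DataFrame([["a1", "b1", "c1"], ["a2", "b2", "c2"]], columns=cols)
df_b = pd.DataFrame([["a2", "b2", "c2"], ["a3", "b3", "c3"]], columns=cols)


def split_unique(left, right):
    """Hypothetical helper: return (rows only in left, rows only in right)."""
    # With no `on` argument, merge joins on all columns shared by both frames.
    merged = left.merge(right, how="outer", indicator=True)
    left_only = merged[merged["_merge"] == "left_only"].drop(columns="_merge")
    right_only = merged[merged["_merge"] == "right_only"].drop(columns="_merge")
    return left_only, right_only


result_1, result_2 = split_unique(df_a, df_b)
print(result_1)
print(result_2)

# Assumed extra row: it differs from df_b's a2 row in col_b only,
# so it should be reported as unique to the left frame.
extra = pd.DataFrame([["a2", "foo", "c2"]], columns=cols)
df_a_foo = pd.concat([df_a, extra], ignore_index=True)
result_1_foo, _ = split_unique(df_a_foo, df_b)
print(result_1_foo)
```

Returning two frames matches the `result_1` / `result_2` shape from the question; keeping the `_merge` column instead would give a single frame that still records each row's origin.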
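Answer 1 (score: 4)
Use pd.DataFrame.drop_duplicates.
This assumes that rows are unique within their respective DataFrames.
# DataFrame.append was removed in pandas 2.0, so pd.concat is used here
pd.concat([df_a, df_b]).drop_duplicates(keep=False)
  col_a col_b col_c
0    a1    b1    c1
1    a3    b3    c3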
You can even use the keys parameter of pd.concat to record which DataFrame each row came from.
pd.concat([df_a, df_b], keys=['a', 'b']).drop_duplicates(keep=False)
    col_a col_b col_c
a 0    a1    b1    c1
b 1    a3    b3    c3
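The uniqueness assumption matters: `drop_duplicates(keep=False)` also discards rows that are repeated inside a single frame. The sketch below uses made-up data in which `df_a` contains the `a1` row twice, and shows one possible workaround (de-duplicating each frame before concatenating). It is an illustration under those assumptions, not part of the original answer.

```python
import pandas as pd

cols = ["col_a", "col_b", "col_c"]
# Made-up data: the a1 row appears twice in df_a and never in df_b.
df_a = pd.DataFrame(
    [["a1", "b1", "c1"], ["a1", "b1", "c1"], ["a2", "b2", "c2"]], columns=cols
)
df_b = pd.DataFrame([["a2", "b2", "c2"], ["a3", "b3", "c3"]], columns=cols)

# Naive version: the repeated a1 row is dropped even though df_b lacks it.
naive = pd.concat([df_a, df_b], keys=["a", "b"]).drop_duplicates(keep=False)

# Workaround: de-duplicate each frame on its own before concatenating.
safe = pd.concat(
    [df_a.drop_duplicates(), df_b.drop_duplicates()], keys=["a", "b"]
).drop_duplicates(keep=False)

# The outer index level says which frame each surviving row came from.
only_a = safe.xs("a")
only_b = safe.xs("b")
print(naive)
print(only_a)
print(only_b)
```

Note that `xs` raises a `KeyError` when a key has no surviving rows, so real code may need to check the index first.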
Answer 2 (score: 1)
Use concat and drop_duplicates with keep=False:
new_df = pd.concat([df_a, df_b]).drop_duplicates(keep=False)
  col_a col_b col_c
0    a1    b1    c1
1    a3    b3    c3
Or use numpy setdiff1d. Note that it compares individual values rather than whole rows, and the results are given new names here so that the second call still sees the original df_a:
a_only = pd.DataFrame(np.setdiff1d(np.array(df_a.values), np.array(df_b.values))
                      .reshape(-1, df_a.shape[1]), columns=df_a.columns)
b_only = pd.DataFrame(np.setdiff1d(np.array(df_b.values), np.array(df_a.values))
                      .reshape(-1, df_b.shape[1]), columns=df_b.columns)
a_only
  col_a col_b col_c
0    a1    b1    c1
b_only
  col_a col_b col_c
0    a3    b3    c3
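Because `np.setdiff1d` flattens its inputs and works on single values, it only reproduces the row-wise answer when the leftover values happen to line up into whole rows. The sketch below uses a made-up variant of the question's data (the `a2 foo c2` case) to show the difference, and then a row-wise alternative based on `MultiIndex.difference`. That alternative is a swapped-in technique, not the answer's method, and it also sorts and de-duplicates the rows.

```python
import numpy as np
import pandas as pd

cols = ["col_a", "col_b", "col_c"]
# Made-up variant: df_a's second row differs from df_b's first row in col_b only.
df_a = pd.DataFrame([["a1", "b1", "c1"], ["a2", "foo", "c2"]], columns=cols)
df_b = pd.DataFrame([["a2", "b2", "c2"], ["a3", "b3", "c3"]], columns=cols)

# Element-wise difference: four loose values that no longer form whole rows,
# so reshaping them to three columns would fail.
loose = np.setdiff1d(df_a.to_numpy(dtype=object), df_b.to_numpy(dtype=object))
print(loose)

# Row-wise difference: treat every row as one tuple in a MultiIndex.
idx_a = pd.MultiIndex.from_frame(df_a)
idx_b = pd.MultiIndex.from_frame(df_b)
a_only = idx_a.difference(idx_b).to_frame(index=False)
b_only = idx_b.difference(idx_a).to_frame(index=False)
print(a_only)
print(b_only)
```

Here the whole `a2 foo c2` row stays in `a_only`, which is the behaviour the question asks for when a row differs in only one column.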