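I have two DataFrames. df contains a lot of data. test_df describes whether certain tests have passed. I need to select from df only the rows whose tests have not failed, by looking this information up in test_df. So far I can reduce test_df to passed_tests. What remains is to select only the rows of df where the relevant part of the row index is in passed_tests. How can I do this?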
Update
My code:
import pandas as pd
import numpy as np
# test_df: one "Passed?" entry per (first, second) test run; label pairs may repeat
index = [np.array(['foo', 'foo', 'foo', 'foo', 'qux', 'qux', 'qux']), np.array(['a', 'a', 'b', 'b', 'a', 'b', 'b'])]
data = np.array(['False', 'True', 'False', 'False', 'False', 'Ok', 'False'])
columns = ["Passed?"]
test_df = pd.DataFrame(data, index=index, columns=columns)
print(test_df)
# df: the measurement data, with a third index level
index = [np.array(['foo', 'foo', 'foo', 'foo', 'qux', 'qux', 'qux', 'qux']),
         np.array(['a', 'a', 'b', 'b', 'a', 'a', 'b', 'b']),
         np.array(['1', '2', '1', '2', '1', '2', '1', '2'])]
data = np.random.randn(8, 2)
columns = ["X", "Y"]
df = pd.DataFrame(data, index=index, columns=columns)
print(df)
# keep only the test rows whose result counts as a pass
passed_tests = test_df.loc[test_df['Passed?'].isin(['True', 'Ok'])]
print(passed_tests)
df
                X         Y
foo a 1  0.589776 -0.234717
      2  0.105161  1.937174
    b 1 -0.092252  0.143451
      2  0.939052 -0.239052
qux a 1  0.757239  2.836032
      2 -0.445335  1.352374
    b 1  2.175553 -0.700816
      2  1.082709 -0.923095
test_df
      Passed?
foo a   False
    a    True
    b   False
    b   False
qux a   False
    b      Ok
    b   False
passed_tests
      Passed?
foo a    True
qux b      Ok
Required result
                X         Y
foo a 1  0.589776 -0.234717
      2  0.105161  1.937174
qux b 1  2.175553 -0.700816
      2  1.082709 -0.923095
Answer (score: 1)
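You need reindex with method='ffill' to forward-fill each two-level test result onto the three-level index of df, then check the values with isin, and finally use boolean indexing. reindex only works on a unique index, so the duplicated label pairs in test_df (foo a, foo b and qux b each appear twice) have to be removed first; the output below corresponds to the deduplicated test_df: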
print (test_df.reindex(df.index, method='ffill'))
        Passed?
foo a 1    True
      2    True
    b 1   False
      2   False
qux a 1   False
      2   False
    b 1      Ok
      2      Ok
mask = test_df.reindex(df.index, method='ffill').isin(['True', 'Ok'])['Passed?']
print (mask)
foo  a  1     True
        2     True
     b  1    False
        2    False
qux  a  1    False
        2    False
     b  1     True
        2     True
Name: Passed?, dtype: bool
print (df[mask])
                X         Y
foo a 1 -0.580448 -0.168951
      2 -0.875165  1.304745
qux b 1 -0.147014 -0.787483
      2  0.188989 -1.159533
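The same selection can also be written without reindex, and without deduplicating test_df first, by dropping the third level of df's index and testing each remaining (first, second) pair for membership in passed_tests. This is a swapped-in alternative, not the approach used in the answer; it assumes the level to drop is the last one (level 2), and it rebuilds the question's frames with fixed numbers in place of np.random.randn so the result is reproducible.

```python
import numpy as np
import pandas as pd

# Rebuild the question's two frames; deterministic numbers replace np.random.randn
test_df = pd.DataFrame(
    {"Passed?": ["False", "True", "False", "False", "False", "Ok", "False"]},
    index=[["foo", "foo", "foo", "foo", "qux", "qux", "qux"],
           ["a", "a", "b", "b", "a", "b", "b"]],
)
df = pd.DataFrame(
    np.arange(16, dtype=float).reshape(8, 2),
    index=[["foo"] * 4 + ["qux"] * 4,
           ["a", "a", "b", "b", "a", "a", "b", "b"],
           ["1", "2", "1", "2", "1", "2", "1", "2"]],
    columns=["X", "Y"],
)

# (first, second) label pairs whose test counts as a pass; repeated pairs are harmless
passed_tests = test_df[test_df["Passed?"].isin(["True", "Ok"])]

# Drop the third index level of df, then check each (first, second) pair
mask = df.index.droplevel(2).isin(passed_tests.index)
result = df[mask]
print(result)
```

A row of df is kept as soon as any run of its test passed, which matches the "keep the passing value" rule of the deduplication step.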
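EDIT:
For removing the duplicates, it is simpler here to use:
- reset_index to turn the MultiIndex levels into columns (level_0, level_1)
- sort_values - sort the Passed? column descending and the first and second columns ascending (in descending string order 'True' and 'Ok' come before 'False', so a passing result is listed first)
- drop_duplicates - keep only the first value for each (level_0, level_1) pair
- set_index to restore the MultiIndex
- rename_axis to remove the index names
test_df = (test_df.reset_index()
                  .sort_values(['level_0', 'level_1', 'Passed?'], ascending=[True, True, False])
                  .drop_duplicates(['level_0', 'level_1'])
                  .set_index(['level_0', 'level_1'])
                  .rename_axis([None, None]))
print (test_df)
      Passed?
foo a    True
    b   False
qux a   False
    b      Ok
Another, simpler solution is to sort first and then use groupby with first.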
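The code for the groupby variant is not included in the text. A minimal sketch, assuming it sorts Passed? descending (so the string 'True' sorts before 'Ok', which sorts before 'False') and then keeps the first row of each (first, second) index pair:

```python
import pandas as pd

test_df = pd.DataFrame(
    {"Passed?": ["False", "True", "False", "False", "False", "Ok", "False"]},
    index=[["foo", "foo", "foo", "foo", "qux", "qux", "qux"],
           ["a", "a", "b", "b", "a", "b", "b"]],
)

# Descending string sort: 'True' > 'Ok' > 'False', so a passing run comes first
dedup = (test_df.sort_values("Passed?", ascending=False)
                .groupby(level=[0, 1])
                .first())
print(dedup)
```

groupby sorts the group keys by default, so the result comes back in index order without a separate sort_index call.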
EDIT1:
Convert the values to an ordered Categorical before sorting, and then take first per group.
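The EDIT1 step can be sketched as follows. The category order ['False', 'Ok', 'True'] is an assumption (it ranks Ok as a pass below True); with an explicit order, sorting no longer depends on how the strings happen to compare alphabetically.

```python
import pandas as pd

test_df = pd.DataFrame(
    {"Passed?": ["False", "True", "False", "False", "False", "Ok", "False"]},
    index=[["foo", "foo", "foo", "foo", "qux", "qux", "qux"],
           ["a", "a", "b", "b", "a", "b", "b"]],
)

# Assumed ranking, lowest to highest: a failure, then Ok, then a full pass
order = pd.CategoricalDtype(["False", "Ok", "True"], ordered=True)
test_df["Passed?"] = test_df["Passed?"].astype(order)

# Highest-ranked result first within each (first, second) pair, then keep it
best = (test_df.sort_values("Passed?", ascending=False)
               .groupby(level=[0, 1])
               .first())
print(best)
```

Changing the ranking (for example, treating Ok as a failure) only requires editing the category list, not the sorting code.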