I'm having a problem with a relatively simple task...
I have two dataframes:
df_sample, which I read from a CSV file:
+------+-----------+-------+-----------+
| key | Full Text | Date | Publisher |
+------+-----------+-------+-----------+
| abcd | foofoo | date1 | a |
| bcde | barbar | date2 | b |
| cdef | foobar | date3 | c |
+------+-----------+-------+-----------+
len(df_sample) = 20000
df_labels, which I read from an Excel file:
+------+----------+--------+--------+
| key | relevant | other | other2 |
+------+----------+--------+--------+
| abcd | yes | blabla | blabla |
| bcde | no | blabla | blabla |
| cdef | no | blabla | blabla |
| defg | yes | blabla | blabla |
+------+----------+--------+--------+
len(df_labels) = 219000
I want to join the two tables, assigning the relevant value to each key in the first dataframe. The desired output would look like this:
+------+-----------+-------+-----------+----------+
| key | Full Text | Date | Publisher | relevant |
+------+-----------+-------+-----------+----------+
| abcd | foofoo | date1 | a | yes |
| bcde | barbar | date2 | b | no |
| cdef | foobar | date3 | c | no |
+------+-----------+-------+-----------+----------+
I seem to have gotten there, but why does the following give me 27377 rows instead of 20000 (the length of the original left table)?
df = pd.merge(left=df_sample, right=df_labels, on="key")
Answer 0 (score: 2)
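You're seeing additional rows because the keys are not unique in both dfs, in your case in the second df. Each row of df_sample is repeated once for every row of df_labels that shares its key. You need to decide whether you want the repeated rows, which is the current behaviour, or whether you want to drop the duplicate rows in the second df before merging (for example with `drop_duplicates` on the `key` column).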
By default the first duplicate is kept. If you want alternative behaviour, such as keeping the last one, you can pass `keep='last'` to `drop_duplicates`.
See the docs.
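The answer's original snippets did not survive in this copy, so here is a minimal sketch of the idea rather than the answerer's exact code. It uses small made-up frames in place of the real data, and the variable names `n_dup`, `inflated`, `labels_first`, `labels_last` and `merged` are illustrative. `how="left"` is an assumption added here so that every row of the left table is kept even when its key has no label.

```python
import pandas as pd

# Toy stand-ins for the frames in the question (values are made up).
df_sample = pd.DataFrame({
    "key": ["abcd", "bcde", "cdef"],
    "Full Text": ["foofoo", "barbar", "foobar"],
})
df_labels = pd.DataFrame({
    "key": ["abcd", "abcd", "bcde", "cdef", "defg"],  # "abcd" appears twice
    "relevant": ["yes", "no", "no", "no", "yes"],
})

# Count rows in the right-hand frame whose key has already appeared.
n_dup = df_labels["key"].duplicated().sum()

# Merging as-is repeats a left row once per matching right row.
inflated = pd.merge(left=df_sample, right=df_labels, on="key")

# Dropping duplicate keys first leaves at most one match per left row.
labels_first = df_labels.drop_duplicates(subset="key")              # keeps the first
labels_last = df_labels.drop_duplicates(subset="key", keep="last")  # keeps the last

# how="left" (an assumption, not in the original post) keeps every left row.
merged = pd.merge(
    left=df_sample,
    right=labels_first[["key", "relevant"]],
    on="key",
    how="left",
)

print(n_dup, len(inflated), len(merged))
```

If duplicates in df_labels carry conflicting labels, as with "abcd" here, choosing between `keep="first"` and `keep="last"` changes the result, so it is worth inspecting the duplicated keys before dropping them.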