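This is an example I extracted from my database. I am visualizing collaborations, so for this example I need to keep only one relationship between any two authors. For instance, I have to delete either Brian Norton --- Maria Roo Ons or Maria Roo Ons --- Brian Norton so that each relationship is unique.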
-------------------------------------------------------------------------------------------------
| article_title | author_name | coauthor_name |
-------------------------------------------------------------------------------------------------
A Metal Plate Solar Antenna for UMTS Pico-cell Base Station | Brian Norton | Maria Roo Ons
A Metal Plate Solar Antenna for UMTS Pico-cell Base Station | Brian Norton | Max Ammann
A Metal Plate Solar Antenna for UMTS Pico-cell Base Station | Brian Norton | S. Shynu
A Metal Plate Solar Antenna for UMTS Pico-cell Base Station | Brian Norton | Sarah McCormack
A Metal Plate Solar Antenna for UMTS Pico-cell Base Station | Maria Roo Ons | Brian Norton
A Metal Plate Solar Antenna for UMTS Pico-cell Base Station | Maria Roo Ons | Max Ammann
A Metal Plate Solar Antenna for UMTS Pico-cell Base Station | Maria Roo Ons | S. Shynu
A Metal Plate Solar Antenna for UMTS Pico-cell Base Station | Maria Roo Ons | Sarah McCormack
A Metal Plate Solar Antenna for UMTS Pico-cell Base Station | Max Ammann | Brian Norton
A Metal Plate Solar Antenna for UMTS Pico-cell Base Station | Max Ammann | Maria Roo Ons
A Metal Plate Solar Antenna for UMTS Pico-cell Base Station | Max Ammann | S. Shynu
A Metal Plate Solar Antenna for UMTS Pico-cell Base Station | Max Ammann | Sarah McCormack
A Metal Plate Solar Antenna for UMTS Pico-cell Base Station | S. Shynu | Brian Norton
A Metal Plate Solar Antenna for UMTS Pico-cell Base Station | S. Shynu | Maria Roo Ons
A Metal Plate Solar Antenna for UMTS Pico-cell Base Station | S. Shynu | Max Ammann
A Metal Plate Solar Antenna for UMTS Pico-cell Base Station | S. Shynu | Sarah McCormack
A Metal Plate Solar Antenna for UMTS Pico-cell Base Station | Sarah McCormack | Brian Norton
A Metal Plate Solar Antenna for UMTS Pico-cell Base Station | Sarah McCormack | Maria Roo Ons
A Metal Plate Solar Antenna for UMTS Pico-cell Base Station | Sarah McCormack | Max Ammann
A Metal Plate Solar Antenna for UMTS Pico-cell Base Station | Sarah McCormack | S. Shynu
-------------------------------------------------------------------------------------------------
The desired final output is as follows.
-------------------------------------------------------------------------------------------------
| article_title | author_name | coauthor_name |
-------------------------------------------------------------------------------------------------
A Metal Plate Solar Antenna for UMTS Pico-cell Base Station | Brian Norton | Maria Roo Ons
A Metal Plate Solar Antenna for UMTS Pico-cell Base Station | Brian Norton | Max Ammann
A Metal Plate Solar Antenna for UMTS Pico-cell Base Station | Brian Norton | S. Shynu
A Metal Plate Solar Antenna for UMTS Pico-cell Base Station | Brian Norton | Sarah McCormack
A Metal Plate Solar Antenna for UMTS Pico-cell Base Station | Maria Roo Ons | Max Ammann
A Metal Plate Solar Antenna for UMTS Pico-cell Base Station | Maria Roo Ons | S. Shynu
A Metal Plate Solar Antenna for UMTS Pico-cell Base Station | Maria Roo Ons | Sarah McCormack
A Metal Plate Solar Antenna for UMTS Pico-cell Base Station | Max Ammann | S. Shynu
A Metal Plate Solar Antenna for UMTS Pico-cell Base Station | Max Ammann | Sarah McCormack
A Metal Plate Solar Antenna for UMTS Pico-cell Base Station | S. Shynu | Sarah McCormack
In this case, I want to keep only one row per pair of authors. How can I do this in R or Python? Thank you very much for your help.
Answer 0 (score: 1)
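I assume you have a separate database and connect to it using Python.
Possible approaches:
1) You can add a row number based on the article column and then deduplicate. See this answer for how to do it in SQL. You can then run the query through a Python DB connector.
2) You can pull the records into a pandas DataFrame and do the analysis there. Pandas is well suited to processing and manipulating data.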
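A minimal sketch of approach 1, assuming an in-memory SQLite database reached through Python's built-in sqlite3 module. The table name coauthors, the placeholder title "Paper X", and the choice to partition on the unordered author pair as well as the article are assumptions made for illustration; they are not taken from the linked answer. The two-argument MIN/MAX scalar functions are SQLite-specific (other databases typically offer LEAST/GREATEST), and ROW_NUMBER() requires SQLite 3.25 or newer.

```python
import sqlite3

# Hypothetical in-memory table holding every ordered (author, coauthor) pair.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE coauthors (article_title TEXT, author_name TEXT, coauthor_name TEXT)"
)
authors = ["Brian Norton", "Maria Roo Ons", "Max Ammann"]
rows = [("Paper X", a, b) for a in authors for b in authors if a != b]
conn.executemany("INSERT INTO coauthors VALUES (?, ?, ?)", rows)

# Number the rows inside each (article, unordered pair) group and keep row 1.
# MIN(x, y) / MAX(x, y) with two arguments are scalar functions in SQLite,
# so (a, b) and (b, a) land in the same partition.
query = """
SELECT article_title, author_name, coauthor_name
FROM (
    SELECT *,
           ROW_NUMBER() OVER (
               PARTITION BY article_title,
                            MIN(author_name, coauthor_name),
                            MAX(author_name, coauthor_name)
               ORDER BY author_name
           ) AS rn
    FROM coauthors
)
WHERE rn = 1
ORDER BY author_name, coauthor_name
"""
unique_pairs = conn.execute(query).fetchall()

for row in unique_pairs:
    print(" | ".join(row))
```

Ordering each partition by author_name keeps the row whose author sorts first alphabetically, which matches the layout of the desired output in the question.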
Answer 1 (score: 0)
I assume your data frame looks like the one shown below, since you have not described what other cases might occur.
article author1 author2
A a b
A b a
A a a
A b b
In R, this is how I would get the rows you are looking for. I assume your data frame is named df1.
# This will create a new dataframe df2 with only those rows where author1 and author2 are different
df2 <- df1[df1$author1 != df1$author2, ]
The output will look like what you provided in the question.
article author1 author2
A a b
A b a
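Note that the filter above removes only self-pairs, so both a-b and b-a remain in the output. A possible way to also collapse mirrored rows is sketched here in pandas (Python rather than R, since the question allows either). The helper column pair_key and the frame names df2 and df3 are hypothetical, chosen for this illustration.

```python
import pandas as pd

# Toy data mirroring the example frame above.
df1 = pd.DataFrame({
    "article": ["A", "A", "A", "A"],
    "author1": ["a", "b", "a", "b"],
    "author2": ["b", "a", "a", "b"],
})

# Step 1: drop self-pairs, as the R line above does.
df2 = df1[df1["author1"] != df1["author2"]].copy()

# Step 2: build an order-independent key so (a, b) and (b, a) compare equal.
df2["pair_key"] = [tuple(sorted(p)) for p in zip(df2["author1"], df2["author2"])]

# Step 3: keep the first row for each article and unordered pair.
df3 = (
    df2.drop_duplicates(subset=["article", "pair_key"])
       .drop(columns="pair_key")
       .reset_index(drop=True)
)
print(df3)
```

Sorting the two names inside each row is what makes the comparison direction-independent; drop_duplicates then keeps whichever orientation appears first in the data.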
Please let me know if this is what you need.