Filtering a data frame based on an index data frame with Pandas

Time: 2016-12-22 17:43:43

Tags: python pandas dataframe

I apologize for the newbie question, but I am having a hard time figuring out pandas data frames. I have a data frame like this:

df_index:
Product    Title
100000     Sample main product
200000     Non-consecutive main sample

I have another data frame that contains a more detailed product list, for example:

df_details:
Product                    Title
100000                    Sample main product
100000-Format-English     Sample product details
100000-Format-Spanish     Sample product details
100000-Format-French      Sample product details
110000                    Another sample main product
110000-Format-English     Another sample details
110000-Format-Spanish     Another sample details
120000                    Yet another sample main product
120000-Format-English     Yet another sample details
120000-Format-Spanish     Yet another sample details
...
200000                    Non-consecutive main sample
200000-Format-English     Non-consecutive sample details
200000-Format-Spanish     Non-consecutive sample details

I want to create a new data frame based on df_details, but only for the products that appear in df_index. Ideally, it would look like:

new_df:
Product                    Title
100000                    Sample main product
100000-Format-English     Sample product details
100000-Format-Spanish     Sample product details
100000-Format-French      Sample product details
200000                    Non-consecutive main sample
200000-Format-English     Non-consecutive sample details
200000-Format-Spanish     Non-consecutive sample details

Here is what I have tried so far:

new_df = df_details[df_details['Product'][0:5] == df_index['Product'][0:5]]

This gives me an error:

ValueError: Can only compare identically-labeled Series objects

I also tried:

new_df = pd.merge(df_index, df_details, 
  left_on=['Product'[0:5]], right_index=True, how='left')

This gives me a result set, but not the one I want - it does not include the detail rows with the format information.

2 answers:

Answer 0: (score: 2)

You should be able to use isin() as:

new_df = df_details[df_details['Product'].isin(df_index['Product'])]

This builds a boolean mask that keeps only the rows whose Product value also appears in df_index.
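As a minimal sketch of the isin() mask, the following rebuilds a small version of the two frames using sample rows from the question. It assumes the Product column is stored as strings, and the name `exact_matches` is only illustrative. It shows that isin() compares whole values, so only the rows whose Product is exactly equal to an entry in df_index are kept:

```python
import pandas as pd

# Small frames built from sample rows in the question (Product assumed to be strings)
df_index = pd.DataFrame({
    "Product": ["100000", "200000"],
    "Title": ["Sample main product", "Non-consecutive main sample"],
})

df_details = pd.DataFrame({
    "Product": [
        "100000", "100000-Format-English",
        "110000", "110000-Format-English",
        "200000", "200000-Format-Spanish",
    ],
    "Title": [
        "Sample main product", "Sample product details",
        "Another sample main product", "Another sample details",
        "Non-consecutive main sample", "Non-consecutive sample details",
    ],
})

# Boolean mask: True where the whole Product value appears in df_index
mask = df_details["Product"].isin(df_index["Product"])
exact_matches = df_details[mask]
print(exact_matches)
```

The "-Format-..." rows are dropped here because their full value never appears in df_index, which is the limitation a prefix or pattern match has to address.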

EDIT: This only works if the columns contain identical strings. To work around this, you can use str.contains() with:

import re

# create a pattern to look for
pat = '|'.join(map(re.escape, df_index['Product']))

# create the mask
new_df = df_details[df_details['Product'].str.contains(pat)]

This works if the columns are formatted as strings.
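A hedged sketch of the str.contains() approach on sample rows from the question. It assumes the Product column of df_index may have been read as integers, so both sides are cast with astype(str) first. The `^(?:...)` anchor is an addition that is not part of the answer above: it restricts the match to the start of the value, so a code that merely appears somewhere inside another product's identifier is not picked up. The last row of df_details is hypothetical and exists only to show that difference.

```python
import re
import pandas as pd

# df_index with integer product codes (an assumption for illustration)
df_index = pd.DataFrame({"Product": [100000, 200000]})

df_details = pd.DataFrame({
    "Product": [
        "100000", "100000-Format-English", "100000-Format-French",
        "110000", "110000-Format-English",
        "200000", "200000-Format-Spanish",
        "120000-Ref-200000",  # hypothetical row: contains 200000, but not at the start
    ],
})

# Cast to strings so re.escape and the .str accessor both work
codes = df_index["Product"].astype(str)

# Anchored alternation; the non-capturing group keeps pandas from warning about match groups
pat = "^(?:" + "|".join(map(re.escape, codes)) + ")"

mask = df_details["Product"].astype(str).str.contains(pat)
new_df = df_details[mask]
print(new_df)
```

Without the anchor, the pattern would also match the hypothetical last row, because str.contains() searches anywhere in the string.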

Answer 1: (score: 0)

Here is how I managed to solve this - I am sure it is not pretty, nor the fastest way to do it, but it does work.

I used pandas' .iterrows() and some for/if loops to go through the data frames row by row:

import pandas as pd

# create a list based on the 'Product' column of df_index (as strings)
index_list = []
for _, row in df_index.iterrows():
    index_list.append(str(row['Product']))

# collect the rows of df_details whose first six characters are found in index_list
# (DataFrame.append was removed in pandas 2.0, so the rows are gathered in a list first)
matching_rows = []
for _, row in df_details.iterrows():
    prod_num_detail = str(row['Product'])[0:6]
    if prod_num_detail in index_list:
        matching_rows.append(row)

# construct the new data frame from the collected rows
new_df = pd.DataFrame(matching_rows, columns=df_details.columns)
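
The row-by-row loop above can also be written without iteration. The sketch below is one possible vectorized equivalent, not the answer author's method. It assumes every main product code is exactly six characters long, as in the sample data. Note that `.str[:6]` slices characters within each value, whereas a plain `[0:6]` on a Series slices rows, which is what the first attempt in the question did.

```python
import pandas as pd

# Sample rows taken from the question (Product assumed to be strings)
df_index = pd.DataFrame({"Product": ["100000", "200000"]})
df_details = pd.DataFrame({
    "Product": [
        "100000", "100000-Format-English",
        "110000", "110000-Format-English",
        "200000", "200000-Format-English",
    ],
})

# First six characters of each Product value, i.e. the main product code
prefix = df_details["Product"].astype(str).str[:6]

# Keep the rows whose prefix is one of the codes listed in df_index
new_df = df_details[prefix.isin(df_index["Product"].astype(str))]
print(new_df)
```

If the codes were not all the same length, splitting on the first "-" with `.str.split("-").str[0]` would be one alternative way to extract the prefix.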