Pandas: filter out rows based on title similarity

Time: 2021-02-09 00:27:44

Tags: python pandas dataframe

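I have a DataFrame with a title column (pub_title in the sample below). I want to use textdistance to measure the similarity between titles and drop any row whose title is similar to another one, based on a given threshold. Can this be done directly, or do I need to define a custom function that groups similar titles together before dropping the "duplicates" (the similar titles)? A sample is shown below.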

h_repec_id,pub_repec_id,h_name,pub_url,h_lastname,h_firstname_initial,pub_title,pub_year,pub_month,pub_host_institution,pub_series,pub_repec_url,pub_wp_number,pub_abstract,pub_keywords,pub_doi,pub_type,pub_url_pdf,source_name,source_volume,source_pages,pub_editor,pub_book_title,pub_publisher,pub_series_title,pub_edition,pub_isbn,h_location,h_address,h_affiliation,h_repec_url,h_firstname,h_middlename,h_suffix,h_email,h_homepage,h_address_long,h_phd_institution,h_twitter
"",RePEc:aag:wpaper:v:22:y:2018:i:1:p:204-229,Wendy Nyakabawo,https://ideas.repec.org/a/aag/wpaper/v22y2018i1p204-229.html,nyakabawo,w,"High Frequency Impact Of Monetary Policy And Macroeconomic Surprises On Us Msas, Aggregate Us Housing Returns And Asymmetric Volatility","2018,",December,,,https://ideas.repec.org/a/aag/wpaper/v22y2018i1p204-229.html,1,"This paper explores the impact of monetary policy and macroeconomic surprises on the U.S housing market returns and volatility at the Metropolitan Statistical Area (MSA) and aggregate level using a GJR (or threshold generalized autoregressive conditional heteroscedasticity (GARCH)) model of Glosten, Jagannathan and Runkle (1993). Using daily data and sampling periods which cover both the conventional and unconventional monetary policy periods, empirical results show that monetary policy surprises have a greater impact on the volatility of housing market returns across time with particularly pronounced effect during the conventional monetary policy period. We also show that macroeconomic surprises do not have a significant impact on housing returns for most MSAs for the full sample, conventional and unconventional monetary policy periods.",Monetary policy and macroeconomic surprises; Asymmetric GARCH; Housing market returns and volatility,"",article,http://journal.asia.edu.tw/ADS/wp-content/uploads/papers/2018/Monetary-Policy-and-Macroeconomic-Surprises.pdf,Advances in Decision Sciences,22,204-229,,,,,,,,,,,,,,,,,,
"",RePEc:aag:wpaper:v:22:y:2018:i:1:p:204-229,Wendy Nyakabawo,https://ideas.repec.org/a/aag/wpaper/v22y2018i1p204-229.html,nyakabawo,w,"High Frequency Impact Of Monetary Policy. And Macroeconomic Surprises On Us Msas. Aggregate Us Housing Returns And Asymmetric Volatility.","2018,",December,,,https://ideas.repec.org/a/aag/wpaper/v22y2018i1p204-229.html,1,"This paper explores the impact of monetary policy and macroeconomic surprises on the U.S housing market returns and volatility at the Metropolitan Statistical Area (MSA) and aggregate level using a GJR (or threshold generalized autoregressive conditional heteroscedasticity (GARCH)) model of Glosten, Jagannathan and Runkle (1993). Using daily data and sampling periods which cover both the conventional and unconventional monetary policy periods, empirical results show that monetary policy surprises have a greater impact on the volatility of housing market returns across time with particularly pronounced effect during the conventional monetary policy period. We also show that macroeconomic surprises do not have a significant impact on housing returns for most MSAs for the full sample, conventional and unconventional monetary policy periods.",Monetary policy and macroeconomic surprises; Asymmetric GARCH; Housing market returns and volatility,"",article,http://journal.asia.edu.tw/ADS/wp-content/uploads/papers/2018/Monetary-Policy-and-Macroeconomic-Surprises.pdf,Advances in Decision Sciences,22,204-229,,,,,,,,,,,,,,,,,,
"",RePEc:eee:deveco:v:45:y:1994:i:1:p:101-119,"Markusen, James R.",https://ideas.repec.org/a/eee/deveco/v45y1994i1p101-119.html,markusen,j,Complementarity and increasing returns in intermediate inputs,"1994,",October,,,https://ideas.repec.org/a/eee/deveco/v45y1994i1p101-119.html,1,No abstract is available for this item.,"","",article,http://www.sciencedirect.com/science/article/pii/0304-3878(94)90061-2,Journal of Development Economics,45,101-119,,,,,,,,,,,,,,,,,,
"",RePEc:aag:wpaper:v:22:y:2018:i:1:p:204-229,Hardik A. Marfatia,https://ideas.repec.org/a/aag/wpaper/v22y2018i1p204-229.html,marfatia,h,"High Frequency Impact Of Monetary Policy And Macroeconomic Surprises On Us Msas, Aggregate Us Housing Returns And Asymmetric Volatility","2018,",December,,,https://ideas.repec.org/a/aag/wpaper/v22y2018i1p204-229.html,1,"This paper explores the impact of monetary policy and macroeconomic surprises on the U.S housing market returns and volatility at the Metropolitan Statistical Area (MSA) and aggregate level using a GJR (or threshold generalized autoregressive conditional heteroscedasticity (GARCH)) model of Glosten, Jagannathan and Runkle (1993). Using daily data and sampling periods which cover both the conventional and unconventional monetary policy periods, empirical results show that monetary policy surprises have a greater impact on the volatility of housing market returns across time with particularly pronounced effect during the conventional monetary policy period. We also show that macroeconomic surprises do not have a significant impact on housing returns for most MSAs for the full sample, conventional and unconventional monetary policy periods.",Monetary policy and macroeconomic surprises; Asymmetric GARCH; Housing market returns and volatility,"",article,http://journal.asia.edu.tw/ADS/wp-content/uploads/papers/2018/Monetary-Policy-and-Macroeconomic-Surprises.pdf,Advances in Decision Sciences,22,204-229,,,,,,,,,,,,,,,,,,
"",RePEc:aag:wpaper:v:22:y:2018:i:1:p:95-114,Nikolaos Antonakakis,https://ideas.repec.org/a/aag/wpaper/v22y2018i1p95-114.html,antonakakis,n,Is Wine A Safe-Haven? Evidence From A Nonparametric Causality-In-Quantiles Test,"2018,",December,,,https://ideas.repec.org/a/aag/wpaper/v22y2018i1p95-114.html,1,"Unlike the extant literature on safe-havens, where one aims to relate the movements in an asset considered with extreme episodes in equity markets, we test this property for fine wine, by relating it to global uncertainty. Using a nonparametric k-th order causality-in-quantiles test, we show that, while uncertainty does affect returns and/or variance of the alternative wine indices considered, this effect is restricted to only certain parts of the conditional distribution. In particular, wine seems to be unaffected by global uncertainty, and hence, acts as a safe-haven at extreme ends of the market, i.e., during bear or bullish times; but not during normal times (around the median of the conditional distribution of returns and/or volatility).",Wine Returns and Volatility; Global Uncertainty; Safe-Haven; Nonparametric; Quantile Causality,"",article,http://journal.asia.edu.tw/ADS/wp-content/uploads/papers/2018/Wine_SafeHaven_ADS.pdf,Advances in Decision Sciences,22,95-114,,,,,,,,,,,,,,,,,,
"",RePEc:cdl:ucsdec:qt5112208j,"Carson, Richard T",https://ideas.repec.org/p/cdl/ucsdec/qt5112208j.html,carson,r,Arsenic Mitigation in Bangladesh: A Houseold Labor Market Approach,"2009,","Nov,","Department of Economics, UC San Diego","University of California at San Diego, Economics Working Paper Series",https://ideas.repec.org/p/cdl/ucsdec/qt5112208j.html,qt5112208j,"A major environmental tragedy of modern times is the widespread arsenic contamination of shallow drinking water wells in Bangladesh. High levels of arsenic present in many wells went unrecognized for years. Now large numbers of people show a range of symptoms associated with chronic arsenic exposure. Most of the economics literature follows an epidemiological approach effectively monetizing a dose response relation. We take a different approach, given widespread exposure, and examine impacts on household labor supply. We find significant effects broadly consistent with available epidemiological information in terms of the percent of the population impacted and which demographic groups are most impacted. The nature of the arsenic contamination provides a high quality statistical instrument that identifies a labor supply reduction of over 8\%. Particular attention is paid to large substitution effects involving within household labor supply as this is the primary means of insurance among poor households in developing countries.",household labor; Social and Behavioral Sciences,"",techreport,https://www.escholarship.org/uc/item/5112208j.pdf;origin=repeccitec,,,,,,,,,,,,,,,,,,,,,
"",RePEc:aag:wpaper:v:22:y:2018:i:1:p:95-114,Elie Bouri,https://ideas.repec.org/a/aag/wpaper/v22y2018i1p95-114.html,bouri,e,Is Wine A Safe-Haven? Evidence From A Nonparametric Causality-In-Quantiles Test,"2018,",December,,,https://ideas.repec.org/a/aag/wpaper/v22y2018i1p95-114.html,1,"Unlike the extant literature on safe-havens, where one aims to relate the movements in an asset considered with extreme episodes in equity markets, we test this property for fine wine, by relating it to global uncertainty. Using a nonparametric k-th order causality-in-quantiles test, we show that, while uncertainty does affect returns and/or variance of the alternative wine indices considered, this effect is restricted to only certain parts of the conditional distribution. In particular, wine seems to be unaffected by global uncertainty, and hence, acts as a safe-haven at extreme ends of the market, i.e., during bear or bullish times; but not during normal times (around the median of the conditional distribution of returns and/or volatility).",Wine Returns and Volatility; Global Uncertainty; Safe-Haven; Nonparametric; Quantile Causality,"",article,http://journal.asia.edu.tw/ADS/wp-content/uploads/papers/2018/Wine_SafeHaven_ADS.pdf,Advances in Decision Sciences,22,95-114,,,,,,,,,,,,,,,,,,
"",RePEc:hal:journl:halshs-00773536,Olivier Gaussens,https://ideas.repec.org/p/hal/journl/halshs-00773536.html,gaussens,o,X-efficiency of innovation processes evaluation based on multiobjective Data Envelopment Analysis,"2012,","Feb,",HAL,Post-Print,https://ideas.repec.org/p/hal/journl/halshs-00773536.html,halshs-00773536,No abstract is available for this item.,innovation processes; X-efficiency; multiobjective Data Envelopment Analysis; evaluation,"",techreport,,,,,,,,,,,,,,,,,,,,,,
"",RePEc:aag:wpaper:v:23:y:2019:i:1:p:88-113,Esin Cakan,https://ideas.repec.org/a/aag/wpaper/v23y2019i1p88-113.html,cakan,e,Economic Policy Uncertainty and Herding Behavior Evidence from the South African Housing Market,"2019,",March,,,https://ideas.repec.org/a/aag/wpaper/v23y2019i1p88-113.html,1,"This paper examines the link between economic policy uncertainty and herding behaviour in financial markets with an application to the South African housing market. Building on the evidence in the literature that herding behaviour driven by human emotions is not only limited to financial markets, but is also present in real estate investments, we examine the presence of herding in this emerging market via static and dynamic herding tests. While the static model fails to detect herding in the South African housing market, a dynamic model based on a two-regime Markov switching specification shows evidence of herding during the high volatility regime only, consistent with the notion that herd behaviour is primarily driven by increased market uncertainty. Extending our analysis via quantile regressions, we further show that higher quantiles of policy uncertainty are associated with greater likelihood of being in the herding regime, thus establishing a link between policy uncertainty and herding behaviour. Overall, our findings suggest that policy uncertainty can serve as a driver of market inefficiencies, which in our case, is associated by the presence of herding.",Herding; Housing Market; South Africa; Regime-Switching; Uncertainty,"",article,http://journal.asia.edu.tw/ADS/wp-content/uploads/papers/2019/Economic-Policy-Uncertainty-and-Herding-Behavior-Evidence-from-the-South-African-Housing-Market.pdf,Advances in Decision Sciences,23,88-113,,,,,,,,,,,,,,,,,,
"",RePEc:aag:wpaper:v:23:y:2019:i:1:p:88-113,Riza Demirer,https://ideas.repec.org/a/aag/wpaper/v23y2019i1p88-113.html,demirer,r,Economic Policy Uncertainty and Herding Behavior Evidence from the South African Housing Market,"2019,",March,,,https://ideas.repec.org/a/aag/wpaper/v23y2019i1p88-113.html,1,"This paper examines the link between economic policy uncertainty and herding behaviour in financial markets with an application to the South African housing market. Building on the evidence in the literature that herding behaviour driven by human emotions is not only limited to financial markets, but is also present in real estate investments, we examine the presence of herding in this emerging market via static and dynamic herding tests. While the static model fails to detect herding in the South African housing market, a dynamic model based on a two-regime Markov switching specification shows evidence of herding during the high volatility regime only, consistent with the notion that herd behaviour is primarily driven by increased market uncertainty. Extending our analysis via quantile regressions, we further show that higher quantiles of policy uncertainty are associated with greater likelihood of being in the herding regime, thus establishing a link between policy uncertainty and herding behaviour. Overall, our findings suggest that policy uncertainty can serve as a driver of market inefficiencies, which in our case, is associated by the presence of herding.",Herding; Housing Market; South Africa; Regime-Switching; Uncertainty,"",article,http://journal.asia.edu.tw/ADS/wp-content/uploads/papers/2019/Economic-Policy-Uncertainty-and-Herding-Behavior-Evidence-from-the-South-African-Housing-Market.pdf,Advances in Decision Sciences,23,88-113,,,,,,,,,,,,,,,,,,
"",RePEc:cdp:diam10:047,Fernando Salgueiro Perobelli,https://ideas.repec.org/h/cdp/diam10/047.html,perobelli,f,Impactos Econômicos Das Mudanças Climáticas No Brasil,"2010,",January,,,https://ideas.repec.org/h/cdp/diam10/047.html,"","The aim of this paper is to develop scenarios of the economic impacts of climate change in Brazil, articulating the projected impacts of climate change on agricultural and energy sectors to macroeconomic scenarios, these related to climate change scenarios developed by IPCC (A2 and B2).(This abstract was borrowed from another version of this item.)(This abstract was borrowed from another version of this item.)","","",incollection,http://www.cedeplar.ufmg.br/seminarios/seminario_diamantina/2010/D10A047.pdf,,"","",,Anais do XIV Seminário sobre a Economia Mineira [Proceedings of the 14th Seminar on the Economy of Minas Gerais],"Cedeplar, Universidade Federal de Minas Gerais",Anais do XIV Seminário sobre a Economia Mineira [Proceedings of the 14th Seminar on the Economy of Minas Gerais],"",,,,,,,,,,,,,
"",RePEc:aag:wpaper:v:23:y:2019:i:1:p:88-113,Josine Uwilingiye,https://ideas.repec.org/a/aag/wpaper/v23y2019i1p88-113.html,uwilingiye,j,Economic Policy Uncertainty and Herding Behavior Evidence from the South African Housing Market,"2019,",March,,,https://ideas.repec.org/a/aag/wpaper/v23y2019i1p88-113.html,1,"This paper examines the link between economic policy uncertainty and herding behaviour in financial markets with an application to the South African housing market. Building on the evidence in the literature that herding behaviour driven by human emotions is not only limited to financial markets, but is also present in real estate investments, we examine the presence of herding in this emerging market via static and dynamic herding tests. While the static model fails to detect herding in the South African housing market, a dynamic model based on a two-regime Markov switching specification shows evidence of herding during the high volatility regime only, consistent with the notion that herd behaviour is primarily driven by increased market uncertainty. Extending our analysis via quantile regressions, we further show that higher quantiles of policy uncertainty are associated with greater likelihood of being in the herding regime, thus establishing a link between policy uncertainty and herding behaviour. Overall, our findings suggest that policy uncertainty can serve as a driver of market inefficiencies, which in our case, is associated by the presence of herding.",Herding; Housing Market; South Africa; Regime-Switching; Uncertainty,"",article,http://journal.asia.edu.tw/ADS/wp-content/uploads/papers/2019/Economic-Policy-Uncertainty-and-Herding-Behavior-Evidence-from-the-South-African-Housing-Market.pdf,Advances in Decision Sciences,23,88-113,,,,,,,,,,,,,,,,,,
"",RePEc:cdp:diam10:047,Carlos Roberto Azzoni,https://ideas.repec.org/h/cdp/diam10/047.html,azzoni,c,Impactos Econômicos Das Mudanças Climáticas No Brasil,"2010,",January,,,https://ideas.repec.org/h/cdp/diam10/047.html,"","The aim of this paper is to develop scenarios of the economic impacts of climate change in Brazil, articulating the projected impacts of climate change on agricultural and energy sectors to macroeconomic scenarios, these related to climate change scenarios developed by IPCC (A2 and B2).(This abstract was borrowed from another version of this item.)(This abstract was borrowed from another version of this item.)","","",incollection,http://www.cedeplar.ufmg.br/seminarios/seminario_diamantina/2010/D10A047.pdf,,"","",,Anais do XIV Seminário sobre a Economia Mineira [Proceedings of the 14th Seminar on the Economy of Minas Gerais],"Cedeplar, Universidade Federal de Minas Gerais",Anais do XIV Seminário sobre a Economia Mineira [Proceedings of the 14th Seminar on the Economy of Minas Gerais],"",,,,,,,,,,,,,
"",RePEc:aag:wpaper:v:23:y:2019:i:2:p:151-163,Mark E. Wohar,https://ideas.repec.org/a/aag/wpaper/v23y2019i2p151-163.html,wohar,m,Presidential Cycles In The Usa And The Dollar-Pound Exchange Rate: Evidence From Over Two Centuries,"2019,",June,,,https://ideas.repec.org/a/aag/wpaper/v23y2019i2p151-163.html,2,"In this paper, we analyze the impact of the U.S. presidential cycles on the dollar relative to the British pound over the longest possible monthly period of 1791:01 to 2018:10, based on GJR (or threshold generalized autoregressive conditional heteroscedasticity (GARCH)) model. The usage of over two centuries of data controls for sample selection bias, while a GJR model accommodates for omitted variable bias. We find that over the entire sample period, the Democratic regime has indeed depreciated the dollar relative to the pound. However, during the post Bretton Woods era, the depreciation of the dollar is not statistically significant under the Democratic presidents.",Exchange Rate; U.S. Presidential Cycles,"",article,http://journal.asia.edu.tw/ADS/wp-content/uploads/papers/2019/Presidential-Cycles-in-the-USA-and-the-Dollar-Pound-Exchange-Rate-Evidence-from-Over-Two-Centuries.pdf,Advances in Decision Sciences,23,151-163,,,,,,,,,,,,,,,,,,
"",RePEc:aag:wpaper:v:23:y:2019:i:3:p:93-121,Goodness C. Aye,https://ideas.repec.org/a/aag/wpaper/v23y2019i3p93-121.html,aye,g,Macroeconomic Uncertainty And The Comovement In Buying Versus Renting In The Usa,"2019,",September,,,https://ideas.repec.org/a/aag/wpaper/v23y2019i3p93-121.html,3,"This paper characterizes the sources of the comovement in the U.S metropolitan buy-rent growth rate. The analysis is based on quarterly buy-rent indices from 1982:Q1 to 2016:Q4. To this end, we used the dynamic factor model to decompose the index into national and local factors. The national component contributed more to the variation in the buy-rent indices relative to the local component with variance decomposition values of 72\% and 27\% respectively albeit this varied across the cities. We further examined the sensitivity of the national buy-rent factor to macroeconomic uncertainty. Our full sample results show that uncertainty has a significant negative effect on the buy-rent behavior thus favouring buying a home as a wealth accumulation channel and hence a hedge relative to renting a similar home and investing in other financial assets. The results from the recursive estimation further confirmed a dominant negative relationship with fewer periods of positive relationship. The implications of these findings are drawn.",Buy-rent choice; consumer behavior; dynamic latent factor model; development; economic uncertainty,"",article,http://journal.asia.edu.tw/ADS/wp-content/uploads/papers/2019/Macroeconomic-Uncertainty-and-the-Comovement-in-Buying-versus-Renting-in-the-USA.pdf,Advances in Decision Sciences,23,93-121,,,,,,,,,,,,,,,,,,

The goal is to drop one of the first two rows, because their titles under pub_title are similar.

1 answer:

Answer 0 (score: 0)

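So I did it a different way. I created a column that masks which rows to keep and which to drop. For each target row, I check its similarity against every row below it; a lower row whose Jaro similarity to the target exceeds 0.85 is flagged with keep = 0 and can be filtered out afterwards.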

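import textdistance  # third-party: pip install textdistance

def remove_similar_titles(df, threshold=0.85):
    # Work on a copy with a 0..n-1 index so labels and positions coincide
    df = df.reset_index(drop=True)
    df['keep'] = 1
    titles = df['pub_title'].tolist()
    for i, target_title in enumerate(titles):
        # Compare each title only with the rows below it
        for j in range(i + 1, len(titles)):
            res = textdistance.jaro.similarity(target_title, titles[j])
            # Flag the lower row as a near-duplicate if it is still kept
            if res > threshold and df.loc[j, 'keep'] == 1:
                df.loc[j, 'keep'] = 0
    return df

# Keep only the rows that were not flagged
df = remove_similar_titles(df)
df = df[df['keep'] == 1].drop(columns='keep')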
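
As a rough alternative sketch (not the method above), the same "keep the first, drop later near-duplicates" idea can be written as a boolean mask over a plain list of titles. This version swaps in the standard library's difflib.SequenceMatcher for textdistance.jaro, so its scores are not comparable to Jaro scores and the 0.85 threshold is only carried over as an assumption. The names normalize and keep_mask are hypothetical helpers introduced for illustration. Titles are lowercased and stripped of punctuation first, since the two sample titles differ only in punctuation.

```python
from difflib import SequenceMatcher
import string


def normalize(title):
    """Lowercase a title, drop ASCII punctuation, and collapse whitespace."""
    table = str.maketrans("", "", string.punctuation)
    return " ".join(title.lower().translate(table).split())


def keep_mask(titles, threshold=0.85):
    """Return one boolean per title: True unless it resembles an earlier kept title."""
    kept = []  # normalized titles retained so far
    mask = []
    for title in titles:
        norm = normalize(title)
        # A title is a near-duplicate if it is close to any title already kept
        is_dup = any(
            SequenceMatcher(None, norm, k).ratio() > threshold for k in kept
        )
        mask.append(not is_dup)
        if not is_dup:
            kept.append(norm)
    return mask


# Titles taken from the first three rows of the sample data
titles = [
    "High Frequency Impact Of Monetary Policy And Macroeconomic Surprises "
    "On Us Msas, Aggregate Us Housing Returns And Asymmetric Volatility",
    "High Frequency Impact Of Monetary Policy. And Macroeconomic Surprises "
    "On Us Msas. Aggregate Us Housing Returns And Asymmetric Volatility.",
    "Complementarity and increasing returns in intermediate inputs",
]
print(keep_mask(titles))  # [True, False, True]
```

With a DataFrame, the mask could be applied as df.loc[keep_mask(df['pub_title'].tolist())]. Unlike the function above, a title is compared only against titles that were kept, so a dropped row never causes further rows to be dropped. Both approaches also remove rows that repeat the same title for different co-authors, which may or may not be wanted.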