Question

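I am trying to remove duplicates from a list before writing it to a JSON file. I have commented the lines where I implemented this and added extra print statements for debugging. According to my debugging, the code never reaches the print statements and never writes the JSON file. The error is in the trendingBot() function. As it stands, with nothing in the code commented out, duplicates get written to the JSON file.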

Code that writes the JSON file with duplicate entries:

    convertToJson(quote_name, quote_price, quote_volume, url)

    quotesArr = []
    # Convert to a JSON  file


    def convertToJson(quote_name, quote_price, quote_volume, url):

        quoteObject = {
            "url": url,
            "Name": quote_name,
            "Price": quote_price,
            "Volume": quote_volume
        }
        quotesArr.append(quoteObject)


    def trendingBot(url, browser):
        browser.get(url)
        trending = getTrendingQuotes(browser)
        for trend in trending:
            getStockDetails(trend, browser)
        # requests finished, write json to file

        # REMOVE ANY DUPLICATE url from the list, then write json to file.
        quotesArr_dict = {quote['url']: quote for quote in quotesArr}
        # print(quotesArr_dict)
        quotesArr = list(quotesArr_dict.values())
        # print(quotesArr)
        with open('trendingQuoteData.json', 'w') as outfile:
            json.dump(quotesArr, outfile)

Answer 1

If you just want to remove duplicates from a list, you can do it like this:

import json

firstlist = [
  {
    "url": "https://web.tmxmoney.com/quote.php?qm_symbol=ACB&locale=EN",
    "Volume": "Volume:\n12,915,903",
    "Price": "$ 7.67",
    "Name": "Aurora Cannabis Inc."
  },

  {
    "url": "https://web.tmxmoney.com/quote.php?qm_symbol=HNL&locale=EN",
    "Volume": "Volume:\n548,038",
    "Price": "$ 1.60",
    "Name": "Horizon North Logistics Inc."
  },
  {
    "url": "https://web.tmxmoney.com/quote.php?qm_symbol=ACB&locale=EN",
    "Volume": "Volume:\n12,915,903",
    "Price": "$ 7.67",
    "Name": "Aurora Cannabis Inc."
  }
]
# Keep only the first occurrence of each entry.
# Two dicts compare equal when all of their keys and values match.
newlist = []
for i in firstlist:
    if i not in newlist:
        newlist.append(i)

print(json.dumps(newlist))
# Output:
# [{"url": "https://web.tmxmoney.com/quote.php?qm_symbol=ACB&locale=EN", "Volume": "Volume:\n12,915,903", "Price": "$ 7.67", "Name": "Aurora Cannabis Inc."}, {"url": "https://web.tmxmoney.com/quote.php?qm_symbol=HNL&locale=EN", "Volume": "Volume:\n548,038", "Price": "$ 1.60", "Name": "Horizon North Logistics Inc."}]

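I used json.dumps to show you the returned string, but it also works if you use json.dump to write it to a file; I tested that as well. It just doesn't give a nice return value to show.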
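For completeness, here is a minimal sketch of the file-writing variant mentioned above. The sample records are shortened, and the file name `trendingQuoteData_example.json` and the temp-directory location are assumptions chosen for illustration, not taken from the question.

```python
import json
import os
import tempfile

# Shortened sample data (hypothetical values) with one exact duplicate.
quotes = [
    {"url": "https://example.com/quote?symbol=ACB", "Name": "Aurora Cannabis Inc."},
    {"url": "https://example.com/quote?symbol=HNL", "Name": "Horizon North Logistics Inc."},
    {"url": "https://example.com/quote?symbol=ACB", "Name": "Aurora Cannabis Inc."},
]

# Same approach as above: keep the first occurrence of each equal dict.
unique = []
for quote in quotes:
    if quote not in unique:
        unique.append(quote)

# Write the de-duplicated list to a file with json.dump (hypothetical path).
path = os.path.join(tempfile.gettempdir(), "trendingQuoteData_example.json")
with open(path, "w") as outfile:
    json.dump(unique, outfile, indent=2)

# Read the file back to confirm what was written.
with open(path) as infile:
    round_tripped = json.load(infile)
```

The `not in` check compares against every element already kept, so this is quadratic in the list length. That is usually fine for a short list of scraped quotes.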

Answer 2

I would try using an actual loop instead of a dict comprehension:

quote_dict = {}
for quote in quotesArr:
    url = quote['url']
    if url not in quote_dict:
        quote_dict[url] = quote  # Only add if url is not already in dict

# Dump the de-duplicated values collected in quote_dict
with open('trendingQuoteData.json', 'w') as outfile:
    json.dump(list(quote_dict.values()), outfile)
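A likely reason the question's version never reaches its print statements: `trendingBot` assigns to `quotesArr`, which makes that name local to the function, so reading it in the comprehension fails first. The sketch below reproduces this under the assumption that `quotesArr` is a module-level list as in the question. The names `quotes`, `dedupe_broken` and `dedupe_fixed` are hypothetical.

```python
# Module-level list, standing in for quotesArr from the question.
quotes = [
    {"url": "a", "Name": "A"},
    {"url": "b", "Name": "B"},
    {"url": "a", "Name": "A"},
]


def dedupe_broken():
    # The assignment two lines down makes `quotes` a local name for the whole
    # function, so this read happens before the local has any value.
    by_url = {q["url"]: q for q in quotes}
    quotes = list(by_url.values())
    return quotes


def dedupe_fixed():
    # Binding the result to a different name leaves the module-level list readable.
    by_url = {q["url"]: q for q in quotes}
    unique_quotes = list(by_url.values())
    return unique_quotes


try:
    dedupe_broken()
    broken_error = None
except UnboundLocalError as exc:
    broken_error = exc

fixed_result = dedupe_fixed()
```

Declaring `global quotesArr` at the top of the function would also avoid the error, but using a separate local name keeps the function from rebinding shared state.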

Rather than dictionaries, I would create a Quote class that implements at least {{1}}, so that you can determine equality.
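As a rough sketch of that idea, assuming the equality hook meant here is something like `__eq__` (the original text does not show it), a frozen dataclass can generate both `__eq__` and `__hash__`. The field names and the choice to compare by `url` only are assumptions for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Quote:
    # Only `url` takes part in equality and hashing (an assumption: two
    # quotes are treated as the same entry when their URLs match).
    url: str
    name: str = field(compare=False)
    price: str = field(compare=False)
    volume: str = field(compare=False)


quotes = [
    Quote("https://example.com/quote?symbol=ACB", "Aurora Cannabis Inc.", "$ 7.67", "12,915,903"),
    Quote("https://example.com/quote?symbol=HNL", "Horizon North Logistics Inc.", "$ 1.60", "548,038"),
    Quote("https://example.com/quote?symbol=ACB", "Aurora Cannabis Inc.", "$ 7.70", "12,916,000"),
]

# Quote objects are hashable, so dict.fromkeys drops later duplicates
# while keeping the first occurrence and the original order.
unique_quotes = list(dict.fromkeys(quotes))
```

Because the class is hashable, the quotes could also be collected in a set, at the cost of losing their original order.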

Answer 3

The simplest way is to convert it to a set and then convert it back to a list:

mylist = [1,2,3,1,2,3]
mylist2 = list(set(mylist))

print(mylist)
print(mylist2)

This will be the output:

[1, 2, 3, 1, 2, 3]
[1, 2, 3]
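Two caveats apply to the set approach: a set does not guarantee the original order, and it only accepts hashable items, so it cannot hold the dicts from the question directly. The sketch below shows an order-preserving alternative and a de-duplication keyed on `url`. The sample data and variable names are made up for illustration.

```python
# Order-preserving alternative: dict keys keep insertion order.
mylist = [3, 1, 2, 3, 1, 2]
ordered_unique = list(dict.fromkeys(mylist))

# Dicts are not hashable, so putting them in a set raises TypeError.
records = [{"url": "a"}, {"url": "b"}, {"url": "a"}]
try:
    set(records)
    set_error = None
except TypeError as exc:
    set_error = exc

# Track the urls already seen (strings are hashable) instead.
seen_urls = set()
unique_records = []
for record in records:
    if record["url"] not in seen_urls:
        seen_urls.add(record["url"])
        unique_records.append(record)
```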

How to remove duplicates from a list
