Question

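In Ruby, I'm trying to compare a list of URLs against a previous list of URLs and get only the new ones.

I keep the old list in a text file, one URL per line. I read the text file into an array like this: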

oldLines = File.open('logfile.txt', 'r').readlines

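I have a new set of values, named "newLines", populated in exactly the same way as the old list, and it may overlap with the old list. I'm trying to get only the values that don't match the old list. Let's say 'newList'.length = 100 and 'oldlist'.length = 95, and I know from visual inspection that 90 elements overlap between them. Things I've tried: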

newList = newList - oldList
#(newList | oldList) returns 195
#(newList & oldList) returns 0


newList.delete_if { |x| oldList.include?(x) }

In both cases, nothing is removed from newList. I know I'm missing something here. Thanks.

Answer 1

Here is what I did:

a.txt

http://yahoo.com
http://google.com
http://bing.com

b.txt

http://bing.com
http://yahoo.com

test.rb

a = File.open('a.txt', 'r').readlines.map!(&:chomp)
b = File.open('b.txt', 'r').readlines.map!(&:chomp)
p a-b #=> ["http://google.com"]

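Without chomp it fails, because the http://yahoo.com line in a.txt is read as "http://yahoo.com\n", while in b.txt it is the last line and has no "\n" at the end, so the two strings are not equal.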
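To make the effect of the trailing newline concrete, here is a minimal sketch. It assumes two throwaway files (old.txt and new.txt, created in a temporary directory) and a hypothetical helper named read_urls; none of these names come from the question.

```ruby
require 'tmpdir'

# Hypothetical helper: read a file into an array of lines and strip the
# trailing line terminator from each one.
def read_urls(path)
  File.readlines(path).map(&:chomp)
end

with_newlines = nil
chomped = nil

Dir.mktmpdir do |dir|
  old_path = File.join(dir, 'old.txt')
  new_path = File.join(dir, 'new.txt')

  # Assumption for illustration: old.txt has no final newline, new.txt does.
  File.write(old_path, "http://bing.com\nhttp://yahoo.com")
  File.write(new_path, "http://yahoo.com\nhttp://google.com\nhttp://bing.com\n")

  # Raw lines: "http://yahoo.com\n" and "http://yahoo.com" are different strings.
  with_newlines = File.readlines(new_path) - File.readlines(old_path)

  # Chomped lines: only the genuinely new URL remains.
  chomped = read_urls(new_path) - read_urls(old_path)
end

p with_newlines #=> ["http://yahoo.com\n", "http://google.com\n"]
p chomped       #=> ["http://google.com"]
```

Array#- compares elements by equality, so a single differing character at the end of a line is enough to keep an element in the result.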

Answer 2

All you need to do is call the subtraction method on the arrays.

['1', '2', '3', '4', '5'] - ['2', '3', '4']

# => ["1", "5"]

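I'm not sure why this isn't working for you. Post some sample URL data for your two arrays; the problem is probably there, and I'll update my answer accordingly.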
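For reference, a small sketch of how -, & and | behave on arrays of strings, and what they return when every element of one array carries an extra "\n". The example.com-style URLs and variable names are made up for illustration; the newline mismatch is only one possible explanation for the numbers in the question.

```ruby
# Hypothetical sample data, not taken from the question.
old_list = %w[http://a.example http://b.example]
new_list = %w[http://b.example http://c.example]

difference   = new_list - old_list # in new_list but not in old_list
intersection = new_list & old_list # in both
union        = new_list | old_list # in either, duplicates removed

p difference   #=> ["http://c.example"]
p intersection #=> ["http://b.example"]
p union        #=> ["http://b.example", "http://c.example", "http://a.example"]

# Now give every old entry a trailing newline, as readlines would.
old_with_newlines = old_list.map { |url| url + "\n" }

no_overlap  = new_list & old_with_newlines
union_count = (new_list | old_with_newlines).length

p no_overlap  #=> []
p union_count #=> 4
```

An empty intersection together with a union as large as both lengths added up means no pair of strings compared equal, which points at the data rather than at the operators.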

Answer 3

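I couldn't figure out what is wrong with your code, so I tried it in irb. I still don't have an answer. What are newList and oldList? How are these data structures populated? Are they arrays?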

irb(main):003:0> oldLines = File.open('/Users/pprakash/old', 'r').readlines
=> ["http://www.google.com\n", "http://yahoo.com\n", "http://slideshare.net\n"]
irb(main):004:0> newLines = File.open('/Users/pprakash/new', 'r').readlines
=> ["http://www.google.com\n", "http://yahoo.com\n", "http://slideshare.net\n", "http://great.com\n", "http://example.com\n"]
irb(main):005:0> x = newLines - oldLines
=> ["http://great.com\n", "http://example.com\n"]
irb(main):006:0> newLines
=> ["http://www.google.com\n", "http://yahoo.com\n", "http://slideshare.net\n", "http://great.com\n", "http://example.com\n"]
irb(main):007:0> oldLines
=> ["http://www.google.com\n", "http://yahoo.com\n", "http://slideshare.net\n"]
irb(main):008:0> newLines = newLines - oldLines
=> ["http://great.com\n", "http://example.com\n"]
irb(main):009:0> newLines
=> ["http://great.com\n", "http://example.com\n"]
irb(main):010:0>

Answer 4

I can't reproduce your problem. Here is what I did:

urls.txt

http://www.google.com
http://www.digg.com
http://www.slashdot.com
http://www.yahoo.com

urls2.txt

http://www.google.com
http://www.digg.com
http://www.slashdot.com
http://www.yahoo.com
http://www.dzone.com
http://www.digit.com
http://www.digitaldreams.com

Code

first = File.open('urls.txt', 'r').readlines
second = File.open('urls2.txt', 'r').readlines
disjoint = second - first

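Update: After trying a few other things, I changed my code to strip the "\n" from some of the URLs. Subtracting URLs that still had the "\n" from URLs without it did not remove anything. So I imagine a mismatch like this is why you aren't seeing anything removed. Try printing out both sets of URLs before subtracting.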
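A sketch of that debugging step, under assumed data: the two arrays below are hypothetical stand-ins for the lists, with a stray trailing space in the old entries. Printing with inspect (or p) shows whitespace that puts would hide, and a hypothetical normalize helper strips it before subtracting.

```ruby
# Hypothetical sample data standing in for the two lists.
old_list = ["http://www.google.com \n", "http://www.digg.com \n"]
new_list = ["http://www.google.com\n", "http://www.digg.com\n", "http://www.dzone.com\n"]

# inspect makes trailing spaces and newlines visible.
puts old_list.first.inspect # prints "http://www.google.com \n" with the escape shown
puts new_list.first.inspect # prints "http://www.google.com\n" with the escape shown

# Hypothetical helper: strip leading/trailing whitespace and drop blank lines.
def normalize(lines)
  lines.map(&:strip).reject(&:empty?)
end

raw_difference = new_list - old_list
only_new = normalize(new_list) - normalize(old_list)

p raw_difference.length #=> 3
p only_new              #=> ["http://www.dzone.com"]
```

strip is used here instead of chomp because chomp only removes a trailing line terminator, while strip also removes surrounding spaces and tabs.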

Problem comparing two arrays in Ruby

4 answers: