Deleting tables with duplicate contents

RaptorBandit • 2018-10-28 17:08

Hi everyone, I seem to have coded myself into a bit of a hole and I'm hoping someone has a suggestion on how to get out of it!

What I need to be able to do is iterate through all the subtables in a table and delete any subtables that have identical values. Below is the setup and the two methods my research turned up. I've tried both so far with no success:

x = {}
add(x,{a=1})
add(x,{a=1})

--table x now has two subtables
--that both contain a=1, and we
--need to delete the duplicate!

Method one:

index={}
for t in all(x) do
	if index[t] then
		del(x,t)
	end
	index[t]=true
end

No luck with this one. It DOES work if the duplicate values live directly in the main table rather than inside the subtables, but I'm having trouble adapting it to check the subtables.
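Method one never fires because index[t] is keyed by the subtable itself, and in Lua two separately constructed tables are different keys even when their contents match. A minimal sketch of the same idea, assuming (as in the example) that each subtable is identified by a non-nil a field alone, keys the index on t.a instead. dedupe_by_a is a hypothetical name; it builds a new list rather than deleting while iterating, and it sticks to core Lua, so it should run in PICO-8 as well as standard Lua.

```lua
-- Hypothetical helper: keep the first subtable seen for each value of field a.
-- Assumes every subtable has a non-nil 'a' and is identified by it alone.
function dedupe_by_a(list)
  local seen = {}    -- keyed by the *value* of a, not by the table itself
  local result = {}
  for i = 1, #list do
    local t = list[i]
    if not seen[t.a] then
      seen[t.a] = true
      result[#result + 1] = t
    end
  end
  return result
end

x = {{a=1}, {a=1}, {a=2}}
print({a=1} == {a=1})  -- false: two distinct tables, even with equal contents
x = dedupe_by_a(x)
print(#x)              -- 2
```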

Method two:

for i=1,#x do
	if x[i].a == x[i+1].a then
	 del(x,i)
	end
end

Still no luck! This method gives me a runtime error:

if x[i].a ==x[i+1].a then
attempt to index field '?' (a nil value)
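Method two has two separate problems. The loop runs i all the way to #x, so on the last pass x[i+1] is nil and reading .a from it raises the error above. Also, PICO-8's del(tbl, v) removes the first element equal to v, so del(x,i) searches for the number i rather than deleting the element at index i. A minimal sketch that avoids both, assuming duplicates are always adjacent and compared on field a only, walks backwards from the end and removes by index. remove_at and remove_adjacent_dupes are hypothetical helpers written in core Lua; non-adjacent copies such as {a=1},{a=2},{a=1} are left alone, which is why a pairwise comparison of every subtable is the more general fix.

```lua
-- Hypothetical helper: remove element i by shifting later elements down
-- (plays the role of table.remove using only core Lua).
function remove_at(list, i)
  local n = #list
  for j = i, n - 1 do
    list[j] = list[j + 1]
  end
  list[n] = nil
end

-- Remove adjacent duplicates, compared on field a only.
-- Walking backwards means a removal only shifts elements already checked,
-- and stopping at i = 2 means list[i - 1] always exists.
function remove_adjacent_dupes(list)
  for i = #list, 2, -1 do
    if list[i].a == list[i - 1].a then
      remove_at(list, i)
    end
  end
end

x = {{a=1}, {a=1}, {a=2}}
remove_adjacent_dupes(x)
print(#x)  -- 2
```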

Any suggestions anyone has on how to do this properly would be much appreciated!

bab_B • 2018-10-28 18:10

I think this should do what you want:

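function remove_duplicates(outer_table)
  local duplicates={}

  -- compare every pair of subtables (m always before n)
  for m=1, #outer_table-1 do
    for n=m+1, #outer_table do
      local duplicate = true
      -- any key of m whose value differs in n rules out a match
      for key, value in pairs(outer_table[m]) do
        if (value ~= outer_table[n][key]) duplicate = false
      end
      -- mark the earlier copy for removal, so the last copy survives
      if (duplicate) add(duplicates, outer_table[m])
    end
  end

  -- delete after scanning so no indices shift mid-loop
  for t in all(duplicates) do
    del(outer_table, t)
  end

end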

EDIT: In my original post I included a check for whether outer_table[m] and outer_table[n] had the same number of members. This was to avoid the situation where, for example, {a=10, b=12} would be erroneously counted as a duplicate of {a=10, b=12, c=6}. I did this using the # operator on both subtables, but then I remembered that the # operator only counts consecutive numerical keys starting from 1. Both subtables in that example would show a size of 0 if you checked them with #, and would thus still be counted as duplicates.
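The point about # is easy to check. In this minimal sketch, count_keys is a hypothetical helper that counts every key with pairs(), which is the kind of counting described next, while # only measures the 1..n sequence part and reports 0 for both example tables.

```lua
-- Hypothetical helper: count every key in a table, not just 1..n.
function count_keys(t)
  local n = 0
  for _ in pairs(t) do
    n = n + 1
  end
  return n
end

t1 = {a=10, b=12}
t2 = {a=10, b=12, c=6}
print(#t1)            -- 0: no numeric keys starting at 1
print(#t2)            -- 0
print(count_keys(t1)) -- 2
print(count_keys(t2)) -- 3
```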

If all your subtables are going to have the same number of keys, comparing their sizes isn't necessary. But if the number of keys varies between subtables, you'll need to step through both subtables using pairs() and keep track of their sizes in two variables and then compare them:

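function remove_duplicates(outer_table)
  local duplicates={}

  for m=1, #outer_table-1 do
    for n=m+1, #outer_table do
      -- count keys with pairs(), since # ignores non-numeric keys
      local duplicate, m_size, n_size = true, 0, 0
      for key, value in pairs(outer_table[m]) do
        if (value ~= outer_table[n][key]) duplicate = false
        m_size += 1
      end
      for key, value in pairs(outer_table[n]) do
        n_size += 1
      end
      if (duplicate and m_size == n_size) add(duplicates, outer_table[m])
    end
  end

  for t in all(duplicates) do
    del(outer_table, t)
  end
end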
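As an alternative to counting sizes, a comparison can check both directions: every key of the first table must match in the second, and every key of the second must exist in the first. This is a minimal sketch with same_contents as a hypothetical helper; it is a shallow check, so any tables nested inside the subtables are still compared by reference.

```lua
-- Hypothetical helper: true when a and b hold exactly the same key/value pairs.
-- Shallow: nested tables are compared by reference, not by contents.
function same_contents(a, b)
  -- every key of a must have an equal value in b
  for k, v in pairs(a) do
    if b[k] ~= v then return false end
  end
  -- b must not have any key that a lacks
  for k in pairs(b) do
    if a[k] == nil then return false end
  end
  return true
end

print(same_contents({a=10, b=12}, {a=10, b=12}))       -- true
print(same_contents({a=10, b=12}, {a=10, b=12, c=6}))  -- false
```

Inside remove_duplicates, the inner comparison could then shrink to if (same_contents(outer_table[m], outer_table[n])) add(duplicates, outer_table[m]).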

RaptorBandit • 2018-10-28 21:53

Hey thanks, this is perfect! Does exactly what I needed it to do! (And now, looking at your code, I totally understand why the other methods weren't working)

For now my subtables all have the same number of keys but the extra step is good to know for the future.

Thanks again, I really appreciate the help!
