I'm in the middle of deduping some data and thought I'd ask the above question just in case anyone has a reasonable solution.
Basically I'd be matching by company name, but the names come from two different data sources (all free-hand text, unfortunately), so some of them are written very differently and I can miss matches even though they are the same company.
Was hoping to have a result like this:
DataSource1::CompanyName - "Solve This Ltd"
DataSource2::CompanyName - "Solve This Limited"
Match Rate = 86%
Because DataSource1 has 12 characters and DataSource2 has 16 characters (ignoring spaces), giving 28 in total. But there is a difference of 4 characters from the word 'Limited', i.e. the missing 'i, m, i, e'.
4 / 28 (*100) = 14.3% difference, which leaves a match rate of about 86%.
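The arithmetic above can be sketched in Python as a rough illustration. This is not a definitive solution: the function name `match_rate` is made up for the example, and stripping spaces and ignoring case are assumptions added here to reproduce the 12 and 16 character counts. It relies on the standard library's `difflib.SequenceMatcher`, whose `ratio()` is 2 * (matching characters) / (total characters in both strings), which for this example works out to 24 / 28.

```python
from difflib import SequenceMatcher


def match_rate(name_a: str, name_b: str) -> float:
    """Return a percentage similarity between two company names.

    Assumption for this sketch: spaces and letter case are ignored,
    so "Solve This Ltd" is compared as "solvethisltd" (12 characters).
    """
    a = "".join(name_a.split()).casefold()
    b = "".join(name_b.split()).casefold()
    # ratio() = 2 * matches / (len(a) + len(b)), a value between 0 and 1
    return SequenceMatcher(None, a, b).ratio() * 100


rate = match_rate("Solve This Ltd", "Solve This Limited")
print(f"Match Rate = {rate:.0f}%")
```

With 15k records, comparing every name against every other name is a lot of comparisons, so it may be worth grouping names first (for example by first letter or first word) and only scoring pairs within a group.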
I've still got 15k records to compare, so any suggestions would be great.