Find similar texts using the Levenshtein distance algorithm

Comparing two large lists is challenging when one of them is hand-typed and contains typos. For example, IO (input/output) tags exported from wiring diagrams with the MicroStation text export need to be compared with an IO list to make sure that all hardwired IO points are shown on the wiring diagrams. Using Compare Spreadsheets, I eliminated all matches, but a massive number of mismatched tags still remained. I concluded that the IO list itself was correct and that the exported tags created by the designer contained typos. To resolve this, I developed a script that compares “each with each” and tests every pair with the Levenshtein distance algorithm (the minimum number of single-character insertions, deletions, and substitutions needed to turn one text into the other). Every tag meeting the distance criterion is listed in consecutive columns for further analysis.

Table with found similar texts
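
The comparison described above can be sketched roughly as follows. This is a minimal Python illustration, not the original script: the function names levenshtein and find_similar, the max_distance threshold of 2, and the sample tags are all assumptions made up for this example. Each returned row holds one exported tag followed by every IO list tag within the allowed distance, mirroring the "consecutive columns" layout.

```python
def levenshtein(a: str, b: str) -> int:
    """Return the edit distance between a and b
    (single-character insertions, deletions, substitutions)."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))  # distances from "" to prefixes of b
    for i, ch_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute (or keep if equal)
            ))
        previous = current
    return previous[-1]


def find_similar(exported_tags, io_list, max_distance=2):
    """Compare each exported tag with each IO list tag.

    Returns one row per exported tag: the tag itself, followed by
    all IO list tags whose distance is <= max_distance.
    max_distance=2 is an assumed default, not a value from the article.
    """
    rows = []
    for tag in exported_tags:
        candidates = [ref for ref in io_list
                      if levenshtein(tag, ref) <= max_distance]
        rows.append([tag] + candidates)
    return rows


if __name__ == "__main__":
    # Hypothetical sample data: mismatched tags left after removing matches.
    exported = ["PT-1O1A", "FT-2002", "LT-300"]
    io_list = ["PT-101A", "FT-2020", "LT-3001", "TT-4001"]
    for row in find_similar(exported, io_list):
        print("\t".join(row))
```

The nested loop performs one distance calculation per pair of tags, so the run time grows with the product of the two list lengths. Removing the matching tags first, as described above, keeps that product manageable.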