Help matching strings

andresc4 · October 19, 2018, 6:08am

I have a list of first and last names in a CSV file. Let's suppose:

Michael,Keaton
Matt,Damon
Jim,Carrey

and I need to evaluate input from a Typewriter node to check whether someone is on the list. If the name is typed exactly as it appears in the list, I get a hit with the Sift node, but if just one character is different I won't get a hit.
What is the best way to do an evaluation that returns a percentage of similarity between two strings?

Example “Michael,Keaton” vs “Michael,Keaton” 100%
Example “Michael,Keaton” vs “Michael,Weaton” 96%
Example “Michael,Keaton” vs “Michael” 47%
onlist.v4p (9.8 KB)

motzi · October 19, 2018, 6:49am

one thing you can try is the kNearestNeighbour (String) classifier in the machine learning pack:
for training, give each of the inputs its own class and see what it spits out with the classifier.

internally it uses a string-distance function: for two different strings, it calculates how many change steps it takes to get from one string to the other.
kNearestNeighbour_String_.v4p (7.7 KB)
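
To illustrate the string-distance idea outside of vvvv, here is a minimal Python sketch. It assumes the distance is a Levenshtein edit distance (the exact function the classifier uses is not confirmed here) and it assumes the percentage is obtained by dividing by the length of the longer string. The names `levenshtein`, `similarity` and `best_match` are made up for this example.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions or
    substitutions needed to turn string a into string b."""
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to the empty string
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute (or keep) the character
            ))
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Similarity in percent. Assumption: normalise by the longer length."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 100.0  # two empty strings are identical
    return 100.0 * (1 - levenshtein(a, b) / longest)


def best_match(query: str, names: list[str]) -> tuple[str, float]:
    """Return the list entry most similar to query, with its score."""
    return max(((name, similarity(query, name)) for name in names),
               key=lambda pair: pair[1])


names = ["Michael,Keaton", "Matt,Damon", "Jim,Carrey"]
name, score = best_match("Michael,Weaton", names)
print(name, round(score, 1))
```

With this normalisation an exact match scores 100%, one wrong character in "Michael,Keaton" scores roughly 93%, and "Michael" alone scores 50%. Other normalisations or other distance functions will give different percentages, so a hit threshold has to be tuned to the data.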

and iirc @microdee had some nodes to calculate string-distances as well…

microdee · October 19, 2018, 11:35pm

yup, they are in mp.essentials and they are quite usable, although not the best design. In the future I’m planning to have a single node where you can select the algorithm, instead of a separate node for each of them.
