Hey, if you are here : thankyou let me phrase the problem for you. I have to solve this problem for a work related problem.
Say you have a set of strings, let's call this S
and you have a F -> [0, 1]
that takes in 2 strings (X, Y) and F(X,Y) gives out a real number b/w (0 and 1) about the similarity score on the 2 strings (X, Y).
The F can be 2 things as of now :
1. Edit Distance based : 1 - (edit_distance(X, Y) / max(len(X), len(Y))
2. Vector Search Based cosine similarity : dot(vector(X), vector(Y)) / (norm(vector(X) * norm(vector(Y))
Problem :
You have to create an
optimal
adjacency list where key is some root string fromS
and the values if a list of stringsL
which are most similar toS
.should look something like this :
{"gammoner": ["eammongr", "oammengr"], "cuphea": ["capheu", "cephau", "cepuah"], "furdel": ["furdle", "frudle", "rfudle"], "zanthoxylum": ["zanohtxylum", "zanohtuylxm", "zanohtuylxm"], "preallable": ["plearlable"], "hypsoisotherm": ["hyisopsotherm", "hyisopsotherm"], "nongelatinous": ["nonuelatinogs", "nonuelatinogs", "nonuilatenogs"], "arborous": ["arborous", "arbouors", "rabouors"], "tripartient": ["eripartitnt", "ertpariitnt", "ertpariitnt"], "hiroyuki": ["hiroiuky", "hkroiuiy", "hkuoiriy"], "hartite": ["hartite", "hartiet", "hartiet"], "quadrifoil": ["quadfiroil"], "macmillanite": ["macmilianlte"], "pregeological": ["poegerlogical"], "oakweb": ["oaewkb"], "twister": ["twisetr", "twister"], "osirification": ["osirificaniot", "osirificaniot", "osirificaniot"], "prenuncial": ["prenuncial"], "openmouthedness": ["openeouthmdness", "opeenouthmdness", "opeedouthmnness"], "fluvioterrestrial": ["feuviotlrrestrial"], "hobbesian": ["eobbhsian"], "pharmacosiderite": ["pharmactsiderioe", "pharmactsiredioe"], "goosebird": ["goosiberd", "eoosibgrd"], "befountained": ["beaountfined", "beaoudtfinen", "beaondtfiuen"], "chiffer": ["hciffer", "fcifher", "rcifhef"], "generalizable": ["genezalirable"], "beamwork": ["beamwkro"], "splotch": ["cplotsh", "cptolsh"], "deforciant": ["derofciant", "nerofciadt"], "trochiform": ["trochiform"], "fungiform": ["fugniform", "fungiform"], "mediosilicic": ["medcosiliiic", "medcoiilisic", "medcoiilisic"], "locutor": ["locutro", "lotucro"], "nilpotent": ["nilponett", "nilptneot", "nilpnteot"], "awrist": ["atrisw"], "melicraton": ["relicmaton"], "midships": ["sidmhips", "sihmdips"], "fingerlike": ["fingereikl", "finlereikg"], "redolence": ["redonelce", "rcdonelee"]}
Please suggest some algorithm, the one that I came up with is stuck !