I have a fairly complex database project in which I need to run a similarity search between an imported string and my existing records, so the imported document can be matched to entries in my database. I am using the custom Levenshtein function written by Steven Allen, which does exactly what I need.
The solution is hosted on a Windows 2012 Server dedicated only to FileMaker Server 14, with indexing, backups and anti-virus software pointed away from the database so they don't interfere, as the documentation recommends. The server has 8 GB of RAM, and the issue happens all the time regardless of the number of users connected or the load on the solution, which is very low at all times. The RAM available for cache is 1500 MB and I consistently see a 100% cache hit rate on the statistics page.
The way I have set up my calculation is an unindexed calculation field in which the Levenshtein function counts the differences between a local variable and a text field in the record. I search for records with fewer than a certain number of differences, and I also search a related table the same way on another field, both in the same Perform Find. I then write the possible matches to a common field so they are available for clients to look at while the script analyses the other entries.
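For reference, my understanding is that the custom function computes the standard Levenshtein edit distance. A rough Python equivalent (this is not the actual FileMaker code, just an illustration of what the function computes) would be:

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance.

    Builds a (len(a)+1) x (len(b)+1) matrix where cell [i][j] holds
    the number of edits needed to turn a[:i] into b[:j].
    """
    m, n = len(a), len(b)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i  # delete all of a[:i]
    for j in range(n + 1):
        d[0][j] = j  # insert all of b[:j]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution (or match)
            )
    return d[m][n]
```

The "fewer than a certain number of differences" find then just compares this distance against my threshold.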
Since the Levenshtein function takes quite a while to resolve, I thought it would be a good idea to split the list of entries to match into pieces so I can run the Levenshtein analysis in parallel across several server-side scripts. The server-side scripts then write the UUIDs of the matches to a common field. I took record locking into account by writing a loop that keeps trying to write the matches to the field until it gets no error, meaning both the Set Field and the Commit steps went through.
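The retry loop I wrote for the locking step works roughly like this, sketched in Python with a placeholder `record` object standing in for the FileMaker Set Field / Commit Records steps (the method names and error codes here are illustrative, not real FileMaker API):

```python
import random
import time

def write_matches_with_retry(record, uuids, max_attempts=50):
    """Keep trying Set Field + Commit until no error is returned,
    i.e. until the record is no longer locked by another script
    instance. Returns True on success, False if we give up.

    `record.set_field` and `record.commit` are stand-ins for the
    FileMaker script steps; they return an error code, 0 = success.
    """
    for _ in range(max_attempts):
        err = record.set_field("Matches", uuids)
        if err == 0:
            err = record.commit()
        if err == 0:
            return True  # both steps went through
        # Back off briefly before retrying so parallel scripts
        # don't hammer the locked record in a tight loop.
        time.sleep(random.uniform(0.05, 0.25))
    return False
```

One thing worth noting about this pattern: a small randomized pause between attempts tends to behave better than an immediate retry when several scripts contend for the same record.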
I have tried many different combinations to find the most efficient way to get the imported entries analysed. However, whenever more than one server-side script is running, FileMaker Server generates a .dmp file in the log folder for each entry it tries to analyse. The database stays up, however, and I don't get any error message whatsoever. I only noticed the dump files a few days later.
When I first tried it I had something like 5 or 6 scripts running in parallel, each doing its part of the task. However, some of the scripts would never finish and the Admin Console would lock up. I thought I was maybe asking a little too much of the server, so I tried fewer and fewer instances of the script running in parallel, all the way down to two, but even with two the server keeps producing the dump files.
When I open a DMP file and look at the contents, I see an access-violation exception (C0000005) listed against Support.DLL, which seems to be a module from the FileMaker Server folder. I never get any other error or DMP file for anything else I do with the database, even operations on the same set of data that triggers the error.
What I was wondering is whether the error could come from the Levenshtein function building arrays so big, in order to analyse the similarities, that the database runs out of room for them. From what I understand, the function is complex and has to build very large arrays to find out what the differences are. Since it is so complex, could I be running out of resources, or could the problem be something else?
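To get a feel for whether memory is plausible as the culprit, I looked at how the algorithm's memory use scales. The full matrix is (m+1)·(n+1) cells, but the textbook algorithm can be rewritten to keep only two rows at a time, so memory grows with the shorter string rather than with the product of both lengths. (This is a general property of the Levenshtein algorithm; I don't know whether the custom function I'm using applies it.)

```python
def levenshtein_two_rows(a: str, b: str) -> int:
    """Memory-light Levenshtein: keeps only the previous and
    current rows of the DP matrix, O(min(m, n)) memory instead
    of the O(m * n) needed for the full matrix.
    """
    if len(a) < len(b):
        a, b = b, a  # make b the shorter string
    prev = list(range(len(b) + 1))  # row 0: distances from ""
    for i, ca in enumerate(a, start=1):
        curr = [i]  # first column: delete all of a[:i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution (or match)
            ))
        prev = curr
    return prev[-1]
```

Even so, for strings of the lengths I'm comparing, neither variant should come anywhere near 1500 MB of cache, which is part of why I'm unsure memory is really the explanation.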
Thanks for your help!