AnsweredAssumed Answered

Text mining: counting occurrences of every word?

Question asked by edyas on Jun 30, 2018
Latest reply on Jul 4, 2018 by edyas

Hello everyone,


I have a table ("TEXT") storing dozens of thousands of paragraphs of full text (field "paragraph").

I have created another table ("INDEX") which I use to extract every unique word (no duplicate) used in TEXT:paragraph.

So far, so good.


In the next step, I wish to analyse the content of TEXT:paragraph on the basis of INDEX.

More precisely, I need to count the occurrences of every word within every paragraph (eg 5 occurrences of "peace" in paragraph A), in order to compare their relative "weight" in each paragraph. My purpose is to enable users:

1) to consult a list of most used words for every paragraph (and in parent records),

2) to conduct searches based on a word and on the number of occurrences in a paragraph (eg find all paragraphs in which "peace" is mentioned 5 times).


So far, I have come up with the idea of creating an intermediary table ("OCCURRENCES"), which would relate every word used in TEXT:paragraph to every word appearing in the INDEX, and include a "Pattercount" calculation field.


And it works.

But I find that method rather brutal: it takes time and I will end up with millions of records in Table ("OCCURRENCES").


Is there any simpler method to calculate the number of occurrences of any given word (bearing in mind that speed will be an issue for end-users searching through paragraphs)?


Many thanks for your help !