Google corpus in FileMaker

Question asked by tonyberber on Mar 12, 2011

I'm importing part of the Google n-gram corpus (http://googlesystem.blogspot.com/2008/05/using-googles-n-gram-corpus.html) into FM Pro 11. The part I'm working on contains millions of three-word sequences found on the web.

It is structured as follows:

id, w1, w2, w3, freq
1,deep,freeze,makes,56
2,deep,fryer,is,316
3,deep,impact,makes,107

This is how the data are presented, and I cannot change this structure because millions of records are involved.

Each record is an actual sequence; for instance, record 1, 'deep freeze makes' is a sequence found on the web and it occurs 56 times (at the time Google prepared the corpus).

I can search for any particular word in any of the three positions (w1, w2, w3), which returns all sequences that contain that word in that position. The search is really fast!

What I'd like to get is a summary of the words found in a search. For example, if I search for 'deep' in w1 and it returns the results above, then the summary report should look something like this:


search word: deep
position: w1
summary (sorted by frequency):
deep: 3
makes: 2
freeze: 1
fryer: 1
impact: 1
is: 1


The words occurring near the search word cannot be predicted.
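
To make the counting I have in mind concrete, here is a rough sketch in Python rather than FileMaker. It is only an illustration of the logic, not a FileMaker solution: the function name `summarize` and the in-memory sample are made up for the example, and it assumes each word is counted once per record it appears in (the freq column is ignored, as in the report above).

```python
import csv
import io
from collections import Counter

# Sample rows copied from the post; the real data has millions of records.
SAMPLE = """id,w1,w2,w3,freq
1,deep,freeze,makes,56
2,deep,fryer,is,316
3,deep,impact,makes,107
"""


def summarize(rows, search_word, position):
    """Count every word in w1..w3 across the records matching the search.

    Hypothetical helper: counts occurrences in the found set,
    not the corpus frequency stored in the freq column.
    """
    counts = Counter()
    for row in rows:
        if row[position] == search_word:
            counts.update(row[col] for col in ("w1", "w2", "w3"))
    # Highest count first; ties broken alphabetically.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
summary = summarize(rows, "deep", "w1")
for word, n in summary:
    print(f"{word}: {n}")
```

With the three sample records this prints the same list as the report above, from deep: 3 down to is: 1. What I'm after is the equivalent of this over a found set inside FileMaker.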

Any help appreciated!
