Just For Fun:  WebDirect with Speech Recognition in FMS 16/17

Document created by steve_ssh on Nov 10, 2018. Last modified by steve_ssh on Nov 29, 2018.
Version 2



This is a revival of an old project originally realized using v.14 of FileMaker Server:


An integration of HTML5 Speech Recognition with WebDirect, using the Google Chrome browser.


Way back then, I posted a YouTube video about it, and recently someone saw that video, and inquired if I would share the details of how it works.


In response, I have created an archive with updated versions of the demo files, as well as a second video which shows how a basic integration is set up for an arbitrary file hosted with FileMaker WebDirect access.


This document is home to the sample archive.





This is not a production-quality technique or solution. I do think it is an innovative integration, and therefore worth sharing with the community, especially given our current emphasis on both innovation and integration. I hope folks will have fun with it, and be inspired to do even better integrations that only get more powerful as the platform and the community collectively evolve. Please do not use it in a production solution, but please do enjoy it.



The technique takes place in the Google Chrome browser:


[Image: browser view.png]



Handling the commands with FileMaker scripting:


There are two simple FileMaker scripts which perform actions based on the spoken command.


The first of these two scripts handles a small set of built-in responses, such as record navigation and basic found set manipulation.


The second script provides a place where a developer can add their own custom solution logic to respond to custom-defined commands:


[Image: custom commands script excerpt.png]


Personal favorite aspect of this project:


What pleases me most about this project is the simplicity of the integration steps. The integration within the FileMaker custom app requires pasting in just two scripts and one layout object. At the server level, the PHP server must be enabled, and a small number of files added to the HTTPS web server root directory. (It does not require the XML API or PHP API.)


The simplicity was largely made possible by product features made available in v.16 of FileMaker, such as cURL and JSON.




For the WebViewer enthusiasts:


The origin of this project was some work I did to experiment with various techniques to provide, within a WebDirect context, parity for the two-way-sans-refresh WebViewer techniques that the FileMaker community has developed within recent years, such as (but not limited to):


- Ryan Simms' FMAjax project/solution

- Geist Interactive's FileMaker WebViewer Bridge project/solution

- A FileMaker Community thread that I authored, pre-dating all of the above (though without the elegance of the hash-change technique that I now associate with Ryan).




Updated video illustrating the integration:


There is a video, which, at 13 minutes, is a little on the long side. Apologies for that -- when I made the video I wanted to be very deliberate about showing the integration steps (and this took a little more time than perhaps needed had I been willing to do 15-20 more takes).




I hope folks enjoy this, and that it inspires folks to look both at what we can do now, as well as beyond that...


Kind regards,





Added 29 November 2018:


Some implementation trivia, since I noticed this doc is still getting an occasional view:


There were two special circumstances which needed to be accounted for with this project:


1) Being able to handle arbitrary WebViewer reloads due to a WebDirect page refresh.


2) Avoiding the scenario where the browser hears its own speech, and then tries to interpret its speech, i.e. an undesirable feedback loop.


I believe item #1 was handled by assigning a sequence number to each item of text to be spoken, and, if a window refresh caused the WebViewer to receive the same speech command twice, this circumstance would be trapped by detecting a repeated sequence number. When trapped, the redundant directive would be ignored, and the speech would not be repeated.
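
The sequence-number check described above might look roughly like the following sketch. This is not the code from the demo files. It assumes that sequence numbers only ever increase, and that the last handled number is kept somewhere that survives a WebViewer reload (in a browser, `window.sessionStorage` has the shape of the store used here). The names `SeqStore`, `SpeechDirective`, `acceptDirective`, `makeMemoryStore`, and the `lastSpokenSeq` key are all hypothetical.

```typescript
// Minimal key/value store (hypothetical name). In a browser,
// window.sessionStorage provides getItem/setItem with this shape.
interface SeqStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

// Hypothetical directive shape: text to speak plus a sender-assigned sequence number.
interface SpeechDirective {
  seq: number;
  text: string;
}

const LAST_SEQ_KEY = "lastSpokenSeq"; // hypothetical storage key

// Returns true if the directive is new and should be spoken, false if its
// sequence number was already handled (for example, replayed after a reload).
// Assumption: sequence numbers only increase, so "<=" means "already seen".
function acceptDirective(store: SeqStore, directive: SpeechDirective): boolean {
  const raw = store.getItem(LAST_SEQ_KEY);
  const lastSeq = raw === null ? NaN : parseInt(raw, 10);
  if (!isNaN(lastSeq) && directive.seq <= lastSeq) {
    return false; // redundant directive: ignore it
  }
  store.setItem(LAST_SEQ_KEY, String(directive.seq));
  return true;
}

// In-memory stand-in for sessionStorage, so the sketch runs outside a browser.
function makeMemoryStore(): SeqStore {
  const data: Record<string, string> = {};
  return {
    getItem: (key: string): string | null => {
      const value = data[key];
      return value === undefined ? null : value;
    },
    setItem: (key: string, value: string): void => {
      data[key] = value;
    },
  };
}
```

With a store from `makeMemoryStore()`, the first call for `{ seq: 1, text: "hello" }` is accepted, a second call with the same sequence number is rejected, and a later `{ seq: 2, ... }` is accepted again.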


Item #2 was handled by shutting off the speech detection functionality while speech was being synthesized, and then starting it back up again when the speech had completed. This worked well.
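
As a rough illustration of the stop-speak-restart ordering for item #2 (again, not the code from the demo files), here is a sketch with the recognizer and the speech synthesizer passed in as small interfaces. `RecognizerLike`, `SpeakerLike`, `speakWithoutEcho`, and the recording stand-ins are hypothetical names. In Chrome, a `webkitSpeechRecognition` instance offers `start()` and `stop()`, and a `SpeakerLike` could wrap `speechSynthesis.speak()` together with the utterance's `end` event.

```typescript
// Minimal shapes for the two collaborators (hypothetical names).
interface RecognizerLike {
  start(): void;
  stop(): void;
}

interface SpeakerLike {
  // Must call onDone once, after the utterance has finished playing.
  speak(text: string, onDone: () => void): void;
}

// Stop listening, speak, and resume listening only after the speech has ended.
function speakWithoutEcho(
  recognizer: RecognizerLike,
  speaker: SpeakerLike,
  text: string
): void {
  recognizer.stop();
  speaker.speak(text, () => {
    recognizer.start();
  });
}

// Stand-ins that record the order of calls, so the sketch runs outside a browser.
const callLog: string[] = [];
let finishSpeech: () => void = () => {};

const fakeRecognizer: RecognizerLike = {
  start: () => { callLog.push("start"); },
  stop: () => { callLog.push("stop"); },
};

const fakeSpeaker: SpeakerLike = {
  speak: (text: string, onDone: () => void) => {
    callLog.push("speak:" + text);
    finishSpeech = onDone; // the demo decides when the speech "ends"
  },
};

speakWithoutEcho(fakeRecognizer, fakeSpeaker, "Record 3 of 10");
// At this point recognition is off and the speech is still "playing".
finishSpeech();
// Only now has start() been called on the recognizer.
```

The point of routing the restart through the `onDone` callback is that recognition can never be switched back on while audio is still being produced.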


There's a bug/gotcha, though, that I did not catch, and I saw it happen a few times when I recorded the posted video:


When WebDirect refreshes the page, and the WebViewer reloads, the speech recognition code is automatically started up again. This means that if a full WebDirect page refresh happens while speech is being spoken, the feedback loop can occur, because the code that handles item #2 is effectively defeated. I believe that this bug could be addressed by making the init code smarter: when it first loads, it would check whether speech is already happening, and, if so, defer starting up the speech recognition code.
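
One possible shape for such a "smarter" init is sketched below. It is untested against the actual demo files and rests on assumptions: that the page can ask whether speech is in progress (in a browser, `window.speechSynthesis.speaking` reports this), and that polling at a fixed interval is acceptable. `StartableRecognizer`, `RetryScheduler`, `startWhenQuiet`, and the 250 ms default are hypothetical.

```typescript
interface StartableRecognizer {
  start(): void;
}

// Runs fn after roughly delayMs; in a browser this could wrap setTimeout.
type RetryScheduler = (fn: () => void, delayMs: number) => void;

// Start recognition immediately if nothing is being spoken; otherwise
// check again later instead of starting in the middle of an utterance.
function startWhenQuiet(
  recognizer: StartableRecognizer,
  isSpeaking: () => boolean,
  schedule: RetryScheduler,
  pollMs: number = 250
): void {
  if (isSpeaking()) {
    schedule(() => {
      startWhenQuiet(recognizer, isSpeaking, schedule, pollMs);
    }, pollMs);
    return;
  }
  recognizer.start();
}

// Stand-ins so the sketch runs outside a browser: a manual retry queue
// replaces setTimeout, and a flag replaces speechSynthesis.speaking.
const initLog: string[] = [];
const pendingRetries: Array<() => void> = [];
let speechInProgress = true;

startWhenQuiet(
  { start: () => { initLog.push("started"); } },
  () => speechInProgress,
  (fn: () => void) => { initLog.push("deferred"); pendingRetries.push(fn); }
);
// Speech was in progress at load time: nothing started, one retry queued.

speechInProgress = false;
const retry = pendingRetries.shift();
if (retry !== undefined) {
  retry();
}
// The retry found no speech in progress and started recognition.
```

In a page, the wiring would be along the lines of passing `() => window.speechSynthesis.speaking` as `isSpeaking` and a `setTimeout` wrapper as `schedule`.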
