Generally a 500-type error means that a connection to the server is not being made or has timed out.
Do you see the connection being made in the WPE log on FMS?
Yes, I can see the incoming calls from FMGo over HTTP in the WPE log on FMS.
Having said that, if those calls did not go through, the error I would see on FMGo would be something like 1629/1631 - which does sometimes happen but is expected when cellular connectivity goes out the window...
Here we are facing a different beast which - to my limited understanding - seems to relate more to a communication error between FM WebModule/PHP on the one hand and WPE/FM DB on the other hand.
I have confirmed via various techniques that it is not an authentication issue when passing my user credentials along the way; I have definitely ruled this one out, as the user/pw combination is correct.
Therefore I am now leaning towards a timeout between FM WebModule/PHP and WPE/FM DB. That would make sense if this was happening when CPU is through the roof and RAM fully loaded, but this is not systematically the case; it also happens for very light calls. It could also be that the call passed to WPE and DB takes too long to process - as it sometimes has to do quite a bit in terms of FM scripts - before the response comes back to the WebModule/PHP.
Now I am not sure which logs to check - if any - to confirm all the above...
As Wim suggests, I would check the WPE log, but I would also suspect the web server log, which might be separate from the WPE.
Could there be an error in the PHP that only gets run in one if/else branch, which is why it only shows up occasionally? If so, that might explain why it seems erratic. Check the httpd logs and see if you can see what the HTTP request is, whether it is GET or POST, and replicate the call from a browser to reproduce the error. At least, that's where I've seen this error before.
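If a browser is awkward for replaying a POST, the same check can be scripted. Below is a minimal sketch in Python, standard library only. The function name `timed_request` and the URL in the usage comment are hypothetical, not part of the setup described here. It sends one GET or POST and reports the status code and how long the server took, which helps tell a fast PHP error from a slow timeout.

```python
import time
import urllib.error
import urllib.request


def timed_request(url, data=None, timeout=90):
    """Send one request and time it.

    data=None sends a GET; data=bytes sends a POST with that body.
    Returns (status, seconds). status is the HTTP status code, or None
    if no HTTP response arrived at all (timeout, refused connection).
    """
    req = urllib.request.Request(url, data=data)
    start = time.monotonic()
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            resp.read()  # drain the body so the timing covers the full reply
            status = resp.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses raise, but still carry a status code
        status = err.code
    except (urllib.error.URLError, OSError):
        # no HTTP response: connection refused, DNS failure, socket timeout
        status = None
    return status, time.monotonic() - start


# Example usage (hypothetical URL, replace with the request seen in the httpd log):
#   status, secs = timed_request("http://localhost/your-script.php", data=b"key=value")
#   print(status, round(secs, 1))
```

A 500 that comes back within a second or two points at the PHP itself; one that arrives only after about 60 seconds points at something behind Apache not answering.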
Looks like we have a winner here:
The httpd log consistently reports the following 4 entries each time the process fails:
[Fri Jul 15 09:12:02.799519 2016] [proxy_ajp:error] [pid 64562] (70007)The timeout specified has expired: AH01030: ajp_ilink_receive() can't receive header
[Fri Jul 15 09:12:02.799643 2016] [proxy_ajp:error] [pid 64562] [client 127.0.0.1:49308] AH00992: ajp_read_header: ajp_ilink_receive failed
[Fri Jul 15 09:12:02.799651 2016] [proxy_ajp:error] [pid 64562] (70007)The timeout specified has expired: [client 127.0.0.1:49308] AH00893: dialog to 127.0.0.1:16021 (127.0.0.1) failed
[Fri Jul 15 09:12:02.799811 2016] [core:error] [pid 64562] [client 127.0.0.1:49308] AH00082: an unknown filter was not added: includes
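To see how often this pattern occurs over a whole log file, the entries can be counted with a short script. This is a rough sketch in Python that assumes the Apache 2.4 error-log layout shown in the four lines above; the function name `summarize_proxy_errors` is made up for illustration, and the sample data is just those four lines.

```python
import re
from collections import Counter

# Shape of the lines above: [timestamp] [module:level] [pid N] message
LINE_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\] \[(?P<module>[^:\]]+):(?P<level>[^\]]+)\] "
    r"\[pid (?P<pid>\d+)\] (?P<msg>.*)$"
)
CODE_RE = re.compile(r"\bAH\d{5}\b")         # Apache message codes, e.g. AH01030
BACKEND_RE = re.compile(r"dialog to (\S+)")  # backend address named in AH00893


def summarize_proxy_errors(lines):
    """Count AH codes and failing backends among proxy_ajp log lines."""
    codes, backends = Counter(), Counter()
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m or m.group("module") != "proxy_ajp":
            continue  # skip other modules and lines in a different format
        msg = m.group("msg")
        codes.update(CODE_RE.findall(msg))
        backends.update(BACKEND_RE.findall(msg))
    return codes, backends


# The four entries quoted above, used as sample input
SAMPLE = [
    "[Fri Jul 15 09:12:02.799519 2016] [proxy_ajp:error] [pid 64562] (70007)The timeout specified has expired: AH01030: ajp_ilink_receive() can't receive header",
    "[Fri Jul 15 09:12:02.799643 2016] [proxy_ajp:error] [pid 64562] [client 127.0.0.1:49308] AH00992: ajp_read_header: ajp_ilink_receive failed",
    "[Fri Jul 15 09:12:02.799651 2016] [proxy_ajp:error] [pid 64562] (70007)The timeout specified has expired: [client 127.0.0.1:49308] AH00893: dialog to 127.0.0.1:16021 (127.0.0.1) failed",
    "[Fri Jul 15 09:12:02.799811 2016] [core:error] [pid 64562] [client 127.0.0.1:49308] AH00082: an unknown filter was not added: includes",
]

codes, backends = summarize_proxy_errors(SAMPLE)
```

To run it against a real log, pass an open file instead of `SAMPLE`, e.g. `summarize_proxy_errors(open(path))`, where `path` is wherever your Apache error log lives. The backend counter shows which local port Apache was waiting on when the timeout expired.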
Digging around shows that increasing the ProxyTimeout setting in httpd.conf to more than the default 60 seconds could help. However, this is not necessarily what we are after: the process goes much faster for each call - a couple of seconds - when everything goes smoothly. If Apache does not get an answer after 60 seconds, it probably never will, and increasing the timeout is not going to help in real terms. Therefore the question is: what is Apache waiting on that is not responding? The Web Publishing Engine? The FM DB itself?
So, after much thinking, reading, testing, observing (...), the issue was a simple timeout: WPE takes too long to respond to Apache/PHP/WebModule because the script triggered in FMS takes too long to run. Run it as PSoS (Perform Script on Server) instead and all is fine...