When the dmp files are generated, when you look at the client stats in the admin console, are any clients reporting high "wait time" numbers? (High = over 10,000,000 ms).
If that is the case, disconnect those clients but leave the files open and the other clients connected, and give it a few minutes to see if FMS becomes responsive again...
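The wait-time check above can also be scripted against an exported copy of the client statistics, which is handy if you want to watch for it over time. A minimal sketch, assuming a tab-delimited export of client name and cumulative wait time in milliseconds (the column layout here is an assumption, not FMS's documented format):

```python
# Flag clients whose cumulative wait time exceeds a threshold.
# Assumes a tab-delimited stats export: "<client name>\t<wait time ms>".
# This export format is hypothetical; adjust the parsing to whatever
# your admin console actually produces.

WAIT_THRESHOLD_MS = 10_000_000  # "high" per the rule of thumb above

def flag_slow_clients(lines, threshold=WAIT_THRESHOLD_MS):
    """Return (client, wait_ms) pairs whose wait time exceeds the threshold."""
    flagged = []
    for line in lines:
        name, wait = line.rstrip("\n").split("\t")
        wait_ms = int(wait)
        if wait_ms > threshold:
            flagged.append((name, wait_ms))
    return flagged

sample = [
    "teacher-ipad-01\t85000000",   # well over the threshold
    "office-iwp\t120000",          # normal
]
print(flag_slow_clients(sample))  # → [('teacher-ipad-01', 85000000)]
```

Any client this flags is a candidate for the disconnect test described above.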
Thanks for the reply.
We just had another crash and yes, there was a client with a high wait time of around 85,000,000. I disconnected them, but the server did not become responsive again: I couldn't get a response from the IWP login page, and the Clients list showed duplicate connections for several individual clients where our FMGo users had attempted to reconnect.
Could this be related to container fields? The big numbers in the statistics make me think it might. We are currently using normal containers, but I am wondering if it would be worth switching them to external storage.
There was only one client with a wait time over, say, 10,000,000?
Any sense of what that client was doing at the time?
If there are no correspondingly high disk I/O or network throughput numbers, then the wait time largely comes from the processor. Check what could make the processor busy: finds that produce large found sets (explicit user finds, or implicit finds like going to a layout with portals that show a lot of related data), calculations that need to be resolved over a large number of records, ...
I'm assuming that when a crash like this happens, you go back to a backup of the file, right?
Is there any pattern as to the timing of when the DMP files get created that could give you a clue for what activity is happening in the solution that could cause this?
There isn't any high load on the server (disk, network, or CPU) at the time this happens.
CPU rarely gets above 50% of 1 core.
The solution is an education tool for evidencing pupils' progress, using FileMaker Go and iPad/iPod cameras to record and upload images to our server.
The layouts are generally simple with one or two portals at most. I think the issue occurs when a user is uploading an image from FMGo over the internet as it tends to happen during periods of the day when teachers are busy uploading pictures of children's work (mid morning, mid afternoon).
I should add that this is a server-based solution (no separate file on the FMGo clients), so they are connecting from their school WiFi through an educational WAN and then out over the web to our servers.
Re. file corruption: we have just rebuilt the whole solution into a new file (with some copying in of Field Definitions, Scripts, Layout objects etc.), so we are confident this isn't caused by a corrupt DB. We have also moved to external container storage. Neither of these actions appears to have helped.
DMP files are usually a sign of thread locks, and those seem to happen most under load, with a number of users doing "complex" searches at the same time. "Complex searches" is very broad, but it could be a GTRR firing, or a layout with many portals drawing a lot of related records (perhaps even sorted), ...
The overall CPU load does not mean much; it depends on what the thread that is being executed is doing. Even with a relatively low CPU load you can run into this kind of thread lock.
One would think that FileMaker's engineers would have taken that kind of scenario into consideration.
Adding more threads is one way of dealing with numerous simultaneous searches.
A complex search should never cause the server to crash. Slow down, yes, but not crash.
And that is exactly what has been happening with FileMaker Server since version 7.
Is there any way we can get a zipped copy of your Logs folder so we can review the logs? Also, a general time of when it happened would help. We might be able to narrow down when or what is causing it to happen.
To some extent. That does not exonerate us from writing efficient code, though... that applies to just about every scripting or coding environment. We should always look at what we are doing. FM does a lot under the hood that we don't have to worry about, but that also means we don't have full control over what it does or how it tries to anticipate what we want to do. Given that, we need to pay extra attention to how we do things, to minimize the unanticipated consequences of FM's ease of use.
We had a good week last week with zero crashes; server loads were up to around 30 FMGo clients and 20 IWP clients connected concurrently to the suspect database. Then today we have had two crashes, one during the busy period this morning and another at the peak afternoon point, during which loads were around 22 FMGo clients and 4 IWP, so nothing spectacular.
I have attached the log files and the dump files in a zip; the actual crash times were 11:19 and 14:05.
Re. Wim's comment on demanding calls: I know what you mean, but these crashes are coming when most of our users are gathering evidence on an iPod (so layouts include one portal at most). Those portals tend to show related records (sorted alphabetically, up to maybe 200 records at a time) from tables containing up to 65,000 records in total.
There's an open demo of the solution here: https://www.learninglogic.co.uk/demo. It gives an idea of how it runs on iPod and web.
I would be interested in what they are entering. Your database is throwing a lot of 500 errors, and then dumping on not having a unique record/field entered.
What script is throwing this error, or what are the users doing that is? I suspect that if you find that out, you will resolve the crashes.
Thanks for the reply,
"and then dumping on not having a unique record/field entered."
Where are you seeing this?
Also, I am not sure the 500 errors relate to a specific database, as they are coming from the Web Publishing Core, e.g.:
<ip address>:<port> - - "/fmi/iwp/cgi?-authinfo" 500 10356
So is that an HTTP 500 error (internal server error) rather than a FileMaker error 500?
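One way to see how widespread those failures are is to tally status codes per request path across the Web Publishing Core access log. This is a minimal sketch assuming the log lines look like the `-authinfo` example quoted above; the exact format may differ by FMS version, so the regex is an assumption:

```python
import re
from collections import Counter

# Matches lines like: 10.0.0.5:49152 - - "/fmi/iwp/cgi?-authinfo" 500 10356
# (pattern inferred from the single sample line above; verify against your log)
LOG_RE = re.compile(r'"(?P<path>[^"]+)"\s+(?P<status>\d{3})\s+(?P<bytes>\d+)')

def count_statuses(lines):
    """Tally (request path, HTTP status) pairs in a WPC access log."""
    counts = Counter()
    for line in lines:
        m = LOG_RE.search(line)
        if m:
            counts[(m.group("path"), m.group("status"))] += 1
    return counts

sample = [
    '10.0.0.5:49152 - - "/fmi/iwp/cgi?-authinfo" 500 10356',
    '10.0.0.5:49153 - - "/fmi/iwp/cgi?-authinfo" 500 10356',
    '10.0.0.6:49200 - - "/fmi/iwp/cgi?-db=Demo" 200 4821',
]
print(count_statuses(sample))
```

If the 500s cluster on one path (like `-authinfo` here), that points at a specific operation rather than general server distress.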
We just had a crash at 8:02am.
Being so early in the school day, there were just two FMGo Clients connected at the time plus one IWP session which had been idle for 20 minutes or more.
The only odd thing I can see in the logs is that the same user was connected twice on separate devices from the same IP (over WAN).
Analysing the crash dump with WinDbg shows:
This dump file has an exception of interest stored in it.
The stored exception information can be accessed via .ecxr.
(8a8.868): Access violation - code c0000005 (first/second chance not available)
....bunch of symbol errors....
000007fe`f94e0c2a 420fb60402 movzx eax,byte ptr [rdx+r8]
EXCEPTION_RECORD: ffffffffffffffff -- (.exr 0xffffffffffffffff)
ExceptionAddress: 000007fef94e0c2a (DBEngine+0x00000000003c0c2a)
ExceptionCode: c0000005 (Access violation)
Attempt to read from address 0000000000000030
ERROR_CODE: (NTSTATUS) 0xc0000005 - The instruction at 0x%08lx referenced memory at 0x%08lx. The memory could not be %s.
EXCEPTION_CODE: (NTSTATUS) 0xc0000005 - The instruction at 0x%08lx referenced memory at 0x%08lx. The memory could not be %s.
000007fe`f94e0c2a 420fb60402 movzx eax,byte ptr [rdx+r8]
LAST_CONTROL_TRANSFER: from 000007fef94ddca7 to 000007fef94e0c2a
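For anyone wanting to repeat this analysis on their own DMP files, the output above came from roughly this debugger session (these are standard WinDbg/cdb commands; the file name is just an example):

```
cdb -z FMS_crash.dmp       open the dump in the console debugger (windbg.exe works too)
.symfix                    point at Microsoft's public symbol server
.reload                    reload symbols (FMS ships none, hence the symbol errors above)
.ecxr                      switch to the stored exception context
!analyze -v                automated triage: exception code, faulting module, stack
kb                         raw stack trace of the faulting thread
```

Without private symbols for DBEngine you only get module+offset frames, but the exception code and faulting address still come through, as shown above.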
I guess all that just means it crashed!
It says a little more than that. The code indicates an "access violation", which means it was trying to access memory that it had no rights to, or that simply could not be read. It could be pointing at a faulty RAM module, especially since this did not happen under load.
Make absolutely certain that no other processes are running on that machine too: no virus scanning, no indexing, no file sharing on any of the FMS folders.
I had a similar situation: I had made sure that backups were not being done from the live FM data files, but some IT guru decided everything needed to be scanned, and even though I had told them not to do that (and they said they wouldn't), they were scanning the live FileMaker data files, and this almost always killed the database if records were being written to. So wimdecorete's advice about reverifying that no other processes touch the live FM data is very important.
Also, whenever things get weird like this, I usually do a clean install, and sometimes that fixes things. Be sure to back up your schedules first.