I hope that some of us who are interested in this issue can share our experiences here. I took the time to write this text because I would like to hear whether we are the only ones who have had to tweak larger solutions and needed to be able to follow the results during real-time use.
- To make sure that large systems used by many users (50+) are performing well.
Issues causing performance problems
- Less-than-optimal FileMaker development (using unindexed fields, procedures that work well in test environments but become slow in production or when data volumes rise, etc.)
- Server architecture/FileMaker Server 11 architecture: virtual servers that should perform well but don't, 64-bit environments not performing well with FileMaker, I/O bottlenecks, network problems, etc.
For a year we have used our own very simple performance test at some of our larger customer sites. When a solution is performing slowly, one way of knowing is when the users tell you! But we needed objective answers (and warnings).
How do we measure the performance of a FileMaker Server and isolate the bottlenecks? We did, of course, use the FMS statistics a lot. But sometimes, when the statistics were not terribly bad, we could still see that the UI was slow for the users; at other times, when the statistics implied bad performance, the users actually got pretty acceptable response times.
Our first solution
Since the FileMaker Server statistics page did not show exactly when the users faced delays, we needed another test. Another reason was that we could not easily program the server to send an alarm SMS/email when performance decreased.
We designed an independent robot function that tests a set of routines on the server. We used a copy of the user interface of the solution and wrote a script that simulated user interaction. Then we created a user interface with reporting and graphs, plus a robot function that sends alarms when the solution has performance problems.
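To illustrate the idea outside FileMaker, here is a minimal sketch in Python of such a timing robot. The routine, the threshold value, and the alarm mechanism are all placeholders chosen for illustration, not our actual implementation (which is a FileMaker script acting on a copy of the solution's UI):

```python
import time

THRESHOLD_SECONDS = 15.0  # hypothetical alarm threshold


def run_test_routine():
    """Placeholder for the scripted user-interaction test.

    In our setup this is a FileMaker script simulating user actions
    against the server; here we just sleep to stand in for the
    server round trips.
    """
    time.sleep(0.1)


def send_alarm(elapsed):
    """Placeholder for the email/SMS alarm to us or the sysadmin."""
    print(f"ALARM: test routine took {elapsed:.1f} s")


def measure_once(log):
    """Run the test routine once, log its duration, alarm if too slow."""
    start = time.monotonic()
    run_test_routine()
    elapsed = time.monotonic() - start
    log.append(elapsed)
    if elapsed > THRESHOLD_SECONDS:
        send_alarm(elapsed)
    return elapsed


log = []
measure_once(log)
```

In practice the robot would call `measure_once` on a fixed schedule and feed the log into the reporting graphs.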
Our second solution
We wondered: do you need to run the test simulation with a script based on data from the actual solution and the UI of that solution?
We created a separate set of performance files and tests and ran them alongside the first performance test. Our tests showed that there was no difference between testing on a special performance file on the server and testing on data from the real solution. The test results and slowdowns were the same whether we used the generic solution or the files used by the users.
Conclusion: We could create a generic solution for performance measurement of different servers.
The only tiny item we use from the running solution is a field in the data file, Get(CurrentUserCount), which we use to display the number of users in our test graphs, since we could not find a way to ask the server for the actual user count independent of a specific file.
The graph from one of the solutions we are monitoring
For each solution we set a threshold, and if it is crossed, an email or SMS is sent to us or to the sysadmin.
In this case, the peak at 3-5 in the morning is caused by a number of housekeeping scripts run by the server or a robot, integration with other solutions, and the setting of stored values for reports.
This particular graph is from a relatively large solution. Between 10 am and 4 pm, approximately 70 users use it heavily.
We started testing it because the solution sometimes slowed down during the daytime, wasting productive time for the users.
After implementing the performance test we began ironing out small and large performance hogs, and after a while the graph showed that there were no delays during the daytime.
Reading the graph: on this server, test times between 2 and 6 seconds show that the server is answering each call from the users at once.
A large report taking 30 seconds to generate will still take the time it needs, and a short procedure finishing in approximately 1/10 of a second will finish that fast. But if the test time increases to 15 seconds, the 30-second report will take approximately 38 seconds (not very problematic), while the 1/10-second procedure will take approximately 8.1 seconds (unbearable for the user).
Most users start working between 8 and 10; you can see them checking in on the graph, and the solution keeps performing well.
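The arithmetic behind those numbers can be read as a simple fixed-overhead model: the extra server delay, observed as the increase in our test time, is added on top of each procedure's baseline duration. The nominal test time of roughly 7 seconds below is an assumption chosen so the figures match; it is not a measured value:

```python
def effective_duration(baseline_seconds, added_delay_seconds):
    """Fixed-overhead model: the extra server delay is added on top
    of the procedure's normal (baseline) duration."""
    return baseline_seconds + added_delay_seconds


# Test time rises from a nominal ~7 s to 15 s -> ~8 s of extra delay.
added = 15 - 7

print(effective_duration(30.0, added))  # large report: ~38 s
print(effective_duration(0.1, added))   # short procedure: ~8.1 s
```

The point the model makes is that a fixed delay barely matters for a long report but completely dominates a short interactive procedure.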
What are we using it for?
We are using it for:
- Measuring the fundamental performance of a specific server/client/network setup.
- Checking that the system on the server does not have problematic parts causing overload.
- Checking that improving/tweaking the solution actually helps (comparing results in the log before/after).
- Alerting us if a solution begins to underperform.
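The before/after comparison in that list can be as simple as comparing average test times from the log around the date of a change. A small sketch, with made-up sample timings rather than our real log data:

```python
from statistics import mean

# Hypothetical test times (seconds) logged before and after a tweak.
before = [5.8, 6.1, 12.4, 5.9, 14.0]
after = [2.3, 2.1, 2.6, 2.4, 2.2]

improvement = mean(before) - mean(after)
print(f"mean before: {mean(before):.2f} s")
print(f"mean after:  {mean(after):.2f} s")
print(f"improvement: {improvement:.2f} s per test run")
```

With real logs you would also want to compare the spread (e.g. the worst daytime peaks), since it is the occasional long delay, not the average, that users complain about.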
After trimming and modifying this little test solution over a year, it is now very unobtrusive. It has to be "heavy" enough to check that the server has enough reserves to deliver to the users without delay, while not adding so much to the load that it becomes part of a potential problem.
Important: this test can really be used to test how a solution will work with users, without the users. But when we install this test solution on a server, the measurements show whether the server is still responding at once while the customer's FileMaker solution is in full use. And if it is not, we can use it to keep measuring while improving the solution or while tweaking and adding resources to the server.
- Can anybody give us clues to using the FMS statistics better? Where are the best articles on how to interpret and use the statistics?
- Can the FMS statistics be used to evaluate actual user performance and send alarms when the server is too slow?
- Can you share your view with us: is it also your experience that you can actually measure server performance and the performance of your customer's solution with a generic set of files that can also be used on different servers?
- Has anyone made a general server performance test solution for FileMaker?
PS: As is obvious from the text, my native language is not English, and I would like to remove the worst grammar/spelling errors. Please email me if you want to help me improve this little discussion teaser.