Here is a discussion we started in San Diego with some of the FBAs. We are reopening it after
running into problems on one of our hosting servers.
It is mainly addressed to those of you doing "large hosting", in order to
share best practices on:
- Investigating the source of a problem when the server fails
- Maintaining your servers in good health
To start the discussion, here is the situation we faced.
Description of the server:
Apple Xserve with 2x 2.8 GHz quad-core Intel Xeon
Two 300 GB SAS hard disks (in redundant RAID mode)
3 partitions on this disk system:
- System (holds the programs and the OS) => 75 GB free
- Data (well named, it holds the FM databases) => 73 GB free
- Backup (well named too) => 3 GB free
MacOSX Server 10.6.8
FileMaker Server Advanced 11.0.4
2x 1 GB RAM (the main bottleneck of our configuration), since upgraded
Load of the server:
110 databases (100 MB each)
70 simultaneous FMPro users (usually)
Around 40 CWP requests every 5 minutes
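
To put these figures side by side, here is a small back-of-the-envelope calculation. It is only a sketch: it assumes the sizes above are megabytes and gigabytes in decimal units and that 100 MB is a fair average per file. The variable names are made up for the illustration.

```python
# Figures taken from the server description above (assumed to be averages).
DATABASES = 110
AVG_DB_SIZE_MB = 100
RAM_GB_BEFORE_UPGRADE = 2
BACKUP_FREE_GB = 3

# Total size of the hosted files, in decimal gigabytes.
total_gb = DATABASES * AVG_DB_SIZE_MB / 1000

# How many times larger the hosted data is than the original RAM.
ram_ratio = total_gb / RAM_GB_BEFORE_UPGRADE

# Whether one more full copy of all files would fit in the free backup space.
fits_in_backup_free = total_gb <= BACKUP_FREE_GB

print(total_gb)             # 11.0
print(ram_ratio)            # 5.5
print(fits_in_backup_free)  # False
```

Under these assumptions the hosted files add up to about 11 GB, which is 5.5 times the original RAM and more than the 3 GB left free on the Backup partition. Whether either of these matters in practice depends on how the backups are rotated and how the cache is configured, which the figures alone do not tell.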
Observed problem:
The server had been working without a problem for months.
The "adminserver" service regularly had to be restarted before the Console
could be used ("fmsadmin restart adminserver"), but this is a "known
stack issue" on FileMaker Server 11 that we have observed on many servers.
Apart from this, we had no other issue with the machine.
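
Since these restarts are done by hand, it may help to keep a trace of how often they are needed. The sketch below is one possible way to do that: a small wrapper that runs a command and appends the time and exit code to a log file. The function name `run_and_log` and the log file are hypothetical. How `fmsadmin` is given its confirmation and credentials is deliberately left out here, so the command is passed in by the caller.

```python
import subprocess
import time


def run_and_log(cmd, log_path):
    """Run cmd (a list of arguments), append one line to log_path,
    and return the exit code of the command."""
    started = time.strftime("%Y-%m-%d %H:%M:%S")
    # stdin is closed so the wrapper never blocks waiting for keyboard input.
    result = subprocess.run(
        cmd,
        stdin=subprocess.DEVNULL,
        capture_output=True,
        text=True,
    )
    with open(log_path, "a") as log:
        log.write(f"{started}\t{' '.join(cmd)}\texit={result.returncode}\n")
    return result.returncode
```

It could be called as `run_and_log(["fmsadmin", "restart", "adminserver"], "restarts.log")`. Counting the lines per week in the log then shows whether the restarts are becoming more frequent.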
The load had not changed significantly between a few months earlier and the time
we started to get failures. It started with the WPE failing once every two days,
and we had to restart it manually. Later, we had massive failures on
the server (one in the first week, and then it increased to 2 crashes in the
We had known for a while that we were short on RAM (2 GB), so we went up to 10 GB. This
gave the server a second wind and has kept it from failing so far.
But here is our main concern: since the load of the server did not grow
during that time, we wonder what made the server behave correctly for
months and then degrade slowly.
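
A slow degradation like this is hard to see day to day, but it can show up as a trend in a metric recorded over time. As an illustration only, the sketch below fits a least-squares line through (day, value) samples and returns the slope per day. Both the function name `slope_per_day` and the choice of metric (free memory, free disk space, number of WPE restarts, and so on) are assumptions, not something we had in place.

```python
def slope_per_day(samples):
    """Least-squares slope of value against time.

    samples is a list of (day_number, value) pairs; the result is the
    average change of value per day over the whole period.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_x = sum(x for x, _ in samples) / n
    mean_y = sum(y for _, y in samples) / n
    # Spread of the time axis; zero means all samples share the same day.
    sxx = sum((x - mean_x) ** 2 for x, _ in samples)
    if sxx == 0:
        raise ValueError("samples must span more than one day")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in samples)
    return sxy / sxx
```

For example, `slope_per_day([(0, 1000), (1, 990), (2, 980)])` gives a slope of -10 per day. A steadily negative slope on free memory or free disk space, with a flat slope on users and database size, would point to something other than load.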
Here are the questions we would like to address in this discussion:
1) What do you monitor on your servers?
What should one monitor/investigate to get a clue about what happened
between a few months ago and the day the trouble started? (I repeat that in our
case we did not have a significant growth in database size, number of
users, or the like.)
2) Do some of you schedule a complete reboot of your servers (once a week,
once a month, etc.)? Do some of you schedule a nightly restart of the WPE (or
WPC) on your servers? Has doing so been successful?
3) Do some of you schedule a complete reinstall of your servers? We
have met some hosting companies that do so once a year. Is it a common practice on
your side?
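
To make the first question more concrete, here is a minimal sketch of the kind of sampling we have in mind. It is not a recommendation of what to monitor. It appends one line per call to a CSV file with the time, the 1-minute load average, and the free space on a list of volumes, using only the Python standard library. The function names and the CSV layout are assumptions for the example.

```python
import csv
import os
import shutil
import time


def sample(paths):
    """Return one row: timestamp, 1-minute load average, free bytes per path."""
    try:
        load1 = os.getloadavg()[0]
    except (AttributeError, OSError):
        # Load average is not available on every platform.
        load1 = float("nan")
    row = {"timestamp": time.strftime("%Y-%m-%d %H:%M:%S"), "load1": load1}
    for path in paths:
        row["free_bytes:" + path] = shutil.disk_usage(path).free
    return row


def append_sample(csv_path, paths):
    """Append one sample to csv_path, writing the header if the file is new."""
    row = sample(paths)
    new_file = not os.path.exists(csv_path)
    with open(csv_path, "a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(row))
        if new_file:
            writer.writeheader()
        writer.writerow(row)
    return row
```

Called every few minutes from a scheduler with the mount points of the System, Data and Backup partitions (the exact paths depend on the machine), this gives a history that can be compared between a "good" month and a "bad" one. Memory use and WPE process state are not covered by this sketch and would need a platform-specific tool.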
Thank you in advance for your participation in this discussion.