Yes. I don't claim to understand why, but in my testing, number fields have better find performance. When FileMaker Go was first released, I started comparing different UUID functions. I posted an update on those results after FileMaker 12 introduced the Get ( UUID ) function. Number fields were still faster when performing finds. (This is why I'm sticking with my numeric UUID functions for now.) The file size was significantly smaller, too, implying more compact field storage, more compact indexing, or both. (This included numeric data containing equivalent information to hexadecimal strings, where the numeric data use more characters to display with base 10.) I don't think I've compared file sizes for each with indexing turned off vs. on, which would be informative. Since my testing was focused on finding good primary key values, I only tested indexed fields. The equivalent information stored in a text field at base 65535 approached the storage performance of numeric data, but not the find performance.
It would make sense to me that a field which can store only numeric characters (0 through to 9) would be much quicker than a text field which would be able to store a very large amount of character types, the whole unicode set, I would imagine!
What I am saying is that FileMaker stores numbers using the same space on disk as any other (low ASCII) character, so "123" takes up exactly the same storage area on disk as "ABC". In Hex, '123' is '313233' and 'ABC' is '414243' - each is three bytes.
So knowing this, why should number operations be faster than any character operations?
Does anyone have any practical test results that can prove this one way or the other?
Best wishes - Alan Stirling, London UK.
I get what you're saying, but surely filemaker would have dsigned the field type to accommodate only a certain range of Hex character though, right? Therefore, this would change the index behaviour?
Looking at filemaker's tech specs:
- Text: Up to 1 billion characters per field per repetition (limited by available memory) with optional text style runs and paragraph runs. Index is based on the first 100 characters of each word or value.
- Number: Support values from 10^-400 up to 10^400 and the negative values of the same range. Index based on the first 400 significant digits. Up to 1 billion characters per field. The first 400 digits are indexed.
There are some obvious differences in the design of these field types, even though the character limits are the same...I suppose this is what makes all the difference??
surely filemaker would have dsigned the field type to accommodate only a certain range of Hex character though, right?
You may have noticed that Filemaker will happily accomodate any type of text in a number field - at least as far as it concerns storing. When it comes to indexing, the story is completely different.
What I am saying is that FileMaker stores numbers using the same space on disk as any other (low ASCII) character, so "123" takes up exactly the same storage area on disk as "ABC".
Perhaps without indexing turned on. With indexing turned on, digits (more even, up to a point) stored in an indexed number field will definitely result in a smaller file size than storing the same digits in an indexed text field.
I believe FM uses different indexing for text and numbers. Numbers are indexed as a single string, being one number. I believe text fields are indexed both at the field and word level, though it's not always clear exactly how these are different.
When setting field indexing in the field definition, you have different options if the field is a number than if it is text. Dates are also different than text. With dates and numbers indexing is either On of Off. With text you can set it to partial or full indexing. I suspect that is why text indexes may be larger.
General curious question about this;
Why is there a difference between text and number for the number of characters/digits indexed?
That is, why first 100 characters for text vs first 400 significant digits for number ?
If, as Alan ascertains, '123' = 'ABC' there shouldn't be the need for the discrepancy in index lengths.
They aren't the same.
Text is indexed as using weighted Unicode so language specific characters are correctly identified. "ABC" in English is not the same in Japanese, or Thai, etc.
Numbers are just number.
FileMaker Pro 11 Advanced
iMac Mid 2011, 3.4 GHz Intel Core i7 16 GB 1333 Mhz DDR3
2,500,000 unsorted Records SSN = 9 Numbers Alpha = SSN with 0 = a; ... 9 = j
Sort test results:
Sort and Time
Set Variable [ $$StartSSN; Value:Get ( CurrentHostTimeStamp ) ]
Sort Records [ Specified Sort Order: Number v Letter Sorts::SSN; descending ] [ Restore; No dialog ]
Pause/Resume Script [ Duration (seconds): 1 ] ExitLoopIf[Get(SortState) = 1]
Set Variable [ $$EndSSN; Value:Get ( CurrentHostTimeStamp ) ]
Set Variable [ $$TimeSSN; Value:$$EndSSN - $$StartSSN ]
Set Variable [ $$StartAlpha; Value:Get ( CurrentHostTimeStamp ) ]
Sort Records [ Specified Sort Order: Number v Letter Sorts::Alpha; ascending ] [ Restore; No dialog ]
Pause/Resume Script [ Duration (seconds): 1 ] ExitLoopIf[Get(SortState) = 1]
Set Variable [ $$EndAlpha; Value:Get ( CurrentHostTimeStamp ) ] Set Variable [ $$TimeAlpha; Value:$$EndAlpha - $$StartAlpha ]
Looks to me that thadata type does not matter that much.
Yes. In my experience FMP (as well as every other database I have ever worked with) can perform quicker with numbers if your choice is using a number field or a text field that will hold numbers. Numbers sort first in ASCII and other code systems, so theoretically they can be found faster. Indexes that are limited to a smaller character set like 0,1,2,3,4,5,6,7,8,9 should also improve the speed. There may be an advantage to using numbers as key field values to avoid situations where two records could have the same value, too. Since relationships in FMP are based on indexes, I suppose a case could be made for the relationship being "faster" as well.
That said, lots of other issues affect the actual and perceived performance of a solution. Hardware, software, how you use relationships, etc. In small databases these will have much more of an impact than your choice of field types.
When I first started out as a developer, I primarily worked on updating or rebuilding solutions that had been created by others. Based on my observations of what the best practices seemed to be, I established a few simple rules for myself, which have served me well over the years. From time to time I have even come across books and software manuals that advised this approach, or some variant of it, as well:
- Use serial numbers for primary keys in relationships, unless there is a compelling need not to.
- With regard to other fields which will only contain numbers, use the number field type for keys, or if you will need to do math on their values, or if you will need to search on them a lot. Otherwise, use text fields.
Of course, there will always be exceptions, but the above rules of thumb are a good place to start. When I find that I don't have a good reason to break them, it's usually an indication that I should stick to those rules.
I hope this helps.
Based on my experience and also from Jon Thather DevCon presentation on "Under The Hood: Pro and Server Performance" I would say number fields are better than text fields for searching and for relationships. Somewhere in Jon's presentation, he summarises it suscintly: "Each digit in a number field = 1/2 byte in index entry whereas each character in text field = 2 bytes in index entry. Larger index = Larger index to read from disk"
Thanks for the info from Thayer. That means a text index on the same number of characters will by 4X as large as a number index. -- Worth knowing.
Just an Unsort may not be a good test. You should probably close and reopen the file between tests. FileMaker may be caching all the record data when doing the first sort, making the second run faster than it would without the cache.