Take a look at this article and perhaps it will explain:
If you use Export Field Contents, you don't get exactly what you have in the field. There are several workarounds in the article. An alternative is to use a plug-in to preserve what you have. Because you want XML, you might try an XML export with an XSLT stylesheet to get the output the way you want it.
FileMaker's export as tab-separated will generate UTF-8 without the "byte order mark" (BOM). This provides maximum compatibility with the ASCII file format.
A BOM is rarely needed for UTF-8.
It looks like the file you are trying to replicate is UTF-8 with BOM.
"2.6 Encoding Schemes
... Use of a BOM is neither required nor recommended for UTF-8, but may be encountered in contexts where UTF-8 data is converted from other encoding forms that use a BOM or where the BOM is used as a UTF-8 signature. See the “Byte Order Mark” subsection in Section 16.8, Specials, for more information."
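To check which kind of file you actually have, it is enough to look at its first few bytes. Here is a minimal Python sketch (the names BOMS and detect_bom are hypothetical, not part of any FileMaker tool) that reports which BOM, if any, a chunk of bytes starts with:

```python
import codecs

# Byte order marks to look for. None of these is a prefix of another,
# so the order of the checks does not matter.
BOMS = {
    "UTF-8": codecs.BOM_UTF8,          # ef bb bf
    "UTF-16 LE": codecs.BOM_UTF16_LE,  # ff fe
    "UTF-16 BE": codecs.BOM_UTF16_BE,  # fe ff
}

def detect_bom(data: bytes):
    """Return the name of the BOM that data starts with, or None if it has none."""
    for name, bom in BOMS.items():
        if data.startswith(bom):
            return name
    return None
```

Reading the first three bytes of a file is enough, e.g. `with open(path, "rb") as f: print(detect_bom(f.read(3)))`. A plain FileMaker tab-separated export should give None; the file you are trying to replicate should give "UTF-8".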
'tis true, Tom, but it may munge the return-in-field and tab-in-field and other characters.
I think you are talking about Export Field Contents with a field containing the entire contents of the file for export.
I thought takermax is using Export records with tab-separated format with each line in the output file as a separate record.
Yes, Export Field Contents does munge the text, as does a tab-separated export of any field that contains a return or tab character.
FileMaker Pro exports plain text. (The exported file does not include font and style information.) The tab character separates fields, the carriage return character separates records. Most applications can use this file format.
Tabs in fields are converted to spaces.
Carriage return characters in a field export as vertical tab characters.
Values in repeating fields are separated by the group separator character.
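To make those rules concrete, here is a small Python sketch that applies them to some field values. It only mimics the documented behaviour quoted above; it is not FileMaker's actual implementation, and the function names are hypothetical.

```python
TAB = "\t"
CR = "\r"
VT = "\x0b"  # vertical tab: replaces returns inside a field
GS = "\x1d"  # group separator: separates repetitions of a repeating field

def export_field(value) -> str:
    """Convert one field: a string, or a list of strings for a repeating field."""
    reps = value if isinstance(value, list) else [value]
    # Tabs in fields become spaces; returns in fields become vertical tabs.
    cleaned = [r.replace(TAB, " ").replace(CR, VT) for r in reps]
    return GS.join(cleaned)

def export_records(records) -> str:
    """Fields are separated by tabs, records by carriage returns."""
    return CR.join(TAB.join(export_field(f) for f in rec) for rec in records)
```

For example, `export_records([["a\tb", "line1\rline2"], [["r1", "r2"], "x"]])` gives `"a b\tline1\x0bline2\rr1\x1dr2\tx"`, which FileMaker would then write out as UTF-8 without a BOM.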
I don't think the lack of a BOM is a problem with UTF-8.
There is a good chance that the program that imports the XML file will be OK with UTF-16 produced by Export Field Contents.
The most likely cause is that your XML encoding has some errors.
Try running the XML that you create through an XML validator (this could be an online service or a standalone app). It will let you know if the XML is well formed and, if you provide one, if it validates against a schema file.
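If you have Python handy, the standard library can do the well-formedness half of that check. A minimal sketch (check_well_formed is a hypothetical helper name) that returns the parser's error message, or None if the document parses:

```python
import xml.etree.ElementTree as ET

def check_well_formed(xml_bytes: bytes):
    """Return None if xml_bytes parses as XML, otherwise the parser's error message."""
    try:
        # Passing bytes rather than a decoded string leaves encoding
        # detection (BOM / XML declaration) to the parser.
        ET.fromstring(xml_bytes)
    except ET.ParseError as err:
        return str(err)  # message includes the line and column of the problem
    return None
```

This only checks that the XML is well formed. Validating against a schema file needs a third-party library such as lxml, or one of the validators mentioned above.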
If the XML is fairly short, you can try to spot problems yourself by using an XML-aware text editor with syntax highlighting and the ability to expand/collapse nested tags.
Good answer! You can often open an XML file in a browser, and it will tell you if there is a problem.
The result looks like UTF-16, so the OP, tackermax, may be trying to use "Export Records" but using "Export Field Contents" by mistake.
In the octal dump, I don't think the result looks like UTF-16.
The guide on the left-hand side shows that 16 bytes are presented in each line.
I count 16 characters on the first line.
In UTF-16, the same 16 bytes would hold only 8 characters.
The octal dump of his FileMaker-generated file looks like ASCII (or UTF-8) to me with one byte per character.
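The arithmetic behind that reading can be checked with a few lines of Python, using a hypothetical 16-character ASCII line:

```python
line = "0123456789abcdef"  # 16 ASCII characters (hypothetical sample)

utf8 = line.encode("utf-8")       # one byte per ASCII character
utf16 = line.encode("utf-16-le")  # two bytes per character, no BOM

print(len(utf8))                        # 16
print(len(utf16))                       # 32
print(utf16[:16].decode("utf-16-le"))   # 01234567: only 8 characters fit in 16 bytes
```

So a dump line of 16 bytes showing 16 readable characters points to a one-byte-per-character encoding, not UTF-16.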
"Is the UTF-8 encoding out of a filemaker export actually doing anything on tab separated?"
Yes, it is doing something on tab separated.
Characters with code points above 127, which have no ASCII representation, are properly encoded when exported as tab-separated text (UTF-8).
For example, when I type a bullet symbol (•) into a FileMaker field and export a record containing that field as tab-separated text, that character is represented by 3 bytes (e2 80 a2) in the resulting UTF-8 text file. (ASCII allows only one byte per character.)
The same bullet character exported with Export Field Contents, which generates a UTF-16 text file, is represented by 2 bytes (22 20).
Note that the UTF-16 file generated by FileMaker begins with the standard UTF-16 little-endian BOM (ff fe).
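Those byte values can be reproduced outside FileMaker. This Python sketch encodes U+2022 both ways and shows what a little-endian UTF-16 file containing just the bullet would start with:

```python
import codecs

bullet = "\u2022"  # •

print(bullet.encode("utf-8").hex(" "))      # e2 80 a2
print(bullet.encode("utf-16-le").hex(" "))  # 22 20
# A UTF-16 LE file begins with the BOM ff fe, followed by the text:
print((codecs.BOM_UTF16_LE + bullet.encode("utf-16-le")).hex(" "))  # ff fe 22 20
```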
I think that FileMaker stores its text field data internally as UTF-16 (or perhaps the simpler UCS-2), so that most of the time (perhaps all of the time) every character uses exactly two bytes. Export to tab-separated does the work of converting that mostly two-byte encoding (UTF-16) to a variable-width encoding (UTF-8), which uses 1 to 4 bytes per character.
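Here is a rough Python sketch of that kind of re-encoding, assuming little-endian UTF-16 input like the ff fe file described above. It illustrates the conversion only; it says nothing about how FileMaker does it internally, and utf16le_to_utf8 is a hypothetical name.

```python
import codecs

def utf16le_to_utf8(data: bytes) -> bytes:
    """Re-encode UTF-16 LE bytes as UTF-8, dropping a leading ff fe BOM if present."""
    if data.startswith(codecs.BOM_UTF16_LE):
        data = data[len(codecs.BOM_UTF16_LE):]
    return data.decode("utf-16-le").encode("utf-8")

# Bytes per character in each encoding.
for ch in ["A", "\u00e9", "\u2022", "\U0001F600"]:
    print(f"U+{ord(ch):04X}: UTF-16 {len(ch.encode('utf-16-le'))}, "
          f"UTF-8 {len(ch.encode('utf-8'))}")
# U+0041: UTF-16 2, UTF-8 1
# U+00E9: UTF-16 2, UTF-8 2
# U+2022: UTF-16 2, UTF-8 3
# U+1F600: UTF-16 4, UTF-8 4
```

The last row is why "two bytes per character" holds for UCS-2 but not always for UTF-16: characters outside the Basic Multilingual Plane, such as emoji, take a 4-byte surrogate pair.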
Plus, as beverly detailed previously, export to tab-separated also performs conversions beyond text encoding, to handle characters that have special meaning to FileMaker (tabs, carriage returns, and the field repetition separator).
The octal dump shows 16 bytes per line; the offsets are octal too, so the second line starts at 0000020, which is 16 in decimal.
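The offset column is easy to reproduce. This Python sketch labels 16-byte chunks with 7-digit octal offsets the way od does; unlike od's default output, it prints the bytes in hex for readability, and od_offsets is a hypothetical name.

```python
def od_offsets(data: bytes, width: int = 16):
    """Yield (octal offset, chunk) pairs, labelling each line like od does."""
    for start in range(0, len(data), width):
        yield format(start, "07o"), data[start:start + width]

sample = bytes(range(80))  # 80 hypothetical bytes -> 5 lines of 16
for offset, chunk in od_offsets(sample):
    print(offset, chunk.hex(" "))
```

The offsets come out as 0000000, 0000020, 0000040, 0000060 and 0000100; the fifth line is where the octal numbering becomes obvious.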
Thanks. With my limited experience, I never realized the offset column is also octal (it would have been clear if there had been a fifth line, at 0000100).
If you do need a BOM, try prepending the character U+FEFF (65535 - 256 = 65279):
Char(65535-256) & "yourtext"
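A quick Python check of the arithmetic, and of what that character becomes when encoded as UTF-8. This assumes FileMaker's Char(65279) yields U+FEFF and that the tab-separated export encodes it like any other character; I haven't confirmed how FileMaker itself handles it.

```python
import codecs

code_point = 65535 - 256  # the argument passed to Char() above
print(code_point, hex(code_point))  # 65279 0xfeff

bom_char = chr(code_point)  # U+FEFF, the byte order mark character
print(bom_char.encode("utf-8").hex(" "))            # ef bb bf
print(bom_char.encode("utf-8") == codecs.BOM_UTF8)  # True
```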
But I agree it shouldn't be a problem.
I have resolved this issue by using an application called UTFCast to attach the BOM on the fly. Works great!
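For anyone who would rather script that step than use a separate app, here is a minimal Python sketch that prepends a UTF-8 BOM to an exported file unless it already has one. This is an alternative technique, not how UTFCast works, and add_utf8_bom is a hypothetical name.

```python
import codecs
from pathlib import Path

def add_utf8_bom(path) -> bool:
    """Prepend a UTF-8 BOM (ef bb bf) to the file unless it already starts with one.

    Returns True if the file was changed.
    """
    p = Path(path)
    data = p.read_bytes()
    if data.startswith(codecs.BOM_UTF8):
        return False  # already has a BOM; leave the file untouched
    p.write_bytes(codecs.BOM_UTF8 + data)
    return True
```

Because it checks for an existing BOM first, running it twice on the same export is harmless.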