Are there any limits to the Base64Decode function? If so, where can I learn more about them?
(A Base64-encoded PDF isn't always decoded correctly, and yes, I know that Base64Encode adds a line break after every 76 characters.)
I think we've become the victim of using the same words to mean different things. When you said you want the "contents" of the file, that makes me think you want the data encoded within the file; but after taking in the rest of the thread as it's developed, I think you want all the bytes that make up the raw file, which is a wholly different thing.
This is a non-idiomatic use of the Base64Decode function, but understandable in light of a very odd behavior change FileMaker introduced in version 14. In general, the result of the Base64Decode function is container data — fully rendered files — not text. However, in 14, the Base64Encode function started accepting text as an input, in addition to container data; so the inverse function, Base64Decode, had to be able to output text, too, which gives you an accidental means to get the bytes of the raw file (not merely the contents encoded within the file) interpreted as text. It seems to be failing for you after some limit, but the same set-up you specify works fine testing on my computer (OS X 10.10.5), which leads me to believe it's a limitation of Windows rather than FileMaker.
If you want the raw file (not the contents of the file) rendered as text, and if the Base64Decode function isn't giving you that, you can just write a script to parse the base 64 into text for yourself. It's a very straightforward process, but it will be significantly slower than the Base64Decode function. To get the best performance, you may want to write a routine that:
*By splitting a long string into roughly Sqrt ( $length ) number of chunks (which will be about that same length), the time it takes to walk through the whole string is reduced from O(n^2) to O(n^1.5). (The time it takes Middle to pull a sub-string out is proportional to how far into the source string it is; then for parsing a whole string, multiply that by the number of characters in the string. Theoretically, asymptotically better linearithmic O(n*log n) performance should be possible by breaking the source text down more times like a binary tree or B tree, but in practice it takes a very long string to parse before the performance is actually better due to the extra overhead required by such an algorithm to keep its place.) I wrote Length ( $base64 ) / 3 instead of Length ( $base64 ) because it's more efficient to parse 3 characters of base 64 into 2 bytes at a time than it is to walk through character-by-character, so I'm presuming the parsing script would take that approach.
The first place to read up on it is the help documentation. The line breaks (CRLF, to be precise) added by Base64Encode have no effect on Base64Decode. The big caveat I've found with Base64Decode after updating to 14 is that the function really needs the fileNameWithExtension parameter to know the right thing to do with the data you're giving it.
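To illustrate the point about line breaks outside FileMaker, here is a minimal sketch in Python (not FileMaker's own implementation). The sample bytes are made up, and the 76-character CRLF wrapping is rebuilt by hand to mimic what the thread says Base64Encode produces:

```python
import base64

# Hypothetical stand-in for a container's raw bytes (includes NUL and other binary bytes).
raw = b"%PDF-1.4\n" + bytes(range(256)) * 4

# One unbroken line of base 64.
flat = base64.b64encode(raw).decode("ascii")

# Re-wrap at 76 characters with CRLF, the layout described in this thread.
wrapped = "\r\n".join(flat[i:i + 76] for i in range(0, len(flat), 76))

# With the default validate=False, b64decode discards characters outside the
# base 64 alphabet, so the wrapped and unwrapped forms decode to the same bytes.
assert base64.b64decode(wrapped) == base64.b64decode(flat) == raw
```

A decoder that ignores characters outside the alphabet treats both forms identically, which matches the behavior described above for Base64Decode.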
What aspects of the PDFs you're working with don't translate correctly? Do the PDFs being en/decoded wrong have any features in common that none of the correctly en/decoded PDFs have?
Thanks for your concern, Jeremy. Please try a calculated field, result text type:
Base64Decode ( Base64Encode ( Table :: Container ) )
where the container contains the attached pdf file.
The result is "?", but not if the calculation is based on only part of the encoded values. Try:
Base64Decode ( LeftValues ( Base64Encode ( Table::Container ) ; 5000 ) )
This is why I was thinking of some limit.
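For anyone following along outside FileMaker, this is roughly what decoding only the first lines of the encoded text does, sketched in Python. The `left_values` helper is a hypothetical stand-in for LeftValues, and the data is made up; the sketch assumes the encoded text is wrapped at 76 characters with CRLF as discussed in this thread:

```python
import base64

def left_values(text: str, count: int) -> str:
    """Rough stand-in for FileMaker's LeftValues: keep the first `count` lines."""
    lines = text.split("\r\n")
    return "\r\n".join(lines[:count])

raw = bytes(range(256)) * 40  # 10240 bytes of made-up binary data
flat = base64.b64encode(raw).decode("ascii")
encoded = "\r\n".join(flat[i:i + 76] for i in range(0, len(flat), 76))

partial = base64.b64decode(left_values(encoded, 50))

# 76 base 64 characters carry 76 / 4 * 3 = 57 bytes, so 50 full lines give 2850 bytes.
assert len(partial) == 50 * 57
assert partial == raw[:len(partial)]
```

Because each full line is a whole number of 4-character groups, cutting at a line boundary decodes cleanly to a prefix of the original file.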
Generally the decoded value is binary, so you can't see it as a text calculation result; you can only set it into a container field.
Without supplying a file name (with extension) to the Base64Decode function, a "?" for the result is exactly what I've come to expect. I'd also expect a "?" result when I try to put container data (the output of Base64Decode) in a text-type calculation. What do you get when you try a calculated field with result type container, and include the file name:
Base64Decode ( Base64Encode ( Table::container ) ; "file.pdf" )
@user19752 I don't think so; the decoded result of an encoded generic file returns the file contents (as opposed to content: you could easily try).
@jbante "What do you get when you try a calculated field with result type container, and include the file name"
I tried that with the data viewer (I need a .txt result file, not a .pdf!). There was no error, but I couldn't find the generated file, nor do I need it. (Win 7) Please try the calculations I proposed in my previous post.
Are you trying to get the text out of the PDF? Base 64 en/decoding alone can't do that for you (converting file types would require manipulating the encoded base 64 before decoding it again), and the text data may not even actually be in the PDF at all if it's an image-only PDF. (Though it might be an interesting exercise to try setting a container field with Base64Decode ( Base64Encode ( Table::pdfFile ) ; "file.txt" ).) With enough understanding of the PDF format, you might be able to parse out any text in the PDF from the base 64, but using the 360Works Scribe plug-in is much less work.
"Though it might be an interesting exercise to try setting a container field with Base64Decode ( Base64Encode ( Table::pdfFile ) ; "file.txt" ).)"
I have done that exercise, by directly setting a text field ( FileMaker Custom Function: pdfPageCount ( container ; result ) ).
And I'm not trying to get the text, but the contents of the file. Done even that.
But a problem is still there: is there a limit on the number of characters that the decode function can handle?
P.S.: Scribe isn't free!
Your basic logic is flawed as you do not appear to understand how a PDF is constructed.
It is indeed a dictionary of keys and values, so in some cases the text contained in the PDF MIGHT be in plain sight.
It is a set of drawing instructions for a rendering engine to reconstruct the file so you can see/print it.
It is also perfectly legal to jumble EVERY single glyph and its associated positional information into the file, and because you will have NO idea in which order they appear (because of how the dictionary is constructed), you will NEVER be able to recover the text this side of the next major comet. You will have no way of decoding anything that is in, for example a Hindic font this way as the Unicode you might recover needs a translation table to make any sense of it.
Also, the content of a PDF may be a binary stream containing what you perceive as text, in which case your method will NEVER EVER decode the contents, so don't even bother trying.
To be fair nothing is free. I presume your time costs something.
There are well-worn routes which give you complete control over the output you are after. Scribe is one; my field of expertise is using iText with ScriptMaster to parse not just whole pages but small defined sections of documents, along with all the other manipulations that become possible once you start down this path.
"You will have no way of decoding anything that is in, for example a Hindic font this way as the Unicode you might recover needs a translation table to make any sense of it."
Please, write a PDF with a Hindic font... I bet that I can find the PDF's number of pages without even opening that file.
Then, please, back to the topic
In your calculation
the LeftValues() function turns the encoded PDF into a text string, so decoding it gives a text result.
But, PDF is not text file, it is binary. There is Char(0) that can't be calculated in text calculation.
Length ( Char(0) ) = 0.
Position ( "abc" & Char(0) & "def" ; Char(0) ; 1 ; 1 ) = 0.
A variable can hold binary container contents, but there is no binary operator in FM.
"But, PDF is not text file, it is binary. There is Char(0) that can't be calculated in text calculation"
That is irrelevant on calculations like this:
Let ( [
  string = "ABC" & Char ( 0 ) & "/" & Char ( 0 ) & "DEF" ;
  p1 = Position ( string ; "/" ; 1 ; 1 ) + 1 ;
  p2 = Position ( string ; "E" ; 1 ; 1 )
] ;
  Middle ( string ; p1 ; p2 - p1 )
)
It returns: D, exactly as if the string were "ABC/DEF"
In addition, the calculation solves the problem of decoding, but I would absolutely not use it. I would prefer the simpler:
Base64Decode ( Base64Encode ( Table::Container ) )
which, however, produces an error in some cases. (I think this is due to a limitation of the Base64Decode ( ) function.)
Thank you very much, Jeremy.
"I think we've become the victim of using the same words to mean different things."
I thought so too, but see this one: meaning - "file content" vs. "file contents" - English Language & Usage Stack Exchange
"but the same set-up you specify works fine testing on my computer (OS X 10.10.5), which leads me to believe it's a limitation of Windows rather than FileMaker."
OK, this is the most useful information! (Along with how I could solve it by parsing a part.)
You have a typo: 4 encoded characters correspond to 3 bytes.
Base64Encode ( "1¶2" ) ;
Base64Encode ( "1" & Char(13) & "2" )
both return 8 characters. CR is converted to CRLF. I don't think this is a limitation of Windows. Base64 is defined to encode binary data, so there may be no reason to convert any characters even though the source is text.
(The RFC defines the additional line break after every 76 bytes of encoded output to be CRLF.)
If you use a text file containing the 3 bytes of "1¶2", Base64Encode ( theContainer ) returns 4 characters on Windows.
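The arithmetic behind those lengths can be checked with any base 64 encoder. A small Python sketch (plain byte strings standing in for the FileMaker text and the container file, which is an assumption about what bytes FileMaker actually passes to the encoder):

```python
import base64

cr_only = b"1\r2"    # 3 bytes, like the text file read through a container
crlf = b"1\r\n2"     # 4 bytes, if the return is expanded to CRLF before encoding

enc_cr = base64.b64encode(cr_only)
enc_crlf = base64.b64encode(crlf)

# 3 bytes fill exactly one 4-character group; a 4th byte forces a second,
# padded group, which is where the 8 characters come from.
assert len(enc_cr) == 4
assert len(enc_crlf) == 8
assert enc_crlf.endswith(b"==")
```

So an 8-character result for "1¶2" is consistent with 4 bytes going into the encoder, and a 4-character result is consistent with 3 bytes.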
Thanks for catching that. My mistake. For optimal parsing, then, the base 64 should be split into Ceiling ( Sqrt ( Length ( $base64 ) / 4 ) ) chunks, not Ceiling ( Sqrt ( Length ( $base64 ) / 3 ) ).
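A minimal sketch of that chunking arithmetic, written in Python rather than as a FileMaker script. The function name `chunked_decode` is hypothetical, and Python's slicing does not have the position-dependent cost attributed to Middle above, so this only shows how the chunk boundaries would fall, not the speed-up:

```python
import base64
import math

def chunked_decode(b64: str) -> bytes:
    """Decode base 64 in about sqrt(groups) chunks of whole 4-character groups."""
    b64 = "".join(b64.split())                   # drop line breaks first
    groups = len(b64) // 4                       # each 4-character group is up to 3 bytes
    if groups == 0:
        return b""
    chunk_count = math.ceil(math.sqrt(groups))   # Ceiling ( Sqrt ( Length / 4 ) )
    groups_per_chunk = math.ceil(groups / chunk_count)
    step = groups_per_chunk * 4                  # keep chunk edges on group boundaries
    out = bytearray()
    for start in range(0, len(b64), step):
        out += base64.b64decode(b64[start:start + step])
    return bytes(out)
```

Keeping every chunk a multiple of 4 characters means each chunk can be parsed on its own, with any "=" padding landing only in the final chunk. The sketch assumes well-formed input whose length (after removing line breaks) is a multiple of 4.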