16 Replies Latest reply on Mar 15, 2017 12:29 PM by TSGal

    Data loss on import of CJK Ideographs

    LeeCollins

      Summary

      Data loss on import of CJK Ideographs

      Product

      FileMaker Pro

      Version

      13.0v3

      Operating system version

      Mac OS 10.9.5

      Description of the issue

      While importing text data from a TAB format text file, I found that the characters in the text file were incorrectly matched with existing data. This caused records to be incorrectly merged, hence the loss of data. The data I was attempting to import contained characters in the Unicode COMPATIBILITY IDEOGRAPH range. It appears that FMP decided to equate these with the variants in the main ideographic range. This might be a reasonable thing to do for certain kinds of search, but should not be the default behavior. I tried setting the field storage to Unicode, but that still didn't fix the problem.

      Steps to reproduce the problem

      Create a database with one record containing the character 福 (U+798F) in one field and the string "Variant 1" in another. Then import a record from a TAB file containing the character U+FA1B in the corresponding field and the string "Variant 2", matching on the field that contains the ideographs.

      Note: just copying the character U+FA1B into this report causes it to be converted to 福, so I'm using the Unicode value to represent the character.
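The conversion described here is consistent with Unicode canonical normalization, under which U+FA1B decomposes to U+798F. Whether FileMaker or the forum software applies exactly this normalization is an assumption, not something confirmed in this thread. A minimal Python sketch using the standard `unicodedata` module shows the mapping (escape sequences are used so the source text itself cannot be silently converted):

```python
import unicodedata

compat = "\uFA1B"   # CJK COMPATIBILITY IDEOGRAPH-FA1B
unified = "\u798F"  # CJK UNIFIED IDEOGRAPH-798F


def code_points(text):
    """Format a string as a space-separated list of U+XXXX code points."""
    return " ".join(f"U+{ord(ch):04X}" for ch in text)


# All four normalization forms replace the compatibility ideograph,
# because its decomposition is canonical, not merely a compatibility one.
for form in ("NFC", "NFD", "NFKC", "NFKD"):
    result = unicodedata.normalize(form, compat)
    print(form, code_points(compat), "->", code_points(result))
```

Since the two characters are distinct code points that become identical after normalization, any system that normalizes on input can no longer tell them apart.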

      Expected result

      After import, the database should contain 2 records
      福 Variant 1
      U+FA1B Variant 2

      Actual result

      福 Variant 2
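The difference between the expected and actual results can be reproduced with a toy model of a matching import. This is only a sketch: `import_update_matching` and its `key_func` parameter are hypothetical names, and the assumption that the match key is NFC-normalized is a guess at the cause, not FileMaker's documented behavior.

```python
import unicodedata


def import_update_matching(records, incoming, key_func):
    """Update records whose key matches; add the remaining rows as new records."""
    index = {key_func(key): i for i, (key, _) in enumerate(records)}
    result = list(records)
    for key, value in incoming:
        k = key_func(key)
        if k in index:
            # Matching record found: keep its key, overwrite its value.
            result[index[k]] = (result[index[k]][0], value)
        else:
            index[k] = len(result)
            result.append((key, value))
    return result


existing = [("\u798F", "Variant 1")]
incoming = [("\uFA1B", "Variant 2")]

# Exact code point comparison keeps the two ideographs apart.
exact = import_update_matching(existing, incoming, lambda s: s)

# Comparing NFC-normalized keys treats them as the same record.
normalized = import_update_matching(
    existing, incoming, lambda s: unicodedata.normalize("NFC", s)
)

print(len(exact), len(normalized))
```

With exact comparison the import yields two records, as expected; with normalized keys the existing record is overwritten and "Variant 1" is lost.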

      Exact text of any error message(s) that appear

      None

      Workaround

      Haven't found any

        • 1. Re: Data loss on import of CJK Ideographs
          TSGal

          Lee Collins:

          Thank you for your post.

          I am unable to replicate the issue.  This is what I have done:

          1. On a Mac running Mac OS X 10.9.5, I launched TextEdit, and entered the following information:
               U+FA1B<tab>Variant 2

          2. I saved the file as Test.txt

          3. I launched FileMaker Pro 13.0v3, created a file "File.fmp12" with two Text fields: Text1 and Text2.

          4. I created one record and entered "福" into Text1 and "Variant 1" into Text2

          5. I pulled down the File menu and selected Import Records -> File...

          6. I selected Test.txt and clicked Open.

          7. I set the Import Action to "Update matching records in found set", entered a checkmark next to "Add remaining data as new records", set the Source field (which shows U+FA1B) to match Text1, and import into Text2 the value "Variant 2".

          8. I clicked Import, and one new record was added with U+FA1B in Text1 and Variant 2 in Text2.

          Let me know what I'm doing differently than you so I can replicate the issue.

          TSGal
          FileMaker, Inc.

          • 2. Re: Data loss on import of CJK Ideographs
            LeeCollins

            As I noted above, I'm using the string "U+FA1B" to represent the actual character, since the FileMaker bug reporter did not allow me to copy the character itself into the bug report. What you need to do is type the character itself into the FileMaker record. On a Mac, you can do this by enabling the Unicode Hex Input keyboard, holding down the Option key, and typing FA1B. You should see this: 福

            • 3. Re: Data loss on import of CJK Ideographs
              LeeCollins

              Note, when I saved the above, the character was again converted to the variant: U+798F. You need to fix your bug reporting software.

              • 4. Re: Data loss on import of CJK Ideographs
                TSGal

                Lee Collins:

                Thank you for the clarification.  In order to test saving these characters, I have entered two characters:

                U+FA1B   福

                U+798F   福

                TSGal
                FileMaker, Inc.

                • 5. Re: Data loss on import of CJK Ideographs
                  TSGal

                  Lee Collins:

                  I can confirm the U+FA1B gets translated to U+798F on the forum.  Since our forum is licensed through Oracle RightNow, I have alerted my manager of the issue.  I'll work next on the original posting.

                  TSGal
                  FileMaker, Inc.

                  • 6. Re: Data loss on import of CJK Ideographs
                    TSGal

                    Lee Collins:

                    I have entered the two characters into a Text file along with Variant 1 and Variant 2.  The two records were imported into the Test.fmp12 file, and the two records show two different characters.

                    I then copied the U+798F character from the Variant 2 record and pasted it into the Variant 1 record.  I then imported the Text file again, this time matching on the Text1 field, and the import updated both records so they now both display "Variant 2".

                    I then tried to enter U+798F into a new record, and then U+FA1B into another record, and the values are the same.  U+FA1B also gets translated to U+798F when entering manually.  However, when importing, it works correctly.  Can you confirm?

                    What other steps am I missing?

                    TSGal
                    FileMaker, Inc.

                    • 7. Re: Data loss on import of CJK Ideographs
                      LeeCollins

                      I can produce the problem every time on import.

                       I start with a file, DBBeforeImport.fmp12, which contains one record with the character and code for U+798F. Then I import from a text file which contains the character and code for U+FA1B. I use the option "Update Matching Records in Found Set", and the result is as in the picture "BeforeAndAfter". If I use the option "Add New Records", I see the same result. If I look closely at the first character in the input file when I am in the import dialog, it looks like it has already been converted to U+798F.
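Because the two ideographs look nearly identical on screen, it can help to confirm what the import file actually contains before it reaches any dialog. The sketch below lists the code points of the first TAB-separated field of each line. It assumes the file is UTF-8 text; the function name and the file name in the comment are hypothetical.

```python
import io


def first_field_code_points(lines):
    """Yield the code points of the first TAB-separated field of each line."""
    for line in lines:
        field = line.rstrip("\r\n").split("\t")[0]
        yield [f"U+{ord(ch):04X}" for ch in field]


# Stand-in for open("import.txt", encoding="utf-8"); the file name is hypothetical.
sample = io.StringIO("\uFA1B\tVariant 2\n")
rows = list(first_field_code_points(sample))
print(rows)
```

If the file shows U+FA1B here but the database shows U+798F after import, the conversion happened during import rather than when the file was written.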


                      • 8. Re: Data loss on import of CJK Ideographs
                        TSGal

                        Lee Collins:

                        Thanks for the additional information.  I'm seeing something different, but the results are similar.  I enter U+798F into a field and mark it accordingly, and then in my Text file I have U+FA1B, and when I import, it updates the record AND changes the matching character to U+FA1B.  See the screenshots below for the before import and after import.

                        Regardless, I have sent the information to our Development and Testing departments for review.  When I receive any feedback, I will let you know.

                        TSGal
                        FileMaker, Inc.

                        • 9. Re: Data loss on import of CJK Ideographs
                          TSGal

                          Lee Collins:

                          Our Testing department has been able to replicate the issue.  They have sent their findings to Development for further review.

                          TSGal
                          FileMaker, Inc.

                          • 10. Re: Data loss on import of CJK Ideographs
                            philmodjunk

                            Did anyone replicate this on a Windows System?

                            An entry in the Known Bugs List has been linked to this Issue Report. Any Comments/Questions/Suggested Corrections should be posted here or in a new thread. Please do not post such comments to the Known Bugs List thread.

                            • 11. Re: Data loss on import of CJK Ideographs
                              TSGal

                              PhilModJunk:

                              Unknown.  Testing has only confirmed this issue under Mac OS X.  If Windows is later added, I will post here.

                              TSGal
                              FileMaker, Inc.

                              • 12. Re: Data loss on import of CJK Ideographs
                                LeeCollins

                                It looks like this problem may be more widespread than CJK. Looking through our data in FileMaker, I just found that a number of other characters had been converted:

                                GREEK QUESTION MARK becomes SEMICOLON

                                OHM SIGN becomes GREEK CAPITAL LETTER OMEGA

                                If anyone knows of a workaround, I'd like to hear it. Otherwise, we'll be looking for a more reliable database solution.
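Both conversions listed above match canonical singleton mappings in Unicode, so one way to gauge exposure before an import is to scan the source data for characters that NFC normalization would change. This is an audit sketch, not a workaround, and it assumes the database's behavior corresponds to NFC, which this thread does not confirm. The helper name `changed_by_nfc` is hypothetical.

```python
import unicodedata


def changed_by_nfc(text):
    """Return (name, [replacement names]) for each character NFC would change."""
    changes = []
    for ch in text:
        normalized = unicodedata.normalize("NFC", ch)
        if normalized != ch:
            changes.append(
                (
                    unicodedata.name(ch, "UNNAMED"),
                    [unicodedata.name(c, "UNNAMED") for c in normalized],
                )
            )
    return changes


# GREEK QUESTION MARK, OHM SIGN, then a plain semicolon and omega for contrast.
report = changed_by_nfc("\u037E \u2126 ; \u03A9")
for original, replacement in report:
    print(original, "->", " + ".join(replacement))
```

The check works one character at a time, so it finds characters that would be replaced or decomposed but not sequences that would be composed across characters.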

                                • 13. Re: Data loss on import of CJK Ideographs
                                  LeeCollins

                                  Just found that all of the Indic script characters written with character NUKTA (U+093C, etc.) have been silently decomposed to base letter followed by NUKTA. For example, before import into FMP, I had the character क़ DEVANAGARI LETTER QA (U+0958); within the database it appears as DEVANAGARI LETTER KA followed by DEVANAGARI SIGN NUKTA.

                                  Decomposition and canonical mappings can be a useful feature, but this should not be the default behavior. There should be some way to turn it on and off.
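The nukta case differs from a plain substitution in that one character becomes two. In Unicode, U+0958 is excluded from composition, so even the composed form NFC leaves it as KA followed by NUKTA, and normalizing again does not restore the original. A short Python check with the standard `unicodedata` module illustrates this; it makes no claim about how FileMaker implements it.

```python
import unicodedata

qa = "\u0958"  # DEVANAGARI LETTER QA

nfc = unicodedata.normalize("NFC", qa)
names = [unicodedata.name(ch) for ch in nfc]

# Normalizing the decomposed sequence again does not bring back U+0958.
roundtrip = unicodedata.normalize("NFC", nfc)

print(names)
print(roundtrip == qa)
```

Recovering the precomposed letters after such an import would therefore need an explicit mapping from each base letter plus NUKTA back to the original code point.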

                                  • 14. Re: Data loss on import of CJK Ideographs
                                    TSGal

                                    Lee Collins:

                                    Thank you for the additional information.  I have attached your comments to the original report.

                                    TSGal
                                    FileMaker, Inc.
