15 Replies. Latest reply on Jul 18, 2012 9:28 AM by tmlutas

    Import Project Gutenberg catalog

    tmlutas

      You can get the daily-updated Project Gutenberg catalog in RDF/XML. I would like to import it into FileMaker (I have FM 11 Advanced), but the import fails. I am obviously missing something; I suspect it is a style sheet. How should I proceed?

        • 1. Re: Import Project Gutenberg catalog
          beverly

          You are correct. XML is an open format (that's the "x" = extensible). The structure of the RDF/XML likely needs to be transformed (the "t" of XSLT) into the FMPXMLRESULT grammar that FileMaker uses for import. Depending on the relationships in your database and in the RDF/XML, you may need more than one XSLT. Make a sample export as XML from your database to compare the structure. There are some sample XSLT files in your FM install, but they may not be sufficient for your database schema.
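To compare against a sample export, the FMPXMLRESULT shape can also be produced directly. The following is a minimal Python sketch (standard library only), not FileMaker's own tooling. It assumes plain TEXT fields with no repetitions, and the element and attribute names follow the FMPXMLRESULT grammar as it usually appears in an export, so check them against an actual export from your FileMaker version. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET

FMP_NS = "http://www.filemaker.com/fmpxmlresult"
# Serialize FileMaker's namespace as the default (unprefixed) namespace.
ET.register_namespace("", FMP_NS)


def _q(tag):
    """Qualify a tag name with the FMPXMLRESULT namespace."""
    return f"{{{FMP_NS}}}{tag}"


def build_fmpxmlresult(field_names, rows, db_name="catalog"):
    """Return an FMPXMLRESULT document (as text) with one TEXT field per name.

    field_names: column names, in order.
    rows:        one sequence of string values per record, in the same order.
    """
    root = ET.Element(_q("FMPXMLRESULT"))
    ET.SubElement(root, _q("ERRORCODE")).text = "0"
    ET.SubElement(root, _q("PRODUCT"), BUILD="", NAME="", VERSION="")
    ET.SubElement(root, _q("DATABASE"), DATEFORMAT="", LAYOUT="", NAME=db_name,
                  RECORDS=str(len(rows)), TIMEFORMAT="")
    metadata = ET.SubElement(root, _q("METADATA"))
    for name in field_names:
        ET.SubElement(metadata, _q("FIELD"), EMPTYOK="YES", MAXREPEAT="1",
                      NAME=name, TYPE="TEXT")
    resultset = ET.SubElement(root, _q("RESULTSET"), FOUND=str(len(rows)))
    for record_id, values in enumerate(rows, start=1):
        row = ET.SubElement(resultset, _q("ROW"), MODID="0", RECORDID=str(record_id))
        for value in values:
            # Each field value sits in its own COL/DATA pair, in METADATA order.
            ET.SubElement(ET.SubElement(row, _q("COL")), _q("DATA")).text = value
    return ET.tostring(root, encoding="unicode")
```

For example, `build_fmpxmlresult(["Title", "Creator"], [["Alice's Adventures in Wonderland", "Carroll, Lewis"]])` yields a one-record document; the FIELD names in METADATA should be what the import dialog offers as source fields.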

           

          If the feed is in zip format, you'll need to uncompress it before importing with the XSLT.

           

          Beverly

          • 2. Re: Import Project Gutenberg catalog
            tmlutas

            The attached file is the catalog. It is already an XML file. FileMaker won't touch it without a file name change (just adding .xml to the end), and then it chokes on import. Where can I find more information about FMPXMLRESULT?

            • 3. Re: Import Project Gutenberg catalog
              beverly

              The ".bz2" is a compressed format (similar to zip). FileMaker cannot uncompress the file directly. The import only deals with .xml, and merely changing the extension does not uncompress the file.
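Since FileMaker will not open the .bz2 archive itself, the decompression can be done as a small pre-processing step. A minimal Python sketch (standard library only), assuming the download is a single bz2-compressed file; the function name and the file names in the usage line are placeholders. A .zip feed would use the zipfile module instead.

```python
import bz2
import shutil


def decompress_bz2(src_path, dest_path):
    """Stream-decompress a .bz2 file to dest_path without loading it all into memory."""
    with bz2.open(src_path, "rb") as src, open(dest_path, "wb") as dest:
        shutil.copyfileobj(src, dest)
    return dest_path
```

Usage: `decompress_bz2("catalog.rdf.bz2", "catalog.rdf")`, after which the uncompressed file is what goes into the XML import (together with an XSLT).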

               

              You can find more information on IMPORT in these links:

              <http://www.filemaker.com/help/html/import_export.16.17.html>

              <http://www.filemaker.com/help/html/import_export.16.30.html#1029660>

               

              As I pointed out, you can make an EXPORT (as FMPXMLRESULT) from _your_ database and see what the structure looks like. An XSLT can be created to make the fields match the elements in the source XML.

               

              Beverly

              • 4. Re: Import Project Gutenberg catalog
                tmlutas

                I didn't want to burden people with the uncompressed file but I've only been trying to import the uncompressed version. Shall I upload it? If you open it after decompression with a text editor, you do, in fact, get a valid XML file. I tried using an XSL and got a very different error message:

                 

                SAXParseException: expected entity name for reference (Occurred in entity '/private/var/folders/fj/_3y948796_b3ms2wjsk0kzp40000gn/T/FMTEMPFM500497980007.xsl', at line 20…

                 

                The XSLT I used was http://www.filemaker.com/fmpxmlresult

                 

                Obviously this is a newbie question. I'm using a very well known project (Project Gutenberg) to get experience importing RDF definition files written in XML. The purpose is to get a useful listing of all the available properties in a large RDF triple set like DBpedia. 

                • 5. Re: Import Project Gutenberg catalog
                  beverly

                  It's better to upload the XSLT that you are trying to use. The RDF file can be uncompressed by those who download it from your original link. And the feeds are located here. <http://www.gutenberg.org/wiki/Gutenberg:Feeds#The_Project_Gutenberg_Catalog_in_RDF.2FXML_Format>

                   

                  The "http://www.filemaker.com/fmpxmlresult" is not an XSLT. It's a description of the GRAMMAR that FileMaker uses to import. It's also used to define the namespace (to allow the import to know which elements belong to the FileMaker grammar).

                   

                  Also the fields in your database are important to know (if you already have a database). A "well known project" is only half the information. That's the source. You also need a destination XML. The XSLT is a document that is used with the import dialog to TRANSFORM (convert, translate) from the source grammar/schema into the destination grammar/schema.

                   

                  RDF.xml  ==> XSLT ==> FMPXMLRESULT.xml
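The mapping step in the middle of that pipeline is mostly "for each record, pick these child elements". The sketch below does that mapping in Python instead of XSLT, purely to show the idea; FileMaker's import dialog itself expects an XSLT file. The record element (`pgterms:etext`), the `pgterms` namespace URI and the field names are assumptions about the catalog's layout, so confirm them against the xmlns declarations and elements in your copy. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET

# rdf and dc are the standard URIs; the pgterms URI is an ASSUMPTION,
# copy the exact value from the xmlns declarations in your file.
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/dc/elements/1.1/",
    "pgterms": "http://www.gutenberg.org/rdfterms/",
}
RDF_ID = "{%s}ID" % NS["rdf"]


def extract_rows(rdf_text, record_path, field_map):
    """Flatten RDF/XML records into rows (record ID first, then mapped fields).

    record_path: ElementTree path to each record, e.g. "pgterms:etext".
    field_map:   {"FileMaker field name": "path relative to the record"}.
    Only the first match of each path is kept; missing values become "".
    """
    root = ET.fromstring(rdf_text)
    rows = []
    for record in root.findall(record_path, NS):
        row = [record.get(RDF_ID, "")]
        for path in field_map.values():
            node = record.find(path, NS)
            # itertext() also flattens any markup inside rdf:parseType="Literal".
            row.append("".join(node.itertext()).strip() if node is not None else "")
        rows.append(row)
    return rows
```

Keeping only the first match is exactly the limitation that makes repeated elements (several subjects per book) need their own table. For a file as large as the full catalog, a streaming parser (`ET.iterparse`) would be kinder to memory than `fromstring`.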

                   

                  Look at the import xml dialog and you can see where an XSLT can be selected/specified.

                  There will not be a "standard/generic" XSLT already, because we don't know what the destination fields are. One can be created, however.

                   

                  Beverly

                  • 6. Re: Import Project Gutenberg catalog
                    tmlutas

                    I don't know anything about XSLT, so this is terra incognita for me. I was expecting FileMaker to be a bit smarter than it turns out to be.

                     

                    There is no database yet. I'm trying to build one around this import and expected FileMaker to create fields based on the input file.

                     

                    So how does one create the XSLT? I'm not quite sure what you mean by destination XML. The destination is a filemaker database.

                    • 7. Re: Import Project Gutenberg catalog
                      mbraendle

                      You should first analyze the catalog file and decide which fields you would like to incorporate into the FM database.

                       

                      I see that the catalog.rdf file uses Dublin Core syntax intermixed with some Gutenberg project grammar. Which of the dc terms do you need? For example, the Library of Congress Subject Headings would require a related table, involving a separate import.

                       

                      In addition, some syntax is implicit. E.g. in the dc:title element, titles may contain carriage returns, which indicate one or several sub-titles (actually, I'm not fond of how this was handled in the file). How would you handle these? There are library rules such as AACR2 and other rule sets that describe how to treat these types of things; are you aware of them?

                       

                      So, before you start running and playing around, you should:

                       

                      • analyze the structure of the catalog file
                      • think about the data structure of your database
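The first step, analyzing the structure, can be partly automated by counting which elements and attributes actually occur. A minimal Python sketch using the standard library's streaming parser; the function name is hypothetical. Names are reported in ElementTree's `{namespace-uri}local-name` form.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def inventory(source):
    """Count every element and attribute name in an XML file.

    source may be a filename or a binary file object. iterparse streams the
    document, so this also works on a large catalog file.
    """
    elements, attributes = Counter(), Counter()
    for _event, elem in ET.iterparse(source, events=("end",)):
        elements[elem.tag] += 1
        attributes.update(elem.attrib.keys())
        # The element's children were already counted, so free its contents.
        elem.clear()
    return elements, attributes
```

Running it on the uncompressed catalog and printing `elements.most_common()` gives a quick list of candidate fields; an element that occurs more often than the record element itself hints at one-to-many data, such as subject headings.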

                       

                      After these things are defined, we can talk about the XSL transformation.

                      • 8. Re: Import Project Gutenberg catalog
                        tmlutas

                        The question of which fields to incorporate is simple, all of them.

                         

                        The catalog comes out daily and is run by people I have no direct relationship with, much less control over. As I understand it, any book could include previously unused terms, though convention should keep this rare.

                         

                        As for the carriage returns, it's part of the field. Filemaker should handle multi-line fields. The sub-titles are an assumption, if a good one, that may not actually be true. I don't want to impose my interpretation on the data. I just want to pull it into Filemaker. Interpretation would come in at a later phase in a follow on project. This is just an exercise in how to take reasonably stable, though out of my control RDF and get it into FM. As a very happy side effect, you get a nice front end that can be made into a freely available run-time to get easy access to Project Gutenberg books, which are a world treasure.

                         

                        I would like to have some sort of script, formula, procedure, or pre-processing database that can take an RDF namespace and generate the XSLT for XML that uses that namespace. That way I would only have to define the namespace once and could then consume data using that namespace going forward, and you wouldn't have to analyze the structure of the catalog file: you'd just note the namespaces used, generate the XSLT and then run your import. If I can get the procedure down for just a couple of representative namespace use cases (not an entire namespace, just an item within one), I think I will be able to take things from there.

                         

                        As close as possible I want the data structure of the FM database to be RDF which is already the structure of the import file.
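A generator of the kind described here can be sketched, with the caveat raised in the next reply: the namespaces alone don't say what a record is or which elements to read, so the record XPath and the field XPaths still have to be supplied by hand. The Python function below is a hypothetical helper, not an existing tool. It writes an XSLT 1.0 stylesheet that emits one FMPXMLRESULT ROW per record and one TEXT field per mapping entry; repeated elements such as multiple subjects are not handled.

```python
from xml.sax.saxutils import quoteattr

FMP_NS = "http://www.filemaker.com/fmpxmlresult"
XSL_NS = "http://www.w3.org/1999/XSL/Transform"


def generate_xslt(namespaces, record_xpath, field_map):
    """Return XSLT 1.0 text mapping each record to one FMPXMLRESULT ROW.

    namespaces:   {"prefix": "uri"} for every prefix used in the XPaths.
    record_xpath: absolute XPath selecting one node per record.
    field_map:    {"FileMaker field name": "XPath relative to the record"}.
    """
    ns_decls = "".join(
        f"\n    xmlns:{prefix}={quoteattr(uri)}" for prefix, uri in namespaces.items()
    )
    fields = "".join(
        f'\n        <FIELD EMPTYOK="YES" MAXREPEAT="1" NAME={quoteattr(name)} TYPE="TEXT"/>'
        for name in field_map
    )
    cols = "".join(
        f"\n            <COL><DATA><xsl:value-of select={quoteattr(xpath)}/></DATA></COL>"
        for xpath in field_map.values()
    )
    # Attribute value template: the XSLT processor fills in the record count.
    count = quoteattr("{count(%s)}" % record_xpath)
    return f"""<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="{XSL_NS}"
    xmlns="{FMP_NS}"{ns_decls}
    exclude-result-prefixes="{' '.join(namespaces)}">
  <xsl:template match="/">
    <FMPXMLRESULT>
      <ERRORCODE>0</ERRORCODE>
      <PRODUCT BUILD="" NAME="" VERSION=""/>
      <DATABASE DATEFORMAT="" LAYOUT="" NAME="" RECORDS={count} TIMEFORMAT=""/>
      <METADATA>{fields}
      </METADATA>
      <RESULTSET FOUND={count}>
        <xsl:for-each select={quoteattr(record_xpath)}>
          <ROW MODID="0" RECORDID="{{position()}}">{cols}
          </ROW>
        </xsl:for-each>
      </RESULTSET>
    </FMPXMLRESULT>
  </xsl:template>
</xsl:stylesheet>
"""
```

Saved to a .xsl file, the output is what you would select in the XML import dialog. Whether FileMaker's XSLT processor accepts it unchanged is worth testing on a small sample first.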

                        • 9. Re: Import Project Gutenberg catalog
                          mbraendle

                          tmlutas wrote:

                           

                          The question of which fields to incorporate is simple, all of them.

                           

                          (...)

                           

                          As close as possible I want the data structure of the FM database to be RDF which is already the structure of the import file.

                           

                          But the answer is not simple, since you need to know the target data structure of your FM database, which determines the target XML for import. And the target structure of course depends on what you select from the analysis of your input RDF structure.

                           

                          There are two misunderstandings in your previous post:

                           

                          • the namespace tells you nothing about the data structure; it just names your context and is just used to distinguish element x used in context 1 from element with the same name x in context 2. What you require is the RDF Schema of the Gutenberg catalog (and this involves several sub RDF Schemas) on the one hand and the XML Schema of FileMaker (which is the FMPXMLRESULT grammar, as Beverly says, and which is described e.g. in the XML/XSLT CWP guide of FileMaker Server) on the other hand. If you have both, you can choose which elements to map and write an XSL transformation. There are tools such as Altova Mapforce which can do the analysis for you and with which you can map elements and attributes graphically. Mapforce will then write an appropriate XSLT file for you.
                          • RDF ≠ relational data scheme or relational database (FileMaker), or: RDF isn't about fields and tables, it's about subject-predicate-object relationships, with which you can model much more complex objects than you can do with a relational scheme.
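To make the subject-predicate-object point concrete, here is a deliberately simplified Python sketch that reads RDF/XML into triples. It covers only a small part of the RDF/XML syntax (see the docstring), so for real data a proper RDF parser, for example rdflib in Python, is the safer choice. The function names and blank-node labels are hypothetical.

```python
import itertools
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def expand(tag):
    """Turn ElementTree's '{uri}local' into the full URI 'urilocal'."""
    uri, _, local = tag[1:].partition("}")
    return uri + local


def rdfxml_to_triples(rdf_text, base="#"):
    """Return (subject, predicate, object) tuples for a simple RDF/XML document.

    Deliberately incomplete: handles rdf:about / rdf:ID / rdf:resource,
    nested node elements and literal text, but not rdf:parseType="Resource",
    rdf:nodeID, property attributes, containers, xml:lang or datatypes.
    """
    blank_ids = itertools.count(1)
    triples = []

    def node_subject(node):
        if f"{{{RDF}}}about" in node.attrib:
            return node.attrib[f"{{{RDF}}}about"]
        if f"{{{RDF}}}ID" in node.attrib:
            return base + node.attrib[f"{{{RDF}}}ID"]
        return f"_:b{next(blank_ids)}"  # anonymous (blank) node

    def walk_node(node):
        subject = node_subject(node)
        if node.tag != f"{{{RDF}}}Description":
            # A typed node element, e.g. <pgterms:etext>, implies rdf:type.
            triples.append((subject, RDF + "type", expand(node.tag)))
        for prop in node:
            predicate = expand(prop.tag)
            resource = prop.get(f"{{{RDF}}}resource")
            children = list(prop)
            if resource is not None:
                obj = resource
            elif children and prop.get(f"{{{RDF}}}parseType") != "Literal":
                obj = walk_node(children[0])  # nested node element
            else:
                obj = "".join(prop.itertext()).strip()
            triples.append((subject, predicate, obj))
        return subject

    for node in ET.fromstring(rdf_text):
        walk_node(node)
    return triples
```

A nested subject heading such as `<dc:subject><dcterms:LCSH><rdf:value>Fantasy</rdf:value></dcterms:LCSH></dc:subject>` comes out as three triples joined through a blank node, which is the kind of structure that does not map onto a single flat record.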

                           

                          In your current case of the Gutenberg catalog, that will work with a relational database. In other cases, a relational database is the completely wrong approach to describe an RDF structure and using a triple store is much more appropriate.

                           

                          From a quick glance at the catalog file, you will see that you need two separate imports to fulfill your requirement of importing "all fields".
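One way to read "two separate imports": the book-level fields go into a parent table, and the repeating subject headings go into a child table keyed by the book's ID. A Python sketch of that split, assuming each book is a `pgterms:etext` element with an `rdf:ID` and that subject headings sit somewhere below `dc:subject` in `rdf:value` elements. Those element names and the `pgterms` URI are assumptions to verify against the file; the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
    "pgterms": "http://www.gutenberg.org/rdfterms/",  # ASSUMED URI; check your file
}
RDF_ID = "{%s}ID" % NS["rdf"]


def text_of(node):
    """All text inside node, trimmed; "" if the node is missing."""
    return "".join(node.itertext()).strip() if node is not None else ""


def split_catalog(rdf_text):
    """Return (book_rows, subject_rows).

    book_rows:    one [book_id, title] row per book (parent table).
    subject_rows: one [book_id, subject] row per heading (child table).
    """
    root = ET.fromstring(rdf_text)
    book_rows, subject_rows = [], []
    for book in root.findall("pgterms:etext", NS):
        book_id = book.get(RDF_ID, "")
        book_rows.append([book_id, text_of(book.find("dc:title", NS))])
        # "//" matches rdf:value at any depth below dc:subject (Bag/li or not).
        for value in book.findall("dc:subject//rdf:value", NS):
            subject_rows.append([book_id, text_of(value)])
    return book_rows, subject_rows
```

Each list then becomes its own FMPXMLRESULT import, and a relationship on the book ID links the Subjects table back to Books.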

                          • 10. Re: Import Project Gutenberg catalog
                            tmlutas

                            Since this is a test of a simple case before I move on to something more complicated like DBpedia, what I'm hearing is: don't use FileMaker, use a triple store instead. But then I do a bit of googling and find that some triple stores are built on top of relational databases. I'm not entirely sure what that means, but it doesn't seem so cut and dried after all.

                             

                            If anybody has suggestions on how to handle this via FileMaker, I'm all ears, but for the moment I'm downloading 4store.

                            • 11. Re: Import Project Gutenberg catalog
                              tmlutas

                              Update: Apparently somebody's done RDF with MySQL. I can see how that might work and am swinging back to trying this in FileMaker. Does anybody have an opinion on the triples-on-a-relational-database approach outlined in the link?

                              • 12. Re: Import Project Gutenberg catalog
                                beverly

                                This is similar to EAV (entity-attribute-value): http://en.m.wikipedia.org/wiki/Entity–attribute–value_model

                                 

                                Some WordPress modules use this with PHP and MySQL. The last time I looked at EAV, it was a nightmare to deal with outside the implementation it was in. Your Mileage May Vary.

                                 

                                But if you research more and find a PHP/MySQL option for dealing with RDF, you certainly have something you may be able to use with FileMaker.

                                 


                                Beverly Voth


                                • 13. Re: Import Project Gutenberg catalog
                                  mbraendle

                                  It's possible to create a similar structure in FM. Either by using the approach you gave in your link, or using 3 or 4 tables, a subject table, an object table (which might be identical with the subject table), and a join table that is being used for the predicates and links subjects and objects. The fourth table may contain definitions of predicate types.
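The 3-4 table layout described here maps onto three SQL tables when the subject and object tables are merged. A sketch using Python's built-in sqlite3, purely as a stand-in for FileMaker tables and relationships; the table, column and function names are hypothetical, and the sample triples in the test follow the PhD-student example.

```python
import sqlite3

SCHEMA = """
CREATE TABLE node      (id INTEGER PRIMARY KEY, label TEXT UNIQUE);  -- subjects and objects
CREATE TABLE predicate (id INTEGER PRIMARY KEY, name TEXT UNIQUE);   -- predicate types
CREATE TABLE link (                                                   -- the join table
    subject_id   INTEGER REFERENCES node(id),
    predicate_id INTEGER REFERENCES predicate(id),
    object_id    INTEGER REFERENCES node(id)
);
"""


def connect():
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)
    return db


def get_id(db, table, column, value):
    """Return the id of value, inserting it first if needed.

    table and column come from this code, never from user input.
    """
    db.execute(f"INSERT OR IGNORE INTO {table} ({column}) VALUES (?)", (value,))
    return db.execute(f"SELECT id FROM {table} WHERE {column} = ?", (value,)).fetchone()[0]


def add_triple(db, subject, predicate, obj):
    db.execute(
        "INSERT INTO link VALUES (?, ?, ?)",
        (get_id(db, "node", "label", subject),
         get_id(db, "predicate", "name", predicate),
         get_id(db, "node", "label", obj)),
    )


def objects_of(db, subject, predicate):
    """Labels of all objects linked to subject via predicate, sorted."""
    rows = db.execute(
        """SELECT o.label FROM link
           JOIN node s ON s.id = link.subject_id
           JOIN predicate p ON p.id = link.predicate_id
           JOIN node o ON o.id = link.object_id
           WHERE s.label = ? AND p.name = ?
           ORDER BY o.label""",
        (subject, predicate),
    )
    return [label for (label,) in rows]
```

In FileMaker terms, `node` and `predicate` would be ordinary tables and `link` the join table, with two relationships back to `node` (one for the subject side, one for the object side).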

                                   

                                  I've used the latter approach to describe relations between different authors (e.g. author A is PhD student of author B).

                                   

                                  The problem, however, is how to translate an RDF-like query into a query that FM can process, and then to assemble all the resulting triples into a meaningful result.

                                   

                                  An example for such a query: "find all objects that have value A and have at the same time a relation X and a relation Y to subjects of any value, and tell me then which subject pairs are interrelated via -X-A-Y-".

                                   

                                  With pre-FM12 means only, you could solve such a query only by creating a specific ERD structure and a specific layout for this task. For another problem, you would again have to create another specific layout.

                                   

                                  ExecuteSQL() might now be of more help.
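As an illustration of why a SQL engine helps here, the -X-A-Y- question becomes a self-join on a single triples table (the layout from the MySQL link). The sketch uses Python's sqlite3, not FileMaker; whether the same SELECT runs unchanged under ExecuteSQL() in FM12 is an assumption worth testing. The function names, predicates and values are made up for illustration.

```python
import sqlite3


def make_store(triples):
    """Load (subject, predicate, object) tuples into a single-table triple store."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE triple (subject TEXT, predicate TEXT, object TEXT)")
    db.executemany("INSERT INTO triple VALUES (?, ?, ?)", triples)
    return db


def pairs_via(db, x, a, y):
    """Subject pairs (s1, s2) with s1 -x-> A and s2 -y-> A, where A is the value a."""
    return db.execute(
        """SELECT DISTINCT t1.subject, t2.subject
           FROM triple AS t1
           JOIN triple AS t2 ON t2.object = t1.object
           WHERE t1.predicate = ? AND t1.object = ? AND t2.predicate = ?
             AND t1.subject <> t2.subject  -- skip pairing a subject with itself
           ORDER BY 1, 2""",
        (x, a, y),
    ).fetchall()
```

For example, with the triples ("Carroll", "wrote", "Alice") and ("Tenniel", "illustrated", "Alice") loaded, `pairs_via(db, "wrote", "Alice", "illustrated")` returns `[("Carroll", "Tenniel")]`. A different question only changes the query parameters, not the table structure or a layout.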

                                  • 14. Re: Import Project Gutenberg catalog
                                    tmlutas

                                    Not quite the same but competing to take a piece of the relational database pie. Does EAV have an international standard and a standards body maintaining it? From what I can tell, no. Lack of interoperation and transportability seems to follow from implementations of concepts that either do not have a standard or do not follow the standard. Since RDF does have an international standard, I think that we're going to end up with something a little more portable.

                                     

                                    Semantic web implementations are going to increase in importance because people are making their data available in these formats and Filemaker needs some sort of reasonably intelligent and easy bridge if it's going to have meaningful access to that data. Data sets like Wikipedia can be accessed via the related project DBpedia and to a great extent what that project is producing is taking articles and turning them into semantic triples using RDF. Gutenberg is doing the same. The search engines are all either doing it now or scheduled to start doing it.
