21 Replies Latest reply on Dec 2, 2015 12:23 PM by siplus

    Count "#" of each word

    Kultgerd

      Hi,

       

      I need to count the syllables of each individual word in a text. Specifically, I need the number of words with 3 or more syllables (the text is German).

       

      I have built a calculation that replaces the vowels in the text with "#", handling runs of up to 4 vowels in a row. It works almost 100% of the time, which is more than I need.

       

      Now I need to find out:

      - how many words with 3 or more "#" exist in the text?

      - how many words with more than 6 characters are in the text?

      - how many single syllable words are in the text?
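The three counts above can be sketched outside FileMaker to make the logic concrete. This is only a minimal Python sketch, assuming the words are already split into a list; the function names and the punctuation set that gets stripped are illustrative assumptions, not part of the original solution.

```python
# Minimal sketch (assumptions: words are already split into a list,
# each "#" stands for one syllable, punctuation does not count as a character).

PUNCTUATION = ".,;:!?\"'()-"  # hypothetical set of characters to ignore


def count_words_with_min_marks(words, mark="#", minimum=3):
    """Count words that contain at least `minimum` occurrences of `mark`."""
    return sum(1 for word in words if word.count(mark) >= minimum)


def count_long_words(words, min_length=7):
    """Count words with more than 6 characters (punctuation stripped)."""
    return sum(1 for word in words if len(word.strip(PUNCTUATION)) >= min_length)


def count_single_mark_words(words, mark="#"):
    """Count words that contain exactly one `mark` (single-syllable words)."""
    return sum(1 for word in words if word.count(mark) == 1)


marked = ["M#rk#nt", "w#rd", "l#cht#nd#n", "#nn#v#t#v#m", "d#n"]
clean = ["Markant", "wird", "leuchtenden", "innovativem", "den"]

print(count_words_with_min_marks(marked))
print(count_long_words(clean))
print(count_single_mark_words(marked))
```

The character count runs on the clean words, while the two syllable counts run on the "#" version.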

       

      Does anyone have an idea?

       

      Cheers

      Gerd

        • 2. Re: Count "#" of each word
          siplus

          I suppose you have one text field (myTF) in one record of a table, and you created a second, calculated field defined as Substitute ( myTF ; ["a" ; "#"] ; ["ä" ; "#"] ; ["e" ; "#"] ; ... ; ["ü" ; "#"] ). Is that the case?
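For readers who want to see the one-for-one substitution idea outside FileMaker, here is a rough Python sketch. The vowel set and the function name are assumptions for illustration; it replaces every vowel individually and does not handle vowel clusters or other special cases.

```python
# Minimal sketch of a one-for-one vowel substitution.
# Assumption: the vowel set below is illustrative; diphthongs and
# other special cases are NOT merged into a single "#".

VOWELS = "aeiouäöüAEIOUÄÖÜ"

# Translation table that maps every vowel to "#".
_VOWEL_TABLE = str.maketrans({vowel: "#" for vowel in VOWELS})


def mark_vowels(text):
    """Return `text` with each vowel replaced by "#"."""
    return text.translate(_VOWEL_TABLE)


print(mark_vowels("Markant wird"))
print(mark_vowels("Gesäßtaschen"))
```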

          • 3. Re: Count "#" of each word
            Kultgerd

            Hi Siplus,

            You are right. One field contains the text, another one the substitutions.

            Example:

             

            Original:

            Markant wird die Stretchhose durch den leuchtenden Seitenstreifen in Smokingmanier. Aus innovativem Techno-Stretch, edel in der Optik, angenehm atmungsaktiv und durch den komfortablen Stretch bequem und unkompliziert. Schlanke Trendhose mit Formbund, Madeleine-Knopf, Stecktaschen und aufgesetzten Gesäßtaschen. Eine Smokinghose mit Fashionkompetenz  - eine Hose mit Blickfang-Garantie!

             

            Substituted:

             

            M#rk#nt w#rd d#Str#tchh#sd#rch d#n l#cht#nd#n S#t#nstr#f#n #n Sm#k#ngm#n#r. #s #nn#v#t#v#m T#chn#Str#tch, #d#l #n d#r #pt#k, #ng#n#hm #tm#ngs#kt#v #nd d#rch d#n k#mf#rt#bl#n Str#tch b#q##m #nd #nk#mpl#z#rt. Schl#nkTr#ndh#sm#t F#rmb#nd, M#d#l#nKn#pf, St#ckt#sch#n #nd #fg#s#tzt#n G#s#ßt#sch#n. #nSm#k#ngh#sm#t F#sh##nk#mp#t#nz   #nH#sm#t Bl#ckf#ngG#r#nt#!

             

            Listed:

            M#rk#nt

            w#rd

            d#Str#tchh#sd#rch

            d#n

            l#cht#nd#n

            S#t#nstr#f#n

            #n

            Sm#k#ngm#n#r.

            #s

            #nn#v#t#v#m

            T#chn#Str#tch,

            #d#l

            #n

            d#r

            #pt#k,

            #ng#n#hm

            #tm#ngs#kt#v

            #nd

            d#rch

            d#n

            k#mf#rt#bl#n

            Str#tch

            b#q##m

            #nd

            #nk#mpl#z#rt.

            Schl#nkTr#ndh#sm#t

            F#rmb#nd,

            M#d#l#nKn#pf,

            St#ckt#sch#n

            #nd

            #fg#s#tzt#n

            G#s#ßt#sch#n.

            #nSm#k#ngh#sm#t

            F#sh##nk#mp#t#nz

            #nH#sm#t

            Bl#ckf#ngG#r#nt#!

             

            Because I also have to handle various special cases, there is not exactly one "#" per vowel, but that doesn't matter here.

            @Mike:

            In general, yes, but I need the number of words with 3 or more "#". Once I know how to do that, I think I can also find a solution for the other questions.

            If it cannot be done as a calculation, I can solve it with a script. That is not my preferred solution, but it is the last resort if nothing else is possible.

             

            Sorry for my English, I'm not a native speaker. German guy...

            cheers Gerd

            • 4. Re: Count "#" of each word
              Mike_Mitchell

              You're going to need recursive functionality whether you do it with a script or a Custom Function. Reason: You have to loop over your list and increment a counter based on the number of "#" symbols on each line. So if you really need it to be a calculation (I'd actually recommend against that, for performance reasons), you could build a Custom Function that could do it and embed that into a calculation somewhere (like a field?).

               

              For example, it might look something like this (totally untested, off the top of my head, so needs to be verified):

               

              cfWordCountWithThree ( theList ; symbol ; count )

               

              Case (

               

              ValueCount ( theList ) < 1 ; count ;

               

              PatternCount ( LeftValues ( theList ; 1 ) ; symbol ) > 2 ;

              cfWordCountWithThree ( RightValues ( theList ; ValueCount ( theList ) - 1 ) ; count + 1 ) ;

               

              cfWordCountWithThree ( RightValues ( theList ; ValueCount ( theList ) - 1 ) ; count )

               

              )

               

              This can be cleaned up some for performance (using Let), but it should get you started. When you call the function, it should look something like this:

               

              cfWordCountWithThree ( {field with list} ; "#" ; 0 )

               

              HTH

               

              Mike

               

              P. S. Don't worry about your English. It's way better than my German.   

              • 5. Re: Count "#" of each word
                siplus

                Can we assume that you won't split words shorter than 5 characters?

                 

                Like

                 

                "Im Fall das ein Stau ist, oder es um ein Stau geht, dann hört man auf mit dem Fahr" ==> no splits?

                 

                and also:

                 

                when the last letter of a word is a vowel, don't substitute it with a "#"?

                 

                 

                (building an example, that's why I'm asking)

                • 6. Re: Count "#" of each word
                  Kultgerd

                  It's not a question of splitting words. I need this to develop an SEO text-checking tool along the lines of the Flesch Reading Ease. I also want to use an additional measure called the "Wiener Sachtextformel". On its own, the Flesch-based formula is not as good as a combination with other calculation methods, so I'd like to combine 2 or 3 values.

                   

                  The results are values between 1 and 100 that indicate how easy the text is to read. I will integrate this into our text creation and translation tool as a feature for the SEO part.

                   

                  Here are the formulas I am using:

                  \mathrm{FRE} = 206.835 - (1.015 \cdot \mathrm{ASL}) - (84.6 \cdot \mathrm{ASW})

                  \mathrm{FRE}_{\mathrm{deutsch}} = 180 - \mathrm{ASL} - (58.5 \cdot \mathrm{ASW})

                  \mathrm{WSTF}_1 = 0.1935 \cdot \mathrm{MS} + 0.1672 \cdot \mathrm{SL} + 0.1297 \cdot \mathrm{IW} - 0.0327 \cdot \mathrm{ES} - 0.875

                  The variables are based on the average number of words per sentence, syllables per word, and so on (see Wikipedia).

                  I have developed a solution to count (German) syllables, which was difficult enough but works well. Now I need to calculate the values described above. Some, like the syllable counts, are based on the "#" version, others on the clean text.
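As a rough illustration of how these formulas turn the counts into scores, here is a small Python sketch. The variable meanings are my reading of the usual definitions and should be checked against the source: ASL and SL are the average sentence length in words, ASW is the average number of syllables per word, MS is the percentage of words with three or more syllables, IW is the percentage of words with more than six letters, and ES is the percentage of one-syllable words. The function names are made up for this example.

```python
# Sketch of the three readability formulas quoted in this thread.
# Assumptions: MS, IW and ES are percentages (0-100); ASL/SL are words
# per sentence; ASW is syllables per word. Function names are hypothetical.


def flesch_reading_ease(asl, asw):
    """Original (English) Flesch Reading Ease."""
    return 206.835 - 1.015 * asl - 84.6 * asw


def flesch_reading_ease_german(asl, asw):
    """German adaptation of the Flesch Reading Ease."""
    return 180 - asl - 58.5 * asw


def wiener_sachtextformel_1(ms, sl, iw, es):
    """First Wiener Sachtextformel (WSTF_1)."""
    return 0.1935 * ms + 0.1672 * sl + 0.1297 * iw - 0.0327 * es - 0.875


# Example call with made-up input values.
print(flesch_reading_ease_german(asl=10, asw=2))
print(wiener_sachtextformel_1(ms=20, sl=10, iw=30, es=50))
```

The percentages would come from the word counts asked about at the start of the thread, divided by the total number of words and multiplied by 100.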

                   

                  @Mike

                  I'll try it, thanks! I should mention that we still work with FM11, for various reasons I have no influence over.

                  cheers

                  Gerd

                  • 7. Re: Count "#" of each word
                    Mike_Mitchell

                    The Custom Function should work fine in version 11.

                    • 8. Re: Count "#" of each word
                      siplus

                      Well, try out the attached file. I think you can freely adjust it to your needs.

                      • 9. Re: Count "#" of each word
                        Kultgerd

                        Hi Mike,

                        if I try to use your CF, I get an error saying there are not enough parameters in the function. It is flagged in

                        cfWordCountWithThree ( RightValues ( Zeichen ; ValueCount ( Zeichen ) - 1 ) ; Zähler + 1 ) <==

                        at the last ) in this line.

                         

                        Since it doesn't evaluate, FileMaker doesn't show it to me in German. Years ago I worked with the English version and was familiar with it, but after 8 years of German code the function is a bit hard for me to follow.

                         

                        I replaced "symbol" with "Zeichen" because "symbol" is not allowed.

                         

                        Gerd

                        • 10. Re: Count "#" of each word
                          Mike_Mitchell

                          Should be:

                           

                          cfWordCountWithThree ( RightValues ( Zeichen ; ValueCount ( Zeichen ) - 1 ) ; "#" ; Zähler + 1 )


                          • 11. Re: Count "#" of each word
                            Kultgerd

                            Thanks, I'll try it at home. I have FM12 there and will see whether I can carry it over to FM11.

                             

                            regards
                            Gerd

                            • 12. Re: Count "#" of each word
                              Kultgerd

                              Hi Mike,

                               

                              at the moment I get count + 1 as the result. If I pass 4 as the count, I get 5, and so on...

                               

                              regards

                              gerd

                              • 13. Re: Count "#" of each word
                                Mike_Mitchell

                                Well, I did make some mistakes in the original function. "symbol" and "count" are reserved words, so it wouldn't accept them as parameters. I changed it to:

                                 

                                cfWordCountWithThree ( theList ; delim ; counter )

                                 

                                Case (

                                 

                                ValueCount ( theList ) < 1 ; counter ;

                                 

                                PatternCount ( LeftValues ( theList ; 1 ) ; delim ) > 2 ;

                                cfWordCountWithThree ( RightValues ( theList ; ValueCount ( theList ) - 1 ) ; delim ; counter + 1 ) ;

                                 

                                cfWordCountWithThree ( RightValues ( theList ; ValueCount ( theList ) - 1 ) ; delim ; counter )

                                 

                                )

                                 

                                However, I get an accurate count with that function (20 occurrences in the list you posted). This is what I put in my Data Viewer:

                                 

                                let ( [

                                theList =

                                "M#rk#nt¶w#rd¶d#Str#tchh#sd#rch¶d#n¶l#cht#nd#n¶S#t#nstr#f#n¶#n¶Sm#k#ngm#n#r.¶#s¶#nn#v#t#v#m¶T#chn#Str#tch,¶#d#l¶#n¶d#r¶#pt#k,¶#ng#n#hm¶#tm#ngs#kt#v¶#nd¶d#rch¶d#n¶k#mf#rt#bl#n¶Str#tch¶b#q##m¶#nd¶#nk#mpl#z#rt.¶Schl#nkTr#ndh#sm#t¶F#rmb#nd,¶M#d#l#nKn#pf,¶St#ckt#sch#n¶#nd¶#fg#s#tzt#n¶G#s#ßt#sch#n.¶#nSm#k#ngh#sm#t¶F#sh##nk#mp#t#nz¶#nH#sm#t¶Bl#ckf#ngG#r#nt#!" ;

                                delim = "#" ;

                                counter = 0

                                ] ;

                                 

                                cfWordCountWithThree ( theList ; delim ; counter )

                                 

                                )
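To show what the recursion does step by step, here is a Python sketch that mirrors the corrected custom function: it looks at the first value, adds one to the counter if that value contains more than two marks, and recurses on the rest. The Python names are illustrative, and splitting on "¶" stands in for FileMaker's value list handling, which is an assumption of this sketch.

```python
# Python mirror of the corrected cfWordCountWithThree (illustrative only).
# Assumption: the list is passed as a Python list of values; splitting the
# posted text on "¶" stands in for FileMaker's return-delimited values.

POSTED_LIST = (
    "M#rk#nt¶w#rd¶d#Str#tchh#sd#rch¶d#n¶l#cht#nd#n¶S#t#nstr#f#n¶#n¶Sm#k#ngm#n#r.¶#s¶"
    "#nn#v#t#v#m¶T#chn#Str#tch,¶#d#l¶#n¶d#r¶#pt#k,¶#ng#n#hm¶#tm#ngs#kt#v¶#nd¶d#rch¶d#n¶"
    "k#mf#rt#bl#n¶Str#tch¶b#q##m¶#nd¶#nk#mpl#z#rt.¶Schl#nkTr#ndh#sm#t¶F#rmb#nd,¶"
    "M#d#l#nKn#pf,¶St#ckt#sch#n¶#nd¶#fg#s#tzt#n¶G#s#ßt#sch#n.¶#nSm#k#ngh#sm#t¶"
    "F#sh##nk#mp#t#nz¶#nH#sm#t¶Bl#ckf#ngG#r#nt#!"
)


def count_words_with_three(values, delim="#", counter=0):
    """Recursively count values that contain more than two `delim` marks."""
    if not values:  # empty list: the counter is the final result
        return counter
    head, rest = values[0], values[1:]
    if head.count(delim) > 2:  # three or more marks in the first value
        return count_words_with_three(rest, delim, counter + 1)
    return count_words_with_three(rest, delim, counter)


print(count_words_with_three(POSTED_LIST.split("¶"), "#", 0))  # prints 20
```

The counter argument is only a starting value, so it would normally be 0. In this sketch, a list that arrives as one single value with three or more marks returns exactly the starting counter plus 1.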

                                 

                                What is different here than what you're doing?

                                • 14. Re: Count "#" of each word
                                  Kultgerd

                                  Hi Mike,

                                  I'm doing it this way:

                                  cfWordCountWithThree ( theList ; delim ; counter )

                                  =>

                                  cfWordCountWithThree ( ES_Vorberechnung ; "#" ; 3 )

                                  "ES_Vorberechnung" = theList

                                  The result is 4 (it is always the counter + 1).

                                  This is the text which is in the field "ES_Vorberechnung":

                                  M#rk#nt

                                  w#rd

                                  d#Str#tchh#sd#rch

                                  d#n

                                  l#cht#nd#n

                                  S#t#nstr#f#n

                                  #n

                                  Sm#k#ngm#n#r.

                                  #s

                                  #nn#v#t#v#m

                                  T#chn#Str#tch,

                                  #d#l

                                  #n

                                  d#r

                                  #pt#k,

                                  #ng#n#hm

                                  #tm#ngs#kt#v

                                  #nd

                                  d#rch

                                  d#n

                                  k#mf#rt#bl#n

                                  Str#tch

                                  b#q##m

                                  #nd

                                  #nk#mpl#z#rt.

                                  Schl#nkTr#ndh#sm#t

                                  F#rmb#nd,

                                  M#d#l#nKn#pf,

                                  St#ckt#sch#n

                                  #nd

                                  #fg#s#tzt#n

                                  G#s#ßt#sch#n.

                                  #nSm#knghsm#t

                                  F#sh##nk#mp#t#nz

                                  #nH#sm#t

                                  Bl#ckf#ngG#r#nt#
