16 Replies Latest reply on Oct 31, 2012 10:14 AM by jbrown

    User-Generated Content & A Curse-word list

    jbrown

      Hey all,

      Has anyone built part of their system to let users enter comments or info while keeping it clean? A friend and I were talking about this, and he said his business generated a curse-word list and checked user-generated content against it.

      How would this work in FM?

       

      I guess the field in which people are typing could have a script tied to it that checks, via the PatternCount() function, whether any words in the CurseWord table are present. It could then substitute random characters for the found word, or it could chide the user for using such language.

       

      I'm interested in trying it out. I'd love to hear what others have done with this topic.

        • 1. Re: User-Generated Content & A Curse-word list
          mikebeargie

          http://www.briandunning.com/cf/101

           

          You can grab a curse list of words from here:

          http://www.bannedwordlist.com/

           

          Combine the two and mod the function a bit to store your list, and you can have a custom function that will replace any word from the banned word list with something of your choosing.

          • 2. Re: User-Generated Content & A Curse-word list
            comment

            The problem with PatternCount() is that it looks for a string, not necessarily a word. Thus you may find yourself unwittingly censoring words like "rebuttal". Using Substitute() is even more problematic, because it's case-sensitive.
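
To make the false-positive problem concrete, here is a minimal sketch in Python (not FileMaker calculation syntax; it only illustrates the logic). The banned word "butt" and both helper names are assumptions chosen for illustration. The first helper behaves like a plain case-insensitive string search, while the second only matches whole words.

```python
import re

def contains_substring(text, banned):
    """Case-insensitive substring test, similar in spirit to a plain string search."""
    return banned.lower() in text.lower()

def contains_word(text, banned):
    """Match only when `banned` appears as a whole word (hypothetical helper)."""
    pattern = r"\b" + re.escape(banned) + r"\b"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None

sample = "Her rebuttal was excellent."
print(contains_substring(sample, "butt"))  # True: a false positive
print(contains_word(sample, "butt"))       # False: no whole-word match
```

The whole-word test still ignores case, which sidesteps the case-sensitivity issue raised above for a naive replace.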

             

            Perhaps something like this could suit you:

            http://www.briandunning.com/cf/1390

            • 3. Re: User-Generated Content & A Curse-word list
              jbrown

              Thanks Mike and Michael.

              I hadn't known about those restrictions on PatternCount() or Substitute(). I am beginning to scratch the idea of using them. I need to read my Developer's Reference for FM 12 more carefully.

              I'll look into it. I hope the teachers in my case are professional enough to use only appropriate words in their remarks. The thing I'm going for is not public for students in any way. In my solution I have an inspirational message that pops up every time they log in, randomly selected from the MessagesTable. I'd like people to be able to add sayings for their teacher-peers. Working the long hours we do, things can get inappropriate there. But since this is a system for 100 people and not just one school, things should be kept above board.

              • 4. Re: User-Generated Content & A Curse-word list
                comment

                Jeremy Brown wrote:


                I'd like people to be able to add sayings to their teacher-peers.

                 

                How about adding them tentatively, until a human reviewer approves them? I am sure I could say some very inappropriate things, using only "approved" words.

                • 5. Re: User-Generated Content & A Curse-word list
                  jbrown

                  That's true. The request will just come to me and I'll mark it to be allowed in the random-generation possibilities.
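
The review-before-display idea can be sketched as follows, in Python rather than FileMaker, purely to show the logic. The `Message` class, its `approved` field, and `pick_login_message` are hypothetical names, not part of Jeremy's solution: new sayings start unapproved, and only approved ones are eligible for the random login message.

```python
import random
from dataclasses import dataclass

@dataclass
class Message:
    text: str
    approved: bool = False  # new submissions start unapproved

def pick_login_message(messages, rng=random):
    """Return the text of a random approved message, or None if there is none."""
    approved = [m for m in messages if m.approved]
    if not approved:
        return None
    return rng.choice(approved).text

messages = [
    Message("Keep going, you are doing great.", approved=True),
    Message("Submitted by a peer, not yet reviewed."),
]
print(pick_login_message(messages))
```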

                  I was just interested in the idea of bleeping out certain words in what users type in. I'll play with that, on my own time.

                  • 6. Re: User-Generated Content & A Curse-word list
                    datastride

                    Jeremy,

                     

                    You can easily make PatternCount(), Position(), or Substitute() look for "words" with a little clever scripting. First, write a short loop that examines your string character by character, substituting a space for anything other than a letter and making all letters lowercase. (Hint: see the Middle(), Char(), Code(), and Lower() functions.) Then you can write something like this:

                     

                    PatternCount( " " & $cleanedUpStringToSearch & " " , " " & $searchWord & " " )

                     

                    Do it all the time ...Works like a charm!
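
The clean-then-pad technique described above can be sketched in Python (again, not FileMaker syntax). The names `clean` and `has_word` are hypothetical, and the sketch assumes that only ASCII letters count as letters; anything else becomes a space.

```python
def clean(text):
    """Lowercase ASCII letters; replace every other character with a space."""
    out = []
    for ch in text:
        if "a" <= ch <= "z":
            out.append(ch)
        elif "A" <= ch <= "Z":
            out.append(ch.lower())
        else:
            out.append(" ")
    return "".join(out)

def has_word(text, search_word):
    """Whole-word test: pad both strings with spaces before searching."""
    padded = " " + clean(text) + " "
    return (" " + search_word.lower() + " ") in padded

print(has_word("Well, DARN it!", "darn"))          # True
print(has_word("She was darning socks.", "darn"))  # False
```

Because the search term is padded with spaces on both sides, "darning" does not trigger a hit for "darn".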

                    • 7. Re: User-Generated Content & A Curse-word list
                      jbrown

                      Thanks Morgan. I'll try that out as well. It's one of those it-would-be-cool-if-I-could-incorporate-this features, but it doesn't have to be in there.

                      thanks

                      • 8. Re: User-Generated Content & A Curse-word list
                        comment

                        Morgan Jones wrote:

                         

                        substituting a space for anything other than a letter and making all letters lowercase.

                         

                        I presume you mean "substituting all punctuation characters with a space". With Unicode, that could be a very long list of characters.

                         

                        Next comes the issue of inflections, since you don't want to find yourself striking out !@#$ but permitting !@#$ing, !@#$ed, !@#$less, !@#$able, etc. (and a very long etc. that would be, too).

                         

                        Finally, there's the question of what will you do with the result, with all punctuation, upper-case characters, prefixes and suffixes removed - and no way to put them back.

                         

                         

                        Censorship is never easy - even when performed by humans...

                        • 9. Re: User-Generated Content & A Curse-word list
                          datastride

                          Michael,

                           

                           

                           

                          No, I didn’t mean “substituting all punctuation”, but I could certainly have provided more detail …

                           

                           

                           

                          I wouldn’t use Substitute() to rid the original string of punctuation. I would use a very short loop that examines the original string character by character, checking to see if a character was a letter (within certain ranges of values as determined by the Code() function), and then either changing the case to lower or substituting a space in the “cleaned-up string”. This “cleaned-up” string would then be used only for locating potentially offensive words.

                           

                           

                           

                          I was suggesting only a means for locating undesired words in a string. Once located in the “cleaned-up string” (and the position therein determined), one could then use the position of a word in the cleaned-up string to do whatever one wished (remove, mask, translate) in the original string, or one could simply remove the entire entry (or flag it for later human review before allowing) if it contained any objectionable words. No need to put anything back together. 
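
One way to read the "locate in the cleaned-up string, act on the original" idea: if cleaning replaces each character with exactly one character, positions in the cleaned string line up with positions in the original. The Python sketch below illustrates that under stated assumptions: `clean` and `mask_word` are hypothetical names, only ASCII letters are treated as letters, and the banned term contains only letters (and single spaces for phrases).

```python
def clean(text):
    """One-for-one cleaning: lowercase ASCII letters, everything else becomes a space."""
    return "".join(
        ch.lower() if ("a" <= ch <= "z" or "A" <= ch <= "Z") else " "
        for ch in text
    )

def mask_word(original, banned, mask_char="*"):
    """Mask whole-word occurrences of `banned` in `original`, leaving the rest intact."""
    padded = " " + clean(original) + " "
    needle = " " + banned.lower() + " "
    chars = list(original)
    start = padded.find(needle)
    while start != -1:
        # `start` is the needle's leading space in `padded`. Since `padded` has
        # one extra leading space, the word itself begins at `start` in `original`.
        for i in range(start, start + len(banned)):
            chars[i] = mask_char
        # Resume at the trailing space so an adjacent repeat is also found.
        start = padded.find(needle, start + len(banned) + 1)
    return "".join(chars)

print(mask_word("Well, DARN it!", "darn"))  # Well, **** it!
```

Punctuation and capitalization elsewhere in the original survive, because only the matched positions are overwritten.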

                           

                           

                           

                          And one could easily flag for later review any string containing any “word” (string delimited by certain punctuation characters:  e.g. spaces, periods, commas, colons, quotes, hyphens, or the like) that was composed of other punctuation characters (e.g. number sign, percent sign, dollar sign, etc.) and letters.

                           

                           

                           

                          And while no such solution would be foolproof, a very useful solution could be developed and then tweaked over time to continually improve accuracy.

                           

                           

                           

                          But really, my main objective was to point out to Jeremy that the functions in question (PatternCount, Substitute, and Position) could, indeed, be used to look for “words”. I wasn’t trying to solve Jeremy’s entire problem … Just wanted to introduce some techniques that might ignite his curiosity and lead to further brainstorming, rather than having him write-off these functions as too limited to be useful.

                           

                           

                           

                          Peace, love & brown rice,

                          Morgan Jones

                          FileMaker + Web: Design, Develop & Deploy
                          Certifications: FileMaker 9, 10, 11 & 12
                          Member: FileMaker Business Alliance
                          One Part Harmony <http://www.onepartharmony.com/>
                          Austin, Texas • USA
                          512-422-0611

                          • 10. Re: User-Generated Content & A Curse-word list
                            s

                            Hi Jeremy,

                            Would it be enough to simply highlight the inappropriate words?

                            You could use Michael Horak's custom function http://www.briandunning.com/cf/579 (perhaps modified as described here: https://fmdev.filemaker.com/message/67517#67517) to color or highlight the swear words, then display that to the user or to yourself.

                            Not sure what the performance hit would be given that the list of swear words could be quite long.

                            --

                            Steve Moore

                            Cumberland, Maine

                            • 11. Re: User-Generated Content & A Curse-word list
                              comment

                              Morgan Jones wrote:

                               

                              Once located in the “cleaned-up string” (and the position therein determined), one could then use the position of a word in the cleaned-up string to do whatever one wished (remove, mask, translate) in the original string

                               

                              Yeah, well - I thought you said "easily". That doesn't sound easy at all. Of course, you could achieve essentially the same thing much more simply (and quickly) by examining the original text word by word, checking whether each word appears in the blacklist. I already suggested this in my first post.
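
A word-by-word pass over the original text might look like the Python sketch below. It is not the custom function linked earlier in the thread, and it does not use FileMaker's word functions; the `BANNED` set and `mask_banned_words` are hypothetical, and a "word" is assumed to be a run of ASCII letters.

```python
import re

BANNED = {"darn", "heck"}  # hypothetical blacklist for illustration

def mask_banned_words(text, banned=BANNED, mask_char="*"):
    """Walk the text word by word and mask any word found in `banned`."""
    def replace(match):
        word = match.group(0)
        # Compare in lowercase so capitalization does not hide a match.
        return mask_char * len(word) if word.lower() in banned else word
    return re.sub(r"[A-Za-z]+", replace, text)

print(mask_banned_words("What the HECK, darn it!"))  # What the ****, **** it!
```

Since the original text is edited in place, punctuation and the case of untouched words are preserved without any position bookkeeping.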

                              • 12. Re: User-Generated Content & A Curse-word list
                                datastride

                                Michael,

                                 

                                 

                                 

                                One advantage to the method I suggested (which seems pretty easy to me) is that it also allows for handling multi-word phrases that are considered objectionable:

                                 

                                 

                                 

                                     Position( " " & $cleanedUpString & " " , " " & "son of a gun" & " " , 1 , 1 )

                                 

                                 

                                 

                                “Easy” is one consideration … Flexibility is another …

                                 

                                 

                                 

                                But as I tried to explain in my earlier note,  my original intent was to suggest how to use certain functions to find “words” so that Jeremy would not dismiss these functions as less useful than they can be in the context of all that can be done with scripting.

                                 

                                 

                                 

                                Peace, love & brown rice,

                                 

                                Morgan Jones

                                 

                                 

                                 


                                • 13. Re: User-Generated Content & A Curse-word list
                                  MikeMcErlean

                                  Hi,

                                   

                                  Couldn't you change your original text into a list ( Substitute ( theText ; " " ; ¶ ) ), then use the FilterValues function to compare it to your banned words list? The result will contain only the words you want to remove, so further processing is needed only if there is a result. I'd assumed you want to stop if the post is 'clean'.

                                   

                                  Of course, this doesn't work for objectionable phrases.
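
The split-and-filter idea can be sketched in Python as a rough analogue (it does not call FileMaker's functions). The helper name `banned_words_found` is hypothetical, and lowercasing both sides is an assumption of this sketch rather than a statement about how FileMaker compares values.

```python
def banned_words_found(text, banned):
    """Split on spaces and keep the words that appear in the banned list."""
    banned_lower = {b.lower() for b in banned}
    return [w for w in text.split(" ") if w.lower() in banned_lower]

print(banned_words_found("well darn it", ["darn", "heck"]))    # ['darn']
print(banned_words_found("well, darn! it", ["darn", "heck"]))  # []
```

The second call comes back empty because splitting on spaces leaves "darn!" with its punctuation attached, so the text would need cleaning first. A phrase such as "son of a gun" is also split into single words, which is why this approach cannot catch phrases.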

                                  • 14. Re: User-Generated Content & A Curse-word list
                                    comment

                                    Morgan Jones wrote:

                                     

                                    my original intent was to suggest how to use certain functions to find “words” so that Jeremy would not dismiss these functions as less useful than they can be

                                     

                                    Well, you have your point and I have mine. Actually, I have two: the minor one is that it's best to use the xWords() functions when dealing with words. My main point is that there is no good solution to this problem.
