  Qt / QTBUG-14462

Spellchecking support for Qt


Details

    • Task
    • Resolution: Unresolved
    • Not Evaluated
    • Other

    Description

      DRAFT

      History

      The QSpellChecker project has its roots in the QtSpellCheckingTextEdit Qt Solution, which was released in 2007/8. A significant share of the development was done in early 2009; the project was then on hold from mid-2009 until now.

      Major Components:

      • The QSpellChecker class.
      • Plugin-based backends.
      • QTextControl integration

      QSpellChecker

      QSpellChecker manages the plugins and provides a spell checking API. QSpellChecker is GUI-less and lives in QtCore. The class does as little language processing as possible and defers it to the plugins. (We don't want to maintain language-specific code.)

      The QSpellChecker API can be split into two groups:

      1) Getting and setting the current language.

      Each plugin reports a list of supported dictionaries, preferably in the form "language_country" (e.g. "en_GB"). QSpellChecker aggregates the lists from all plugins into a master list by removing duplicates and sorting. Not all strings returned by the backends are well-formed; these are some of the exceptions found in the wild:

      • Returning plain "en", "fr", etc. strings is very common. Which "en" variant is meant can sometimes be determined empirically and hardcoded in the plugin.
      • "nb", "nn" for Norsk Bokmål/Nynorsk.
      • NSSpellChecker on Mac OS X 10.4 and 10.5 returns "auto" for language auto-detection, but has a specific API for this from 10.6 onwards.

      This means that not all language strings returned by QSpellChecker can be understood by QLocale, which in turn means that the API can't be purely QLocale-based.
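      A minimal sketch of the aggregation step, plus a check for the "language_country" form, assuming plain std::string lists in place of QStringList so it compiles without Qt. The function names are hypothetical, and the well-formedness rule (two or three lowercase letters, '_', two uppercase letters) is an assumption made for illustration.

```cpp
#include <algorithm>
#include <cctype>
#include <string>
#include <vector>

// Hypothetical: merge the dictionary lists reported by each plugin into one
// master list by concatenating, sorting, and dropping duplicates. Strings are
// kept verbatim, so entries such as "en", "nb" or "auto" pass through unchanged.
std::vector<std::string> aggregateLanguages(
    const std::vector<std::vector<std::string>> &perPlugin)
{
    std::vector<std::string> all;
    for (const auto &list : perPlugin)
        all.insert(all.end(), list.begin(), list.end());
    std::sort(all.begin(), all.end());
    all.erase(std::unique(all.begin(), all.end()), all.end());
    return all;
}

// Hypothetical: true if the string looks like "language_country", e.g. "en_GB".
bool isWellFormed(const std::string &s)
{
    const auto sep = s.find('_');
    if (sep == std::string::npos || sep < 2 || sep > 3 || s.size() != sep + 3)
        return false;
    for (std::size_t i = 0; i < s.size(); ++i) {
        const unsigned char c = static_cast<unsigned char>(s[i]);
        if (i < sep && !std::islower(c))
            return false; // language part must be lowercase
        if (i > sep && !std::isupper(c))
            return false; // country part must be uppercase
    }
    return true;
}
```

      Strings that fail isWellFormed() are exactly the ones QLocale cannot be relied on to interpret, which is why the master list has to stay string-based.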

      Setting the language.
      The language setter must meet the following requirements:
      a) Be forgiving when accepting user input.
      "en" should map to one of the "en_" variants.
      If a non-parsable string such as "auto" is encountered, try to find an exact match.

      b) Select the highest-quality spell-checking engine.
      Some engines are better than others; prefer the better ones. Quality is determined from the plugin key, see "Plugin-based backends".

      There is a conflict between these two goals. For example, given "en", do you choose a high-quality "en_GB" or a lower-quality "en"? We will have to iterate on the solution until we get it right.
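      One possible resolution policy, sketched under assumptions: the Dictionary record and resolveLanguage() are hypothetical, candidates are exact matches or (for a bare language code) any "code_*" variant, the highest quality wins, and ties go to the exact match. Under this policy "en" resolves to a quality-150 "en_GB" rather than a quality-100 plain "en".

```cpp
#include <optional>
#include <string>
#include <vector>

// Hypothetical record for one dictionary offered by one backend.
struct Dictionary {
    std::string language; // as reported by the backend: "en_GB", "en", "auto", ...
    std::string backend;  // e.g. "aspell"
    int quality;          // taken from the plugin key, e.g. 150 for "aspell-150"
};

// Hypothetical forgiving lookup: exact match or "request_*" variant, best quality first.
std::optional<Dictionary> resolveLanguage(const std::string &request,
                                          const std::vector<Dictionary> &dicts)
{
    std::optional<Dictionary> best;
    bool bestExact = false;
    for (const auto &d : dicts) {
        const bool exact = d.language == request;
        // Only bare codes like "en" expand to variants; "auto" has none.
        const bool variant = request.find('_') == std::string::npos
                             && d.language.rfind(request + "_", 0) == 0;
        if (!exact && !variant)
            continue;
        if (!best || d.quality > best->quality
            || (d.quality == best->quality && exact && !bestExact)) {
            best = d;
            bestExact = exact;
        }
    }
    return best; // empty if nothing matched
}
```

      Checking exactness before quality would flip the trade-off and prefer the plain "en"; which order is right is the open question above.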

      Asynchronous Language Loading

      The synchronous API for enumerating and setting the language looks like this:

       
      QStringList availableLanguages(); 
      void setLanguage(const QString &language); 
      

      Both functions can (and usually will) be slow. The typical procedure for enumerating available languages looks like this:
      For each plugin:
      1) Locate and load the Qt plugin.
      2) dlopen the spellchecking engine (if using dynamic run-time loading).
      3) Locate (and load/parse?) dictionary files.

      Moving backend scanning and dictionary loading out of application startup and off the GUI thread is desirable. Two solutions are possible:

      1) Make the QSpellChecker language loading API asynchronous.
      2) Keep the QSpellChecker API synchronous and implement asynchronous loading outside of QSpellChecker. This is the current solution.

      1) Adding an asynchronous API:

       
      void findAvailableLanguages(); // starts async lookup
      signal: void languagesFound(const QStringList &languages); // emitted once after all plugins have been queried

      void setLanguage(const QString &language); // starts async plugin load
      signal: void languageSet(const QString &language); // emitted once the spellchecker is ready

      SpellCheckerState state(); // "Initializing", "Ready"
      

      Users are responsible for not calling isCorrect() etc. until the spellchecker is ready; if they do, isCorrect() blocks until the spellchecker enters the Ready state.
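      A sketch of the Initializing/Ready behaviour of option 1, using std::async and std::shared_future instead of Qt so it stays self-contained. AsyncSpellChecker and LoadedBackend are hypothetical names, and the signals are replaced by a polling state() call; in Qt, QtConcurrent::run together with QFutureWatcher's finished() signal could drive languageSet().

```cpp
#include <chrono>
#include <functional>
#include <future>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical stand-in for a backend with its dictionary loaded.
struct LoadedBackend {
    std::string language;
    std::function<bool(const std::string &)> check;
};

class AsyncSpellChecker {
public:
    enum class State { Uninitialized, Initializing, Ready };

    // Starts loading on a worker thread and returns immediately. 'loader' stands
    // in for "load the Qt plugin, dlopen the engine, load the dictionary".
    void setLanguage(std::function<LoadedBackend()> loader)
    {
        m_backend = std::async(std::launch::async, std::move(loader)).share();
    }

    // Non-blocking poll; a Qt version would emit languageSet() instead.
    State state() const
    {
        if (!m_backend.valid())
            return State::Uninitialized;
        return m_backend.wait_for(std::chrono::seconds(0)) == std::future_status::ready
                   ? State::Ready
                   : State::Initializing;
    }

    // Blocks until loading has finished, then checks the word.
    bool isCorrect(const std::string &word) const
    {
        if (!m_backend.valid())
            throw std::logic_error("setLanguage() has not been called");
        return m_backend.get().check(word);
    }

private:
    std::shared_future<LoadedBackend> m_backend;
};
```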

      2) Implement outside of QSpellChecker
      TODO - This will be investigated as a part of the QTextEdit integration.

      QSpellChecker initialization

      QSpellChecker construction must be lightweight:

       
      QSpellChecker *spellChecker = new QSpellChecker(parent); // no loading done here 
      

      It is also desirable that the following code works:

       
      QSpellChecker spellChecker; 
      spellChecker.isCorrect("Ikke øl i en sådan stund, gi mig fløyten."); 
      

      Some observations:

      • No language has been set, so use the default based on QLocale.
      • On OS X, use automatic language detection.
      • Calling isCorrect() triggers the loading of the backend.
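      A minimal sketch of lightweight construction with lazy backend loading, assuming a hypothetical Backend interface and factory function. The default language string stands in for the QLocale-based default (for example QLocale::system().name(), which yields strings such as "nb_NO"); OS X auto-detection is not modelled.

```cpp
#include <functional>
#include <memory>
#include <string>
#include <utility>

// Hypothetical backend interface implemented by each plugin.
struct Backend {
    virtual ~Backend() = default;
    virtual bool isCorrect(const std::string &text) = 0;
};

class LazySpellChecker {
public:
    using Factory = std::function<std::unique_ptr<Backend>(const std::string &language)>;

    // Only stores the factory and default language; no loading happens here.
    LazySpellChecker(Factory factory, std::string defaultLanguage)
        : m_factory(std::move(factory)), m_defaultLanguage(std::move(defaultLanguage)) {}

    void setLanguage(std::string language)
    {
        m_language = std::move(language);
        m_backend.reset(); // reload lazily on next use
    }

    bool isCorrect(const std::string &text)
    {
        if (!m_backend) // first use: load the backend now
            m_backend = m_factory(m_language.empty() ? m_defaultLanguage : m_language);
        if (!m_backend) // assumption: with no backend, report nothing as misspelled
            return true;
        return m_backend->isCorrect(text);
    }

    bool isLoaded() const { return m_backend != nullptr; }

private:
    Factory m_factory;
    std::string m_defaultLanguage;
    std::string m_language;
    std::unique_ptr<Backend> m_backend;
};
```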

      2) Spellchecking

      QSpellChecker supports spell checking words, sentences and paragraphs using a QString-based API.

      QStringRef vs Token API
      isCorrect() can return the errors either as a list of QStringRefs or as a list of {pos, length} Tokens.
      1) Use QStringRef. No new classes are needed, but this is unsafe since QStringRef holds an unguarded QString pointer.
      2) Add a Token class. Current solution. Safe, at the cost of a new class.
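      A sketch of the {pos, length} Token approach, assuming std::string input and a naive ASCII word splitter; real word breaking belongs to the backends (QTextBoundaryFinder would be one Qt option). Token and misspelledTokens() are hypothetical names.

```cpp
#include <cctype>
#include <functional>
#include <string>
#include <vector>

// A misspelling reported as an offset and length into the checked text.
// Unlike a QStringRef it holds no pointer, so it cannot dangle.
struct Token {
    int pos;
    int length;
    bool operator==(const Token &o) const { return pos == o.pos && length == o.length; }
};

// Splits on non-letters (ASCII only, for brevity) and returns a Token for
// every word the predicate rejects.
std::vector<Token> misspelledTokens(const std::string &text,
                                    const std::function<bool(const std::string &)> &isWordCorrect)
{
    std::vector<Token> errors;
    const int n = static_cast<int>(text.size());
    int i = 0;
    while (i < n) {
        while (i < n && !std::isalpha(static_cast<unsigned char>(text[i])))
            ++i; // skip separators
        const int start = i;
        while (i < n && std::isalpha(static_cast<unsigned char>(text[i])))
            ++i; // consume one word
        if (i > start && !isWordCorrect(text.substr(start, i - start)))
            errors.push_back({start, i - start});
    }
    return errors;
}
```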

      Failures:
      What can go wrong?

      • A backend cannot be found or loaded. This is normal.
      • No backends can be loaded.
      • A dictionary is corrupt and can't be used.
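      The failure cases above can be sketched as a scan that skips missing backends silently, reports "no backends" as a distinct status, and turns corrupt dictionaries into warnings. LoadResult, ScanReport and scanBackends() are hypothetical; each plugin loader is modelled as a function returning the outcome of the plugin/engine/dictionary loading steps.

```cpp
#include <functional>
#include <string>
#include <vector>

// Hypothetical outcome of trying to load one backend plugin.
struct LoadResult {
    std::string key;                       // plugin key, e.g. "aspell-150"
    bool loaded = false;                   // plugin and engine loaded successfully
    std::vector<std::string> dictionaries; // usable dictionaries
    std::vector<std::string> corrupt;      // dictionaries that could not be used
};

enum class ScanStatus { Ok, NoBackends };

struct ScanReport {
    ScanStatus status = ScanStatus::NoBackends;
    std::vector<std::string> languages;
    std::vector<std::string> warnings;
};

ScanReport scanBackends(const std::vector<std::function<LoadResult()>> &plugins)
{
    ScanReport report;
    for (const auto &tryLoad : plugins) {
        const LoadResult r = tryLoad();
        if (!r.loaded)
            continue; // a missing backend is normal: skip it silently
        report.status = ScanStatus::Ok;
        report.languages.insert(report.languages.end(),
                                r.dictionaries.begin(), r.dictionaries.end());
        for (const auto &d : r.corrupt)
            report.warnings.push_back(r.key + ": dictionary " + d + " is unusable");
    }
    return report; // status stays NoBackends if nothing loaded
}
```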

      Multithreading support

      Alternatives:
      1) Thread-safe, global mutex lock
      2) Reentrant, per-backend-instance lock (chosen solution)

      1) QSpellChecker is thread-safe, meaning you can safely call any function from any thread. The backends can generally be used from only one thread at a time, so thread safety is implemented using a global lock.

      This limits the concurrency to one thread at a time.

      2) QSpellChecker is reentrant. Use one QSpellChecker object per thread. Behind the scenes there is a lock per backend instance.

      Recommended use - "The path of good performance":
      Use a single QSpellChecker object from a single thread at a time.
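      A minimal sketch of the chosen reentrant design, assuming a backend instance can be shared between checker objects through a shared pointer. The class names are hypothetical and the engine call is a placeholder; the point is that the only lock lives in the backend instance, so objects using different instances never contend.

```cpp
#include <memory>
#include <mutex>
#include <string>
#include <utility>

// Hypothetical shared engine instance. The engine is usable from only one
// thread at a time, so every call goes through this instance's mutex.
class BackendInstance {
public:
    bool isCorrect(const std::string &word)
    {
        std::lock_guard<std::mutex> lock(m_mutex); // per-instance, not global
        ++m_calls;
        return word != "teh"; // placeholder for the real engine call
    }

    int calls()
    {
        std::lock_guard<std::mutex> lock(m_mutex);
        return m_calls;
    }

private:
    std::mutex m_mutex;
    int m_calls = 0;
};

// Reentrant front end: one object per thread, no locking of its own.
class ReentrantSpellChecker {
public:
    explicit ReentrantSpellChecker(std::shared_ptr<BackendInstance> backend)
        : m_backend(std::move(backend)) {}

    bool isCorrect(const std::string &word) { return m_backend->isCorrect(word); }

private:
    std::shared_ptr<BackendInstance> m_backend;
    // Per-object state (current language, etc.) would live here, unguarded.
};
```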

      Caching
      Currently there is no caching: each call to availableLanguages() queries the backends. In other words, availableLanguages() is not designed to be fast. Does it have to be?

      Adding caching introduces the need for cache invalidation. Will it be necessary to restart the application to detect a newly installed dictionary?
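      If caching were added, it could look like this sketch: the result of the slow backend query is kept until invalidate() is called. LanguageCache is a hypothetical name, and what should trigger invalidation remains the open question above (watching dictionary directories with QFileSystemWatcher is one possibility).

```cpp
#include <functional>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical cache around the expensive "query all backends" call.
class LanguageCache {
public:
    using Query = std::function<std::vector<std::string>()>;

    explicit LanguageCache(Query query) : m_query(std::move(query)) {}

    const std::vector<std::string> &availableLanguages()
    {
        if (!m_cached)
            m_cached = m_query(); // slow path: scan all backends
        return *m_cached;
    }

    // Call when dictionaries may have changed, e.g. after a new install.
    void invalidate() { m_cached.reset(); }

private:
    Query m_query;
    std::optional<std::vector<std::string>> m_cached;
};
```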

      Plugin-based backends.

      The plugins are located in src/plugins/spellchecker. No spelling engines or dictionaries are shipped with Qt at this point; we locate the available engines at run-time (using dlopen). Some systems don't have a public spellchecking API; there, the easiest option is often to install Aspell.

      The plugin key identifies the backend and the quality (actual values):

      nsspellchecker-50
      enchant-100
      aspell-150
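      A sketch of how these "name-quality" keys could be parsed and ranked best-first, assuming the quality is always the decimal number after the last '-'. PluginKey, parsePluginKey() and rankPlugins() are hypothetical.

```cpp
#include <algorithm>
#include <optional>
#include <string>
#include <vector>

struct PluginKey {
    std::string name;
    int quality;
};

// Parses keys such as "aspell-150"; returns nothing if there is no numeric suffix.
std::optional<PluginKey> parsePluginKey(const std::string &key)
{
    const auto dash = key.rfind('-');
    if (dash == std::string::npos || dash == 0 || dash + 1 == key.size())
        return std::nullopt;
    int quality = 0;
    for (std::size_t i = dash + 1; i < key.size(); ++i) {
        if (key[i] < '0' || key[i] > '9')
            return std::nullopt;
        quality = quality * 10 + (key[i] - '0');
    }
    return PluginKey{key.substr(0, dash), quality};
}

// Orders backends best-first so the highest-quality engine is preferred.
std::vector<PluginKey> rankPlugins(const std::vector<std::string> &keys)
{
    std::vector<PluginKey> ranked;
    for (const auto &k : keys)
        if (auto parsed = parsePluginKey(k))
            ranked.push_back(*parsed);
    std::stable_sort(ranked.begin(), ranked.end(),
                     [](const PluginKey &a, const PluginKey &b) { return a.quality > b.quality; });
    return ranked;
}
```

      With the three keys above, the ranking is aspell, enchant, nsspellchecker.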

      Backend notes:

      Aspell
      Widely installed backend; Windows binaries are available at http://aspell.net/win32/.

      Enchant
      http://www.abisource.com/projects/enchant/
      Wrapper framework for other spellcheckers:
      • Aspell/Pspell
      • Ispell
      • MySpell/Hunspell
      • Uspell (primarily Yiddish, Hebrew, and Eastern European languages - hosted in AbiWord's CVS under the module "uspell")
      • Hspell (Hebrew)
      • Zemberek (Turkish)
      • Voikko (Finnish)
      • AppleSpell (Mac OS X)

      NSSpellChecker
      Built-in spellchecker on OS X. Does not support multiple object instances.

      Future work:

      • Develop a Hunspell plugin where hunspell and dictionaries can be bundled with the application.
      • Develop a Sonnet plugin for Qt applications running on KDE.

      QTextControl integration

      Implementation

      A high-level API is added to QTextControl for enabling and disabling spellchecking and accessing the QSpellChecker object.

      The following classes/components use QTextControl and can gain spell checking support:

      Implemented:

      QTextEdit
      QPlainTextEdit

      TODO:

      QGraphicsTextItem
      QLabel
      qdeclarativetextedit
      QtWebKit (does not use QTextControl)

      User interface and behavior

      Interaction with the spell checker happens through syntax highlighting and the context menu. There is no spell checking dialog. The intention is to add minimal but sufficient spellchecking support for "casual" text editing. (If you are implementing a word processor you probably want to implement your own spell checking UI.)

      A key challenge here is that users have different expectations of how spell checking should work, depending on which platform they are most familiar with.

      Context Menu Additions:

      • Suggestion list
      • add/remove word from dictionary
      • Spellcheck text selection
      • Spellcheck entire document
      • Set language

      "Global" vs "local" spell checking behavior:
      Avoiding the "wall of red spelling errors" effect when loading text or changing the language is important. Two approaches have been observed:

      • Spell check typed text only (as-you-type). Currently implemented.
        Loaded or pasted text is not immediately spellchecked. Moving the cursor over a word causes it to be spellchecked.
      • Spell check entire document with delay.
        A couple of lines are checked every second, starting at the top of the document.
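      A sketch of the scheduling behind the second approach, assuming a hypothetical IncrementalChecker that only tracks which line to check next; the actual checking and highlighting are passed in as a callback. A QTimer firing once per second could call tick() and stop itself when tick() returns false.

```cpp
#include <cstddef>
#include <functional>

// Hypothetical: checks a document a few lines per tick, starting at the top.
class IncrementalChecker {
public:
    IncrementalChecker(std::size_t lineCount, std::size_t linesPerTick)
        : m_lineCount(lineCount), m_linesPerTick(linesPerTick) {}

    // Checks up to linesPerTick lines, continuing where the previous tick
    // stopped. Returns false once the whole document has been covered.
    bool tick(const std::function<void(std::size_t line)> &checkLine)
    {
        for (std::size_t n = 0; n < m_linesPerTick && m_next < m_lineCount; ++n)
            checkLine(m_next++);
        return m_next < m_lineCount;
    }

    // Start over from the top, e.g. after the language has changed.
    void restart(std::size_t lineCount)
    {
        m_lineCount = lineCount;
        m_next = 0;
    }

private:
    std::size_t m_lineCount;
    std::size_t m_linesPerTick;
    std::size_t m_next = 0;
};
```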

            People

              Unassigned
              Morten Sørvig (sorvig)
              Veli-Pekka Heinonen
              Votes: 25
              Watchers: 25
