User:Wmahan/Spelling

Hi. I'm working on fixing common misspellings on Wikipedia, and you may be able to help. My system allows editors to review suggested changes, based on common misspellings found in a database dump. It is a semi-automated process since all changes are reviewed and submitted by humans.

My tool is now hosted by the Wikimedia Toolserver. Thanks to Wikimedia Deutschland for providing the service.

Similar tools exist, such as AutoWikiBrowser and CmdrObot. This project differs in that it checks the spelling of every word on a page, rather than only correcting from a fixed list of common misspellings. I also hope the tool makes it easier for more people to help: you don't need to install any special software, and you can use any platform with a modern visual web browser.

Quick summary

To get started, put

{{subst:js|User:Wmahan/wpspell.js}}

into your monobook.js (or the appropriate page if you are using a different skin). Then refresh your browser's cache and follow the new spelling link in the sidebar.

Requirements

To use the scripts, you must:

  • have a modern browser that supports JavaScript and cookies, with both enabled
  • have a user account on Wikipedia and be logged in
  • be using the MonoBook skin, the default (I could add support for other skins if someone requests it)
  • be willing to edit your monobook.js to load my script, as described below (knowledge of JavaScript is not required)

Instructions

Step 1: Create or modify your monobook.js

Before using the tool, you need to edit the page Special:Mypage/monobook.js. (That's a link to User:YourUserName/monobook.js, for the relevant value of YourUserName.) Note that the m in monobook must be lowercase. Enter the following text:

{{subst:js|User:Wmahan/wpspell.js}}

After you save the page, you'll need to restart your browser or refresh its cache as described at the top of the edit form. Help:User style has a little more information about monobook.js, but it's not crucial that you understand this step.

Step 2: Request an article

After step 1, you should see a new link on the left-hand side of every page, at the bottom of the toolbox, called Correct spelling. Follow this link, and you will be sent to the edit page of an article. Whenever you're finished correcting an article, you can use this link to open a new one.

Below the edit form for each article you'll find an automatic correction that has already been attempted. If the automatic change is wrong, you can go on to another article or use the "Reset" button to undo it.

Under the automatic correction you'll find other possible misspellings, which weren't found in my spelling dictionary. Often these are proper names or other words that don't need changing; just skip those. If there is an actual mistake, you have the option of either replacing it with one of the suggestions in a list, or typing in the correction manually.

There are also options to look up possible misspellings in a dictionary; search for more occurrences in Wikipedia; flag a word as a correct spelling (so the tool stops suggesting that you change it); and add a word as a common misspelling along with its suggested replacement. Please only use the last two options for words that are fairly common, since each additional word slows down the process of finding misspellings a little.

Let me know what you think

Comments and questions are always welcome. Tell me what did and didn't work, and how I can improve the process. I'd like to hear which recent browsers and platforms the scripts work with (only fairly modern browsers have any chance of being compatible, unfortunately).

Privacy

At present, when you use my scripts, I don't log any information about you beyond the toolserver's Apache logging. However, I might start if it becomes necessary to prevent abuse. If the idea that I could link your user name to your IP address bothers you, I suppose you could create a throwaway account and use that. If you're concerned about me knowing your IP address at all, you probably shouldn't use this tool.

Users

Thanks to the following people who have helped correct articles:

Please feel free to add your name to the list.

Notes

  • The tool's reliance on JavaScript probably makes it inaccessible to visually impaired users, unfortunately.
  • The misspelling list is based on Wikipedia:Lists of common misspellings, but I've removed many words that don't lend themselves to automatic correction and added other common misspellings I've found.
  • It's possible that the scripts will fail to work on your browser. Right now I've only tested with Firefox 1.5 on Linux and Windows, Opera 9 on Linux, and IE 6 on Windows. If you have problems, please leave me a description along with what browser and platform you are using.
  • My database dump goes out of date quickly, so sometimes the misspelling in your page will already have been fixed. If there aren't any changes to make, don't submit the article; just go on to the next one.

Technical details

I download a Wikipedia database dump, run some Perl scripts on it, and upload the resulting database information to the Toolserver. On the server there are some files (PHP, JavaScript, HTML) that take care of the rest. I use mwdumper[1] to read the data dump (previously I used Parse::MediaWikiDump by Triddle). The system is designed to be easy for editors to set up and use, and to do most processing either offline or client-side to minimize the resources used on the server.
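
A minimal sketch of what such an offline pass over a dump could look like, written in Python for illustration rather than the Perl actually used. The miniature dump, the two-entry misspelling list, and the function name scan_dump are all hypothetical; real dumps are far larger and use an XML namespace that this sketch ignores.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical miniature dump standing in for a real Wikipedia database dump.
SAMPLE_DUMP = """<mediawiki>
  <page><title>Example A</title>
    <revision><text>They recieve the goverment grant.</text></revision></page>
  <page><title>Example B</title>
    <revision><text>Nothing to fix here.</text></revision></page>
</mediawiki>"""

# Hypothetical excerpt of a common-misspellings list (wrong -> right).
MISSPELLINGS = {"recieve": "receive", "goverment": "government"}

WORD_RE = re.compile(r"[A-Za-z]+")


def scan_dump(xml_text, misspellings):
    """Return (title, misspelled word, suggested fix) rows for every hit."""
    rows = []
    root = ET.fromstring(xml_text)
    for page in root.iter("page"):
        title = page.findtext("title")
        text = page.findtext("revision/text") or ""
        for word in WORD_RE.findall(text):
            fix = misspellings.get(word.lower())
            if fix is not None:
                rows.append((title, word, fix))
    return rows


rows = scan_dump(SAMPLE_DUMP, MISSPELLINGS)
for row in rows:
    print(row)
```

Rows like these are the kind of precomputed result that could be uploaded to the server, so that the expensive scan happens offline.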

When an editor requests a new article, the script pulls an article title from the database and redirects him or her to the appropriate edit page. When displaying that page, the user's browser loads some dynamically generated JavaScript from my server. That script is created by pulling a list of spelling corrections from the database. The correction interface is subsequently inserted at the bottom of the edit form.
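
The request step can be sketched roughly as follows, again in Python for illustration. The table name queue, its columns, and the function next_edit_url are assumptions, not the tool's real schema; the URL uses the standard MediaWiki index.php edit form.

```python
import sqlite3
from urllib.parse import urlencode

EDIT_BASE = "https://en.wikipedia.org/w/index.php?"


def next_edit_url(conn):
    """Pick the next unhandled title and return its edit-page URL, or None."""
    row = conn.execute(
        "SELECT id, title FROM queue WHERE done = 0 ORDER BY id LIMIT 1"
    ).fetchone()
    if row is None:
        return None
    row_id, title = row
    # Mark the article as handed out so the next request gets a different one.
    conn.execute("UPDATE queue SET done = 1 WHERE id = ?", (row_id,))
    query = urlencode({"title": title.replace(" ", "_"), "action": "edit"})
    return EDIT_BASE + query


# Hypothetical in-memory queue with two candidate articles.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE queue (id INTEGER PRIMARY KEY, title TEXT, done INTEGER DEFAULT 0)"
)
conn.executemany(
    "INSERT INTO queue (title) VALUES (?)",
    [("Example article",), ("Another page",)],
)

first_url = next_edit_url(conn)
print(first_url)
```

A server-side handler would then answer the editor's request with an HTTP redirect to the returned URL.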

I use the Text::Aspell[2] interface to the GNU Aspell[3] library. I find that Aspell generally does a good job of recognizing regional variations in spelling.
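
To illustrate the idea of checking every word against a dictionary and offering suggestions, here is a small sketch that swaps in Python's standard difflib module for Aspell. It is not Aspell and does not use its algorithm; the tiny word set and the function check_text are hypothetical.

```python
import difflib
import re

# Hypothetical tiny dictionary; a real spelling dictionary holds many thousands of words.
DICTIONARY = {"the", "spelling", "of", "an", "article", "dictionary"}

WORD_RE = re.compile(r"[A-Za-z]+")


def check_text(text, dictionary):
    """Map each word missing from the dictionary to a list of close matches."""
    known = sorted(dictionary)
    report = {}
    for word in WORD_RE.findall(text):
        key = word.lower()
        if key in dictionary or word in report:
            continue
        # difflib ranks candidates by similarity; the list may be empty.
        report[word] = difflib.get_close_matches(key, known, n=3, cutoff=0.6)
    return report


report = check_text("The speling of an article", DICTIONARY)
print(report)
```

Words with no close match, such as many proper names, would come back with an empty suggestion list, which corresponds to the "possible misspellings" an editor is expected to skip.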

All my scripts are available under the GPL, version 2 or later.

See also

Similar projects

(add others)

External link

  • My page at the Toolserver with the source code and status report