User talk:Beland


/Praise /Projects

Feel free to leave a note on this page in the usual manner. I only keep stuff on this page if it requires further action from me, just to keep things tidy, so sometimes I'll just move the conversation to your talk page to keep things in one spot. Feel free to move it back here when you reply, or just use {{ping}} from whatever page it's on.

My old bot Pearle has been offline for a while. My current Wikipedia coding project is Wikipedia:Typo Team/moss. -- Beland (talk) 15:18, 27 July 2018 (UTC)

Wikipedia:Most wanted stubs

Hi Beland - is it possible to run a bot and see what turns up at Wikipedia:Most wanted stubs as the most wanted stubs to expand? Interested in what it might turn up for potential DYK expansions...Cas Liber (talk · contribs) 03:36, 13 June 2013 (UTC)

Hmm, definitely looks like it's time for an update there. Unfortunately the code I have to do that is old and creaky and I'm not sure it's compatible with the current database dumps. I'll try and take a look at it in the next few weeks. -- Beland (talk) 16:12, 18 June 2013 (UTC)
Hi again - do you still have time or inclination to have a go at this? Or would it be something that someone else could do...? I have no idea as I have no experience in any of this....Cas Liber (talk · contribs) 07:49, 25 September 2013 (UTC)

More refs needed

For this content [1] Doc James (talk · contribs · email) 05:16, 5 September 2016 (UTC)

@Doc James: Sure, I ran out of time to continue working on that today. Some of the facts (like prices) were pulled from the linked articles, but pretty much everything could use further fact-checking and referencing. There's some overlap with the "Developing world" section and perhaps Prescription drug prices in the United States, though, which might be helpful to resolve first. Any help you could give would be appreciated. -- Beland (talk) 05:51, 5 September 2016 (UTC)
Was removed - Talk:Prescription_costs#Cleanup - could use some research. -- Beland (talk) 03:08, 6 May 2019 (UTC)

Moving unreffed content - Malaria

This article is a GA. [2] Not sure why you added unreffed content? Best Doc James (talk · contribs · email) 20:42, 6 September 2016 (UTC)

@Doc James: Hmm, something seems to have gone wrong with that edit, which is clearly not a whitespace change as the edit summary suggests. Looking back at my edit history, I find that I was moving that content from Malaria vaccine where it was off-topic, to an article where it was on-topic. I was actually the one who tagged that chunk as unreferenced while cleaning it up; I have no reason to believe it is false, but it seemed important enough to verify. I hadn't noticed that Malaria is a Good Article, but I probably would have made the same edit even if I had, since it's clearly a better place for that content if it's going to be in the encyclopedia at all. I will see if I can find some time to fact-check those claims, if someone else doesn't beat me to it. -- Beland (talk) 21:16, 6 September 2016 (UTC)
Was removed, could use tracking down: [3] -- Beland (talk) 03:07, 6 May 2019 (UTC)

Geographical list titles

Note to self re: Talk:Mains electricity by country/Archive 4#Vehicles_and_non-country_places:

-- Beland (talk) 18:35, 23 May 2018 (UTC)

Other followup needed on:

-- Beland (talk) 02:11, 6 May 2019 (UTC)

Typos by occurrence

Hi Beland, I wanted to say that I admire your dedication to correcting typos in Wikipedia.

I'm trying to find ways to make the process of correcting typos easier. The moss project is a great project, but it works one typo at a time, and it excludes many parts and kinds of typos. I've made my own list of typos existing on Wikipedia, ordered by frequency.


To make it easier to check whether a word is a typo, I added short context: User:Uziel302/Typos with context. In an older version you can see how it works when I don't focus on popular words.

Any feedback is much appreciated and if you know someone else that might be interested in list of that kind, please let me know.

Thanks, Uziel302 (talk) 13:40, 5 January 2019 (UTC)

@Uziel302: Interesting. I've definitely neglected certain types of mistakes just to simplify my code and get the currently overwhelmingly-large list up and getting fixed. 8) I'm curious how you generated the list of known typos? -- Beland (talk) 01:35, 6 January 2019 (UTC)
Took 10K English words from here, made variations with a simple C program, switching letters, duplicating, removing and replacing with e (can be any other letter, started with the popular one). Then removed from the list of variations anything appearing in titles of Wiktionary. I have a web version here. Will be happy to hear feedback.Uziel302 (talk) 07:17, 6 January 2019 (UTC)
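For readers following along, here is a minimal Python sketch of the variation approach described above. It is not the C program mentioned, which is not shown on this page. The function name `variations` and the tiny word sets are hypothetical stand-ins for the 10K-word list and the Wiktionary titles.

```python
def variations(word, replacement="e"):
    """Return single-edit variants of `word`: adjacent swaps, duplications,
    deletions, and substitutions with `replacement`."""
    out = set()
    for i in range(len(word)):
        # Swap this letter with the next one.
        if i + 1 < len(word):
            out.add(word[:i] + word[i + 1] + word[i] + word[i + 2:])
        # Duplicate this letter.
        out.add(word[:i] + word[i] + word[i:])
        # Remove this letter.
        out.add(word[:i] + word[i + 1:])
        # Replace this letter with the chosen one.
        out.add(word[:i] + replacement + word[i + 1:])
    out.discard(word)  # a variant identical to the input is not a typo
    return out


# Hypothetical stand-ins for the common-word list and the dictionary titles.
common_words = {"the"}
known_titles = {"the", "he", "te", "th", "tee", "thee"}

# Keep only the variants that are not known words.
candidates = {v for w in common_words for v in variations(w)} - known_titles
```

With these toy inputs, `candidates` keeps strings such as "teh" and "hte" and drops real words such as "he" and "thee".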
@Uziel302: Ah, interesting, that's tackling the problem from the opposite end of what I do, since I start with all the words in Wikipedia and then try to figure out if they're correct or not. Certainly can't argue with the results; looks like you've got a good list of items waiting to be fixed! I'd probably just include some context on both sides of the typo. I try to do some things to make it as fast as possible for editors to go through the list, like making a link to the offending article. There are definitely faster UIs, like interactive web pages, but that's the best I've had time to manufacture so far. Not sure if you're working off a local file or something when you fix typos yourself? -- Beland (talk) 03:16, 7 January 2019 (UTC)
What do you mean by working off a local file? I downloaded dumps to my computer, if that's what you meant. My C program ran over the 65-gigabyte file for some hours, and probably didn't finish the whole thing.
When I fix typos myself I do it manually on regular wiki editor and for multiple typos of the same kind I use AWB. This is why I ordered by frequency, it lets me find stuff for AWB. I hoped you could recommend Wikipedians that might help fixing the typos, I saw your project got a lot of dedicated contributors who actually fix the typos found.Uziel302 (talk) 17:27, 7 January 2019 (UTC)
@Uziel302: To find volunteers, I mostly just linked from Wikipedia:Typo Team#Methods for searching and correcting_typos and a few "projects that need help" places, and also started using links in edit summaries. Would you be posting lists you want people to work through, or are you looking for fellow users of AWB? -- Beland (talk) 18:32, 7 January 2019 (UTC)
I can make lists with some context for one by one fixes, I don't think it has real added value comparing to your list. Especially since I don't have direct link to article and I need to search each phrase. If there is downtime where your list has no new typos to fix I can offer a list on my own. I think the main added value of a list by frequency is for AWB users who can fix many typos in a short time.Uziel302 (talk) 22:02, 7 January 2019 (UTC)
Gotcha. I think there is an official list of typos that AWB and related tools automatically try to fix, no? I've not ever tried to add anything to it. -- Beland (talk) 21:36, 9 January 2019 (UTC)
I just wondered why you go over all the words and check if they exist, when the list only shows the T1 typos, which are close to common words. Wouldn't it be more efficient to just search for the list of words that are close to common words? Uziel302 (talk) 21:12, 9 January 2019 (UTC)
@Uziel302: Well, to some degree it's because that's the way I started out doing it before I figured out that edit distance was a useful metric to separate actual misspellings from new words. At some point we'll run out of T1 lists; T2 words are mostly misspellings like T1, but T3 aren't, so in a year or two it looks like I'll have to come up with some new tricks to find the remaining misspellings. But in principle all correct spellings should be in Wiktionary, and moss has been a source of a lot of new words for that project. Different lists are better at finding those, like ranking by frequency of occurrence. The "articles with the most typos" list has also been helpful in finding species missing from Wikispecies, and also pages that are full of junk or foreign-language text. It was also interesting that the pattern detector was able to find a bunch of not-standardized rhyme patterns which I've been slowly fixing. -- Beland (talk) 21:36, 9 January 2019 (UTC)
You assume that we correct typos faster than we create them...
It looks like it takes a lot of time to generate the typo lists on your project, I guess you analyze every word in dumps, instead of just checking if it's T1.Uziel302 (talk) 09:41, 10 January 2019 (UTC)

The stats do show the number of T1s dropping. I do indeed analyze more or less every word in the main text of most articles. It takes about 26 hours for the code to run, but dumps only become available twice a month, so I'm not terribly motivated to improve speed. -- Beland (talk) 14:47, 10 January 2019 (UTC)

My project started running here: User:Uziel302/AWB cleanup.
I hope we will finish the list of frequent typos in the near future.
I guess my list of T1 isn't complete, since I only made some kinds of variations, and only on 10K words.
Can you share what defines T1 on moss?
Thanks, Uziel302 (talk) 20:14, 21 January 2019 (UTC)
@Uziel302: Hey, congratulations. I added a link from the Typo Team homepage.
In the moss system, potential typos are triaged by
near_common_word() determines the edit distance between the potential typo and the first spelling suggestion from the enchant spell checking library. The code T + edit distance is assigned if there is a spelling suggestion, so T1 means that edit distance was 1. -- Beland (talk) 00:22, 27 January 2019 (UTC)
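As an illustration of the "T + edit distance" code, here is a minimal Python sketch. It assumes plain Levenshtein distance, where a transposition counts as two edits; the actual moss code may measure distance differently. The helper name `triage_code` is hypothetical, and the spelling suggestion is passed in as an argument rather than fetched from the enchant library.

```python
def edit_distance(a, b):
    """Levenshtein distance computed with two rolling rows."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]


def triage_code(word, suggestion):
    """Return 'T<n>' when a spelling suggestion exists, else None.

    `suggestion` stands in for the first suggestion from a spell checker.
    """
    if suggestion is None:
        return None
    return "T%d" % edit_distance(word, suggestion)
```

Under this assumption, "wich" with the suggestion "which" would be coded T1, while a swapped pair such as "recieve" against "receive" would be T2.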

Gadget for typo correction


In order to make typo correction easier, I wrote a gadget on Hebrew Wikipedia that allows users to click and correct typos based on autocorrect at edit distance 1. I plan to write a similar gadget here and would like to hear how I should start, and where I can get permissions to edit MediaWiki JavaScript and create a gadget everyone can add. This is my Hebrew script, and I upload the list of typos my C program finds in the dumps to the project page. Thanks, Uziel302 (talk) 20:42, 13 April 2019 (UTC)

@Uziel302: I've not gone through that process myself, but maybe you were looking for Wikipedia:Gadget? -- Beland (talk) 04:18, 14 April 2019 (UTC)
Thanks, seems like I need to upload it as user script first.Uziel302 (talk) 04:43, 14 April 2019 (UTC)
Ready: User:Uziel302/typos.js. After adding it to your common.js you will see "implement edit" in User:Uziel302/Typos. I just corrected a few typos with it; please let me know if it can benefit the MOSS project. Uziel302 (talk) 16:32, 19 April 2019 (UTC)

Wikipedia:Templates with red links

Can you generate a new set of lists for Wikipedia:Templates with red links, and overwrite the 2011 lists? bd2412 T 14:33, 17 May 2019 (UTC)

Just following up on this. bd2412 T 18:16, 31 May 2019 (UTC)
@BD2412: I've never run such a report before. I think it's RussBot that produces them, and the owner is User:R'n'B? If they're not available, I could write some code to do this, but it might take me a while. -- Beland (talk) 19:29, 31 May 2019 (UTC)
My mistake - you had created the project page back in 2005, so I had incorrectly assumed that you also ran the report. I'll see if I can figure out who did. It may have been R'n'B. bd2412 T 20:23, 2 June 2019 (UTC)
@BD2412: Oh, you're right! I used to run that report a looong time ago, I had completely forgotten. Unfortunately all the Perl code that powered that was obsoleted when the database dump format changed. My new Python dump analysis code isn't set up to track links, but it could be extended. I hope you connect with the already-written bot owner; if not then I'll have some work to do. 8) -- Beland (talk) 15:59, 3 June 2019 (UTC)
I don't know who did the 2011 update, but hopefully someone can provide a new one. bd2412 T 16:11, 3 June 2019 (UTC)

List of low frequency typos you can load on AWB


Hi Beland, following our previous chat about Levenshtein distance 1 typos, I took all common words, made all possible variations of them, and removed the legitimate words from the output. I then searched for those 200K variations across Wikipedia dumps. What I found helped me create a list of less frequent replacements and a list of the articles where they are found. You can load those lists from Wikipedia:AutoWikiBrowser/Settings/Autocorrect and the talk page and start fixing thousands of obvious typos across Wikipedia, at a few seconds per fix. I hope you will find this list useful. I also hope it can help the MOSS project in some way. Any feedback is much appreciated! Uziel302 (talk) 14:17, 21 July 2019 (UTC)
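The search step can be sketched roughly as follows in Python. This is only an illustration under simple assumptions: the text is plain ASCII prose, and `count_typos` is a hypothetical name, not part of AWB or of the C program mentioned above.

```python
import re
from collections import Counter


def count_typos(text, candidates):
    """Count how often each candidate typo occurs, most frequent first."""
    words = re.findall(r"[A-Za-z]+", text.lower())
    return Counter(w for w in words if w in candidates).most_common()


# Toy input standing in for dump text and the generated variation list.
sample = "Teh cat saw teh dog and thier owner"
ranked = count_typos(sample, {"teh", "thier"})
```

Ordering by frequency puts the typos that repay a batch replacement first, and the rare ones at the tail of the list.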

Note to self

To finish:

14 years of adminship

Wishing Beland a very happy adminship anniversary on behalf of the Wikipedia Birthday Committee! Chris Troutman (talk) 17:17, 4 September 2019 (UTC)

Letter B

Letter B is finally finished on the moss project. It took one day short of a month, and I was thinking, if that's an average page, and you don't add or subtract anything, and nobody else joins or leaves, the entire alphabet will take two years. (And then we can do it again!) You have some competition from "One click" working on the same thing, so maybe shorter cycles might be better because sometimes they've corrected errors first. Maybe the missing leading zeros should be deleted because so many of them are sports statistics and gun/bullet calibers. I can't wait for the letter C!

Ira Ira Leviton (talk) 20:48, 1 January 2020 (UTC)

@Ira Leviton: Cool, the completion of B really snuck up on me! I've been trying to keep results fresher by not posting all the sections, which means with current methods and volunteer power we'd be well over a year to do the whole alphabet. But it seems like JWB should be really good at fixing whitespace-around-punctuation problems, and I've been meaning to start producing downloadable config files for those. It should be easy to exclude decimal fractions; thanks for finding the pattern there. (I've skipped the TS+DOT section for C until that's fixed, but wait no longer for the other sections!) It's awesome to have some typo-fixing competition; I'll have to check out what they're doing and maybe de-duplicate against their listings. Thanks again for your many hours on this project, and happy 2020! -- Beland (talk) 23:52, 1 January 2020 (UTC)

FAR notifications template

Featured article review
Talk notices given

ongoing improvements as of June 2020
improvements underway in June[4]

Find more: never on Main Page,
likely under-referenced

I know you expressed interest in getting FAR moving last month. Here is a template listing FAs (and dates) with talk page notifications that a Featured article review is needed. According to the FAR instructions, after waiting five to seven days to see if anyone engages to address the issues, anyone can bring an article to FAR, subject to a) no more than one nomination every two weeks; and b) no more than four nominations on the page at one time, unless permission for more is given by a FAR coordinator. Regards, SandyGeorgia (Talk) 21:22, 28 January 2020 (UTC)

@SandyGeorgia: Ah, great, I'll make use of that and clean out the backlog from 2006. -- Beland (talk) 00:43, 29 January 2020 (UTC)

A barnstar for you!

The Original Barnstar
For daring to cut through this Gordian knot at Early Christianity-related articles. Highly appreciated! Joshua Jonathan -Let's talk! 06:02, 31 January 2020 (UTC)

A cup of coffee for you!

I responded to your post at Talk:Hospitalized cases in the vaping lung illness outbreak. I appreciate any administrator's attention in vaping. Do what you will, of course.

I see this as a space with a lot of corporate shenanigans and paid editing from an international lobby, the nicotine industry. Their agents and writers never tire because they continually revive through funding.

I affirm that the vaping space is contentious and unusual.

Since you are an admin, I would especially appreciate any big ideas which you could develop for how to respond to perennial tension in spaces where money is no object on wiki or off to get a viewpoint enacted.

Quackguru in general says the things that I wish I could say, if I had time, and if I had the patience to be in this space. I do not see that user as a lone actor, but more like an elected representative who subjects themselves to hostility which I do not want in my own life. Blue Rasberry (talk) 12:04, 6 February 2020 (UTC)

Your draft article, Draft:Stability of democracy

Hello, Beland. It has been over six months since you last edited the Articles for Creation submission or Draft page you started, "Stability of democracy".

In accordance with our policy that Wikipedia is not for the indefinite hosting of material deemed unsuitable for the encyclopedia mainspace, the draft has been deleted. If you plan on working on it further and you wish to retrieve it, you can request its undeletion by following the instructions at this link. An administrator will, in most cases, restore the submission so you can continue to work on it.

Thanks for your submission to Wikipedia, and happy editing. kingboyk (talk) 14:03, 9 February 2020 (UTC)

A barnstar for you!

The Writer's Barnstar
For turning a problematic page into an encyclopedia article by removing inappropriate content. Thank you. WhatamIdoing (talk) 06:23, 22 February 2020 (UTC)

A cup of coffee for you!

Thanks for your development of Hospitalized cases in the vaping lung illness outbreak. Blue Rasberry (talk) 12:41, 23 February 2020 (UTC)

Thank you for being one of Wikipedia's top medical contributors!

please help translate this message into your local language via meta
The 2019 Cure Award
In 2019 you were one of the top ~300 medical editors across any language of Wikipedia. Thank you from Wiki Project Med for helping bring free, complete, accurate, up-to-date health information to the public. We really appreciate you and the vital work you do! Wiki Project Med Foundation is a thematic organization whose mission is to improve our health content. Consider joining here, there are no associated costs.

Thanks again :-) -- Doc James along with the rest of the team at Wiki Project Med Foundation 18:35, 5 March 2020 (UTC)



We've finished plodding through the D's. Can you post the latest dump for "E" on the Moss project? By the way, is there a way to prevent including three-digit decimals without a leading zero, since these are usually baseball statistics and some of the common gun/bullet calibers, like .45 and .38?

Thanks for all your hard work,


Ira Leviton (talk) 17:52, 10 March 2020 (UTC)

@Ira Leviton: Ah, already! Since it's been a while, for fresh results I'll analyze a new dump; should be done in a day or so. -- Beland (talk) 02:56, 11 March 2020 (UTC)
@Ira Leviton: Good idea on the calibers and batting averages; I'm now testing some code to exclude instances of the .## or .### pattern where "caliber" or "batting average" is elsewhere in the wikitext. That should let us continue to search for missing leading zeros in circumstances where the MOS says that's wrong. If it works, I'll post the TS+DOT section for E; in the meantime I've posted other sections so we won't waste time dealing with false alarms. I've been meaning to go back and clean out the "legit or unknown, needs tagging" cases so I don't have to avoid posting T1 sections for letters where there are case notes on those. I may post more complete dumps from earlier letters if I do manage that. Anyway, thanks for your diligent corrections, as usual! -- Beland (talk) 19:27, 16 March 2020 (UTC)
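A rough Python sketch of the exclusion described above, for illustration only. This is not the moss code; the names `MISSING_ZERO`, `EXEMPT_TERMS` and `missing_leading_zeros` are hypothetical, and the page-wide keyword check is one simple reading of "elsewhere in the wikitext".

```python
import re

# A dot followed by two or three digits, not preceded by a digit or dot
# (so "0.45" is skipped) and not followed by a further digit.
MISSING_ZERO = re.compile(r"(?<![\d.])\.\d{2,3}(?!\d)")

# Pages mentioning these terms are assumed to use bare decimals on purpose.
EXEMPT_TERMS = ("caliber", "batting average")


def missing_leading_zeros(wikitext):
    """Return bare decimals worth reporting, or [] if the page looks exempt."""
    lowered = wikitext.lower()
    if any(term in lowered for term in EXEMPT_TERMS):
        return []
    return MISSING_ZERO.findall(wikitext)
```

A page-wide check is coarse: one mention of "caliber" silences every bare decimal on the page, which trades some missed errors for fewer false alarms.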

Happy First Edit Day!

Christianity per century - series

Hi Beland. Thanks again for your attempts to size down the number of articles on Christian history. Looking at the century-series, I notice that there is a disproportionate amount of info on the spread of Christianity, while there is not a separate article on that topic. It might be a good idea to collect that info in one article. Regards, Joshua Jonathan -Let's talk! 08:46, 8 April 2020 (UTC)

@Joshua Jonathan: Hmm, that's an interesting point. There's certainly a lot of material on the spread of Christianity...if we collected that from the ~twenty by-century articles, would that all fit into a single article? If not, the per-century articles might need to retain the most detailed coverage, or I suppose we could make by-continent or by-millennium "spread of Christianity" articles? I'm not sure the amount of coverage on per-century articles is disproportionate, since it seems like one of the most important changes that has happened in a lot of centuries, but maybe some of those articles need more material on other topics added? A broad overview of expansion throughout the history of the religion would also make for tidy and interesting reading, though, and perhaps provide some relief for the long History of Christianity. Bringing content together in a single article from dozens of articles might require a fair amount of work to resolve contradictions or fill in holes, though if either of those exist it would certainly be worthwhile. I'm not sure I can personally commit to doing that given the merges and neutralization I'm already working on. Were you thinking of doing so, or just trying to get a sense if there would be objections? Either way perhaps it would be helpful to start a thread on Wikipedia talk:WikiProject Christianity/Noticeboard and see what other editors think. -- Beland (talk) 00:39, 9 April 2020 (UTC)

Broken instances of lira sign

I tried your test for broken instances of people using a ₤ without a lira association. It looks like it should work but it didn't. Can you recheck it? --John Maynard Friedman (talk) 10:05, 10 May 2020 (UTC)

@John Maynard Friedman: In what way did it not work for you? -- Beland (talk) 15:55, 10 May 2020 (UTC)

conversion of special characters

Hello there. I noticed your edit to "convert special characters" on the Amstrad CPC character set page introduced three errors into the character set table:

  • U+2019 RIGHT SINGLE QUOTATION MARK was erroneously changed to U+0027 APOSTROPHE
  • U+0060 GRAVE ACCENT was erroneously changed to U+0027 APOSTROPHE
  • U+007C VERTICAL LINE (&#x7c) was erroneously changed to U+002D HYPHEN-MINUS

I could see that in general Wikipedia text the apostrophe is desired, but this table reflects what the character set actually uses, so substituting an apostrophe is incorrect. The change from vertical line to dash is more baffling. I've fixed the page, but I'm assuming this edit was partially automated, so I wanted to note the problems so they can be addressed. Thank you. DRMcCreedy (talk) 19:26, 24 May 2020 (UTC)

@Drmccreedy: Whoops, thanks for catching those. There are some specific transformations like these that aren't appropriate for character set pages. I've been dealing with them manually, but I must have missed these. I have a bunch more character set pages to do, so maybe I'll make some special code to treat them more safely. -- Beland (talk) 05:59, 25 May 2020 (UTC)

Recent edits

As you've probably noticed, I've rolled back your latest JWB run. Things like changing &Theta; to Θ should not be done at all, let alone with automatic assistance. I noticed a couple others mixed in there that were probably okay, but on the whole, this sort of change is utterly pointless and runs afoul of WP:COSMETICBOT. –Deacon Vorbis (carbon • videos) 21:11, 26 May 2020 (UTC)

Beland, I thought we'd had this conversation. There's no agreement about whether and when unusual chars should be inserted literally vs. symbolically (i.e. via &-escapes or templates), but for goddam sure no one should be going around regularizing articles to their preferred choice. EEng 21:19, 26 May 2020 (UTC)
MOS:MARKUP does currently say that HTML markup should be avoided, unless an HTML entity would avoid potential confusion. The reason for making this change is to make it easier for editors who don't know HTML to edit articles with non-ASCII characters, and to make it easier for third parties to parse Wikipedia content.
@EEng: This was certainly discussed on the Manual of Style talk page a long time ago, and you're right that we did not come to agreement about everything. Partly that's because some participants thought this was too picky of an issue to put in the MOS, and partly because some participants wanted such guidelines to emerge in a bottom-up rather than a top-down fashion. The only way for guidelines to emerge in a bottom-up fashion is to edit articles and see if any controversies arise. People had different feelings about different kinds of HTML entities. If I remember correctly, there was general agreement that some things should be converted, like numbered entities (unless there are technical reasons not to, or potential confusion with a different character), Latin letters with accents, and Greek letters inside Greek words. As I've been cleaning up articles, I've fixed a bunch that are simply malfunctioning, and I've also discovered some ranges of codepoints that should never be converted, and enlarged the list of problematic characters in the de facto guidelines. Where editors have objected to not using HTML entities, they usually have a good reason, and we've so far managed to come to local consensus. I haven't noticed anyone going around changing Unicode characters or templates back into HTML entities, and the Unicode way of doing things is generally much more frequently encountered. (Occasionally I use templates for problematic cases like {{okina}} vs. {{asper}}.)
@Deacon Vorbis: It sounds like you have a particular rationale for keeping &Theta;? Is that because it might be confused with &theta; - Θ vs. θ? I've noticed that a lot of the pages that use Greek letters outside of Greek words use <math> markup, where something like \Theta can be used for this purpose: In some cases where there is potential confusion, I have converted HTML entities to use math markup, which at least relieves editors from the requirement to know both the math markup language and HTML entity markup, in addition to Unicode character entry, which is getting a bit hard to use. -- Beland (talk) 21:42, 26 May 2020 (UTC)
I was just using theta as an example; the same goes for just about any of the ones I saw. If there is fully spelled out Greek text using HTML entities, then I doubt anyone's going to have a problem with changing to their literal unicode equivalents, but for variables in formulas, different editors have different preferences as for how they like to enter and edit them. Neither style should be enforced or preferred. –Deacon Vorbis (carbon • videos) 22:05, 26 May 2020 (UTC)
OK, I've created an alternate JWB configuration to use on STEM articles that won't change Greek letters. -- Beland (talk) 22:20, 26 May 2020 (UTC)
It's not just Greek letters. Other symbols like arrows, inequalities, operators, all sorts of other things I'm not thinking of at the moment too, would have the same issue. And does it even matter if it's a "STEM" article if someone's allowed to use &times; instead of ×? — Preceding unsigned comment added by Deacon Vorbis (talkcontribs) 09:27, 27 May 2020 (UTC)
I never change "&times;" because it is too easily confused with the letter "x". Entities for Greek letters and the symbols you're talking about are generally not used outside of STEM articles and Greek topics, I assume because variables and formulas aren't used in other fields. Unlike named HTML entities for say, Latin characters, which have been scattered around everywhere but which it appears no one wants. -- Beland (talk) 18:49, 27 May 2020 (UTC)
MOS:MARKUP does currently say that HTML markup should be avoided, unless an HTML entity would avoid potential confusion – No, it doesn't say that. It says wikitext formatting is considered easier to use than HTML, but this isn't formatting. It also says An HTML character entity is sometimes better than the equivalent Unicode character, which may be difficult to identify in edit mode, but leaves unspecified when that sometimes is; my feeling is pretty much Deacon Vorbis's but, again, you should not be mass changing the choices of article editors. EEng 03:32, 27 May 2020 (UTC)
Yup, that's why I'm gathering information bottom-up. -- Beland (talk) 06:37, 27 May 2020 (UTC)

It is also incorrect to make these changes:

  • &#xFA67; (U+FA67) → 逸 (U+9038)
  • &#xFA62; (U+FA62) → 謁 (U+8B01)
  • &#xFA52; (U+FA52) → 禍 (U+798D)
  • ...

If the character on the left is pasted literally (not as an HTML entity), then the MediaWiki software corrupts it into the character on the right. So using HTML entities is the only way to write these 舊字體. There are 62 affected 舊字體 characters. There may be other affected characters (e.g. different variants for China/Japan/Korea). I reverted your changes in List of jōyō kanji. Arfrever (talk) 06:08, 11 June 2020 (UTC)

@Arfrever: Thanks for catching that. I'll update my scripts and see if I can't find a complete list of characters that are automatically transformed like that. -- Beland (talk) 13:55, 11 June 2020 (UTC)

I have written this script for finding characters automatically transformed:

#!/usr/bin/env python

import unicodedata

print("=== Unicode %s ===" % unicodedata.unidata_version)

# Walk every code point and report those that NFC normalization changes.
for i in range(0x110000):
  original = chr(i)
  transformed = unicodedata.normalize("NFC", original)
  if original != transformed:
    print("\"%s\" (U+%08X, %s) → " % (original, ord(original), unicodedata.name(original, "?")), end="")
    # Some single characters are transformed into multi-character sequences.
    if len(transformed) == 1:
      print("\"%s\" (U+%08X, %s)" % (transformed, ord(transformed), unicodedata.name(transformed, "?")))
    else:
      print("[", end="")
      print(", ".join("\"%s\" (U+%08X, %s)" % (x, ord(x), unicodedata.name(x, "?")) for x in transformed), end="")
      print("]")

Different versions of the Unicode database are embedded in different versions of Python, so I suggest using the newest available version of Python. Arfrever (talk) 22:22, 11 June 2020 (UTC)

@Arfrever: Awesome! My backend code uses Python too, so I just put in a call to unicodedata.normalize() to check page content dynamically and avoid triggering on instances that look unsafe. Many thanks for your assistance. -- Beland (talk) 01:45, 12 June 2020 (UTC)


Strickesel and I have finished Wikipedia:Typo_Team/moss/before_A. (There are still some things I can't figure out, like <li> in html coding.)

Whenever you get a chance, please activate 'A' and we'll get to work on that, and hopefully others will join. And keep up your good work.


Ira Leviton (talk) 15:29, 27 May 2020 (UTC)

@Ira Leviton and Strickesel: Ooo, that was fast! Unfortunately, I haven't been able to keep up with clearing out case notes from the larger letters like "A". I'm going to make some code changes that will suppress new listing for articles that have case notes, so I can refresh the big letters without worrying about duplicate effort. Hopefully that'll be ready in a few hours. In the meantime, I've updated Wikipedia:Typo Team/moss/Q from the database dump I just processed the other day, which should hopefully be short and fun. -- Beland (talk) 19:02, 27 May 2020 (UTC)
No problem – I was wondering if you were going to go to 'A' or another letter that you had cleared out. And there are always other typos and other things on Wikipedia to fix if you're not caught up. Your moss tool just makes things much easier to find. Ira Leviton (talk) 19:22, 27 May 2020 (UTC)

Articles for Creation: List of reviewers by subject notice


Hi Beland, you are receiving this notice because you are listed as an active Articles for Creation reviewer.

Recently a list of reviewers by area of expertise was created. This notice is being sent out to alert you to the existence of that list, and to encourage you to add your name to it. If you or other reviewers come across articles in the queue where an acceptance/decline hinges on specialist knowledge, this list should serve to facilitate contact with a fellow reviewer.

To end on a positive note, the backlog has dropped below 1,500, so thanks for all of the hard work some of you have been putting into the AfC process!

Sent to all Articles for Creation reviewers as a one-time notice. To opt-out of all massmessage mailings, you may add Category:Wikipedians who opt out of message delivery to your user talk page. Regards, Sam-2727 (talk)

MediaWiki message delivery (talk) 16:35, 27 May 2020 (UTC)

Neighborhoods in Los Angeles

I started a sortable list on the talk page of List of districts and neighborhoods of Los Angeles. Could you please take a look and let me know if you think it has value. Thanks. Phatblackmama (talk) 18:55, 3 June 2020 (UTC)

Nomination of Deir for deletion

A discussion is taking place as to whether the article Deir is suitable for inclusion in Wikipedia according to Wikipedia's policies and guidelines or whether it should be deleted.

The article will be discussed at Wikipedia:Articles for deletion/Deir until a consensus is reached, and anyone, including you, is welcome to contribute to the discussion. The nomination will explain the policies and guidelines which are of concern. The discussion focuses on high-quality evidence and our policies and guidelines.

Users may edit the article during the discussion, including to improve the article to address concerns raised in the discussion. However, do not remove the article-for-deletion notice from the top of the article. Clarityfiend (talk) 08:49, 4 June 2020 (UTC)

Explaining Los Angeles neighborhoods

Hello. I see you made some changes to some Los Angeles neighborhoods and hope you don't mind if I explain a few things to you.

City designated neighborhoods

(1) Per the City of Los Angeles, neighborhoods are named by a specific process and then given official signage. These signs are noted on the Wikipedia page Los Angeles Neighborhood Signs. LAist stated that these signs indicate “official L.A. neighborhood” designation. [1][2]

(2) The city of Los Angeles does not have different signs for neighborhoods that nest within larger neighborhoods. The city has posted Mid-City signs from just west of downtown to almost Culver City. Within Mid-City are other neighborhoods. Here is a photo of Mid-City signage [3], along with Mid-City Heights sign right behind it, placing it inside the borders of Mid-City.

(3) The same goes for Baldwin Hills and Baldwin Vista. As noted, Baldwin Vista is a "western Baldwin Hills neighborhood" [4]. But the city gives each of them their own neighborhood sign.

LA Times Mapping Project

(4) This is where it gets messy!

A decade ago, The Los Angeles Times felt there were too many designated neighborhoods in Los Angeles. Indeed, you can drive down Olympic Boulevard and go past a handful of neighborhoods in a quarter mile. So, the Mapping L.A. project of the LA Times decided to redraw neighborhood lines. The LA Times Mapping Project reduced 472 neighborhoods down to 115.

The neighborhoods of Crenshaw and Baldwin Hills were combined into a new entity called Baldwin Hills/Crenshaw. (user Been Aroundawhile, a former reporter for the LA Times, was instrumental in adding these new entities to wikipedia and deleting the city designated neighborhoods - which were promptly added back in). And if you go through the citations on the Baldwin Hills/Crenshaw page, you will see that the only usage of the name "Baldwin Hills/Crenshaw" is used by the LA Times Mapping project; all other sources refer to the neighborhoods of either "Baldwin Hills" or "Crenshaw".

(5) Regarding Mapping L.A......please look at the geography section of Arlington Heights, Los Angeles. The city has documented its boundaries and placed neighborhood signage on the corners. But the Mapping L.A. project expands Arlington Heights past those boundaries, and combines it with Country Club Park and Angelus Vista. The Mapping L.A. project does that a lot - combining multiple neighborhoods under one name for the sake of simplicity -- that is, reducing 472 neighborhoods down to 115.

(6) Comparing this map [5] with the Mapping LA Project, Elizabeth Fuller wrote in the LarchmontBuzz [6] that "Many people who live in and represent their neighborhoods in various ways have objected to the Times’ designations for not following city-recognized borders.” She said that Brightwell's map was a much more fine-grained view of “every L.A. neighborhood.”

(7) It appears that in 2018, even the LA Times is not sticking to the "Baldwin Hills/Crenshaw" name that it created a decade earlier and now simply uses the name "Baldwin Hills" [4].

(8) Please do not think this is just an issue with "Baldwin Hills/Crenshaw". The Mapping Project has designated many neighborhoods that contradict city boundaries.


(9) Jenna Chandler, the editor of Curbed Los Angeles, wrote that Brightwell's map of 472 neighborhoods "looks more accurate than the neighborhood maps compiled by the Los Angeles Times."[7]

(10) I hope I have laid out everything clearly. I therefore strongly object to your wording on the Baldwin Hills page that "it is part of Baldwin Hills/Crenshaw" without noting that Baldwin Hills is a city-named place and that Baldwin Hills/Crenshaw is a creation of the Los Angeles Times that neither the city nor other sources recognize. [8] To be accurate, it would have to be stated that "The LA Times mapping project combines Baldwin Hills and Crenshaw into the neighborhood of Baldwin Hills/Crenshaw".

I hope that I have stated everything clearly. Yours, Phatblackmama (talk) 18:53, 9 June 2020 (UTC)

  1. ^ "Kemp Powers, LAist Neighborhood Project: Franklin Hills, November 16, 2007". Archived from the original on October 27, 2019. Retrieved March 10, 2020.
  2. ^ "Zach Behrens, LAist Wake Up LA, February 12, 2008". Archived from the original on November 13, 2017. Retrieved March 10, 2020.
  3. ^ signage at the intersection at La Brea Avenue and the Santa Monica Freeway
  4. ^ a b
  5. ^
  6. ^ Elizabeth Fuller, "LarchmontBuzz" July 29, 2017
  7. ^ Jenna Chandler, "Which LA. Neighborhood Do You Really Live In?" December 27, 2019
  8. ^
@Phatblackmama: Ah, that's an interesting practice with the signs. The Brightwell map is great, and I've been using it as a reference for my edits on the discussion at Talk:List of districts and neighborhoods of Los Angeles; it also reports that some small neighborhoods are considered part of larger neighborhoods, though that nesting doesn't always agree with the LA Times. Neighborhoods are generally a fuzzy concept, and different people have different ideas about where they start and end. The Mapping LA project has apparently redrawn its maps based on reader feedback, so it represents at least an approximation of what locals generally agree on (to the extent that they agree). That may or may not align with the official city definition, especially in terms of flat vs. nested definitions, but that doesn't make one or the other incorrect. I'm not sure Baldwin Hills/Crenshaw is entirely a creation of the LA Times, though it's difficult to tell from afar. Apparently there's a mall called Baldwin Hills Crenshaw Plaza? But attributing the assignment to Baldwin Hills/Crenshaw to Mapping L.A. is good practice, so I've modified the Baldwin Hills, Los Angeles article. The article Baldwin Hills/Crenshaw, Los Angeles just says that this is a neighborhood, not that it is a creature of the LA Times. If you think it's not a real thing, perhaps this article should be deleted and its contents split between Baldwin Hills, Los Angeles and Crenshaw, Los Angeles? That would lose all the data supplied by Mapping L.A., so alternatively this entity could be described as a statistical grouping, if that's really all that it is. Given that the LA Times has used the term as if it's a neighborhood name, I'd say it's probably more than just a statistical grouping, even if sometimes it talks about only Baldwin Hills. 
(Just like it's sensible to talk about the Fenway neighborhood in Boston, where I used to live, even though the officially designated city district is Fenway-Kenmore and that's also a neighborhood real estate agents talk about. Actually, that reminds me that real estate agents are a good source of information about how neighborhood names are defined and used by people on the ground. I see online some LA rental agents talk about "Baldwin Hills" alone, but this one uses "Baldwin Hills-Crenshaw".) -- Beland (talk) 20:32, 9 June 2020 (UTC)
Re: Baldwin Hills/Crenshaw...the Baldwin Hills Crenshaw Plaza was named in 1989, long before the LA mapping project, and derives its name from its location, straddling two adjacent neighborhoods (sort of like how the Wiltern Theater, located at Wilshire and Western, derived its name). The LA Times (I can only assume) saw that name and, 20 years later, decided to combine the two neighborhoods. That makes sense, if your goal is reducing the number of neighborhoods in LA, as noted above.
You note that in Boston, Wikipedia uses official names, such as Fenway-Kenmore. In this case, the official names that the city and state use are Baldwin Hills and Crenshaw. [1] [2] Other sources stick to those official names... You spent time googling "Baldwin Hills/Crenshaw". I am sure you saw that Google's info box for "Baldwin Hills/Crenshaw" shows the mall, not a neighborhood, whereas googling "Baldwin Hills" displays an info box with a city map and neighborhood information, as does "Crenshaw".
You also mention that you found one real estate agent who uses that name. That is not a notable source.
To be clear, I have no problem listing "Crenshaw", "Baldwin Hills", and "Baldwin Hills/Crenshaw" in the grid. But each listed individually...respecting the fact that the city and state consider them separate and distinct neighborhoods and, concurrently, that the LA Times considers them to be one. Wikipedia must remain neutral. Phatblackmama (talk) 00:35, 10 June 2020 (UTC)
@Phatblackmama: Well, the Wikipedia policy on naming is to use the common name, not necessarily the official name, though that's for two names for the same thing, not two names for related things of different sizes. I use Duck Duck Go, not Google; I get an infobox for Baldwin Hills/Crenshaw, Los Angeles when I search on "Baldwin Hills/Crenshaw" and not when I search for "Baldwin Hills Crenshaw". Google happens to ignore the punctuation that makes the difference and Duck Duck Go doesn't. Both are using Wikipedia to power infoboxes for neighborhoods, so it's a bit circular to rely on them for what Wikipedia should title its articles. Notability is not a criterion for sources; that's for determining what articles to have.
Are you sure the LA Times' naming isn't reflecting a real overlap in identification or naming or culture? After all, Baldwin Village has apparently been part of Crenshaw since it was The Jungle(s), but now it has "Baldwin" in the name. -- Beland (talk) 02:17, 10 June 2020 (UTC)
When searching "Baldwin Hills/Crenshaw" you do get an info box... for the mall, not a neighborhood. [1] But you use Duck Duck Go, which processes around 1.5 billion searches every month. Google, in contrast, processes around 3.5 billion searches every day. Needless to say, more people see the mall when searching "Baldwin Hills/Crenshaw" and a neighborhood when searching "Crenshaw" or "Baldwin Hills". A lot more people. And Bing comes up with the mall also.
You correctly note that some of the information in the info boxes on google are from wikipedia. But not the maps. They use city maps, not the LA Times mapping project.
And Notability is not a criterion for sources, but one real estate listing is hardly a reliable source to define a neighborhood. That's all ya got?
It seems that you are trying to come up with some reason as to how or why the LA Times came up with the entity "Baldwin Hills/Crenshaw". That is not our job. We are supposed to cite sources....and both city and state, and the Los Angeles Times (prior to the mapping project), use the separate names of "Baldwin Hills" and "Crenshaw". And, a decade after the mapping project finished, the city, state, LAist, Curbed and Los Angeles magazine have not used the name "Baldwin Hills/Crenshaw" either as a stand-alone or as a parent neighborhood. Phatblackmama (talk) 04:02, 10 June 2020 (UTC)
@Phatblackmama: Well, it sounded like you were theorizing that the LA Times had the goal of reducing the number of neighborhoods on its map, and used the name of a mall as a pretense to combine two neighborhoods, with the implication that they somehow didn't deserve to be combined, and also emphasizing that no source other than the LA Times used the name. I'm not proposing that real estate agent should be used as a source in an article; I'm just pointing out that while other cartographers might not use the term, it is in use in commerce. It's a completely reliable source, but only to establish that this particular real estate agency uses the term to describe the same area described by the LA Times. I'd say it's not suitable as a reference for an article not because it's unreliable, but because it's a primary source. Secondary sources like cartographers consult with primary sources like businesses and readers and their work product is much more useful for writing articles. As I wrote on Talk:List of districts and neighborhoods of Los Angeles, I think a more likely theory is there are some blocks certain people call "Crenshaw" (like Google Maps does) and other people call "Baldwin Hills" (like Brightwell) and the LA Times decided not to pick one over the other, or couldn't draw a clear boundary between those identities. If you don't agree, that's fine; we have already agreed to recognize the LA Times definitions as one of several sources to be cited in LA neighborhood articles.
Just FYI, search engine results are automated (I'm a programmer; I've built search engines and built robots that used search engines to answer questions), and the top results and infobox results have not necessarily been verified by a person to be correct, and higher-traffic doesn't necessarily mean more accurate for any given query. If I ask Google "what is the population of Mars", it tells me "ten billion humans". "Who is the king of Mars?" Abraham Lincoln. Google results are also personalized based on search history, so not everyone sees the same results. The infobox result from Duck Duck Go for "what is the population of Mars" happens to be Colonization of Mars, but ::shrug:: it got lucky. -- Beland (talk) 04:36, 10 June 2020 (UTC)

Enough is enough

Please roll back your latest JWB run. I'm this close to taking the matter to ANI. A spot check finds a few of these as obviously okay, but most not. Some of your changes are inappropriate, like changing &Rho; and &Zeta; etc. These are indistinguishable from their Latin alphabet counterparts and should not be changed unless part of an appropriate lang template or part of obvious Greek text. You've also continued to change special characters in math formulas EVEN AFTER WE DISCUSSED THIS AND YOU AGREED NOT TO. You've changed characters like &darr; that have no business being changed. –Deacon Vorbis (carbon • videos) 16:22, 17 June 2020 (UTC)

@Deacon Vorbis: Sorry, I saw your revert on Pi (disambiguation) which was definitely a mistake, as those entries are all STEM and not Greek; I was about to thank you for that when I saw this message. Today I'm working through a list where I manually excluded the STEM articles so I could find the articles with Greek words that everyone seems to agree should be converted. If there are changes to Greek letters in math formulas, that's probably a mistake, though obviously some formulas already use characters directly. Where are the changes to Rho and Zeta you saw? The weird cases I remember changing are for Greek national license plates, where there are two-letter Greek prefixes. I've been changing them over since they are in articles about Greece, so it'll be confusing in the future if they keep showing up, but since the characters are hard to distinguish I've actually been adding reader-visible notes. That should actually be an improvement over the previous versions, since before they just looked like Latin characters to readers, even if editors might have noticed they were not. If I remember correctly, you were previously arguing that editors might prefer various notations and that they should be left that way. In STEM articles with math markup and whatnot, it does make sense that editors will need some familiarity with markup languages, though using multiple markup languages in the same document I think is still problematic, as it creates an unnecessarily high barrier to entry. But as you requested I've been skipping those articles, as well as some borderline biographies that have a lot of STEM markup. I don't think that argument holds for general biographies and sports articles where there's typically only one special character in use. Editors of those articles are unlikely to be familiar with HTML, and there's been a years-long effort and a whole Wikiproject devoted to nothing but replacing HTML with wikitext so that editors don't have to learn HTML.
Regarding darr in particular, it's easily accessible from the "Insert" widget on the default edit screen, on the "Symbols" list. Looking at the 2020-05-20 database dump, in practice there are 15479 instances of the actual ↓ character, and only 205 instances of the darr HTML entity. That seems like a pretty strong de facto preference by the silent majority. I'm not going around telling anyone that they can't use the darr HTML entity; I welcome contributions in any format. But switching it over for them to the WYSIWYG form means that other editors have an easier time understanding what's going on, and tweaking it as necessary, especially if the editor who originally added the entity doesn't stick around indefinitely. If that doesn't make sense to you, I'm open to soliciting more opinions. -- Beland (talk) 17:06, 17 June 2020 (UTC)
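To make the comparison above concrete, here is a rough sketch of how the two forms could be counted and converted. The helper names and the sample string are hypothetical; only the figures 15479 and 205 come from the comment above, and this is not the actual dump-processing or JWB code.

```python
import html


def count_forms(text: str, entity: str = "&darr;") -> tuple[int, int]:
    """Return (literal_count, entity_count) for one HTML entity in `text`."""
    literal = html.unescape(entity)  # "&darr;" decodes to the arrow U+2193
    return text.count(literal), text.count(entity)


def to_literal(text: str, entity: str = "&darr;") -> str:
    """Replace the named entity with the character it stands for."""
    return text.replace(entity, html.unescape(entity))


# Hypothetical sample wikitext mixing both forms.
sample = "Up ↑ and down &darr; then down ↓ again"
print(count_forms(sample))
print(to_literal(sample))

# Share of literal characters, using the counts quoted from the 2020-05-20 dump.
share = 15479 / (15479 + 205)
print(f"{share:.1%}")
```

Replacing only one named entity at a time, rather than unescaping the whole page, leaves entities such as &amp;lt; that are needed for markup untouched.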

Thanks / Massachusetts / SPLC

Thank you for noticing that the count of 2 was wrong in List of Confederate monuments and memorials. I updated the count of ALL of the states from the live database, and those two were still marked as "Live" (not "Removed", even though the "Year removed" column was marked "2019"). I have notified SPLC of the inconsistency through their online form. Also... good citation! Normal Op (talk) 02:17, 30 June 2020 (UTC)

@Normal Op: Oh, excellent. Thanks for your attention to detail! -- Beland (talk) 02:22, 30 June 2020 (UTC)

greetings in passing

By one of those odd coincidences, in the space of 10 minutes this afternoon I came across (a) an email thread in my archives which began with a message, I'm pretty sure from you, soliciting help with a new-for-1999 version of the "Pink Tour" for MIT dorms and FSILGs, and then (b) a note from you on the talk page for Cambridge. Small world! Take care (and apologies for cluttering your talkpage with irrelevant chitchat...) —Steve Summit (talk) 19:51, 1 July 2020 (UTC)


This is up for deletion (current consensus seems to be tending to KEEP). In any case, whether the template is deleted or kept - it needs to display in a legible manner in the articles where it is currently used (all else confusion). --Soundofmusicals (talk) 06:44, 9 July 2020 (UTC)

I have reverted the template itself to its original form - but is there another way to refer editors to the discussion page? Or am I displaying my ignorance? --Soundofmusicals (talk) 07:01, 9 July 2020 (UTC)
@Soundofmusicals: I don't know of any off the top of my head. User:Johnuniq has suppressed display of the template on articles, which fixes the breakage, at least. -- Beland (talk) 17:23, 9 July 2020 (UTC)

Dividing COVID-19 pandemic in Boston#Government response into subsections

Hi! I have seen you have divided the Government response section of COVID-19 pandemic in Massachusetts into subsections. If you could do the same to COVID-19 pandemic in Boston per Talk:COVID-19 pandemic in Boston#Government response sections, that would be great. Thank you! Qwerty325 (talk) 19:47, 14 July 2020 (UTC)

@Qwerty325: If you think that section needs subsections, by all means feel free to be bold and divide it up. It could be divided either by time period (like the closing phase vs. the reopening phase) or by subtopic if you think it would be clearer not being chronological. -- Beland (talk) 01:12, 15 July 2020 (UTC)

"Western Expansion of the United States" listed at Redirects for discussion

A discussion is taking place to address the redirect Western Expansion of the United States. The discussion will occur at Wikipedia:Redirects for discussion/Log/2020 July 27#Western Expansion of the United States until a consensus is reached, and anyone, including you, is welcome to contribute to the discussion. Steel1943 (talk) 16:35, 27 July 2020 (UTC)

Letter B - Moss project


I just wanted to give you a heads up that the letter B page on the Moss projects is going to be finished either today or tomorrow, so if you want to activate another page, go right ahead.


Ira Leviton (talk) 17:38, 28 July 2020 (UTC)

@Ira Leviton: Awesome, thanks for the note! The 2020-07-20 dump processing is done, so I just posted a fresh batch of "C" typos. -- Beland (talk) 20:58, 28 July 2020 (UTC)

United States territorial acquisitions#Mexican boundary

Hi. I wish to have a copy of the wiki markup for the article section, United States territorial acquisitions#Mexican boundary. It is my hope that you have retained a copy of the article prior to its merger. Would you please provide that section of markup to me? Jeff in CA (talk) 20:39, 3 August 2020 (UTC)

@Jeff in CA: You can see older versions of any article by clicking on the "View history" tab on the article page. In this case, that brings you to: [5]. The "Mexican boundary" section did not exist when I merged that article. Using the "find removal" tool, I found the last revision where it did, which is [6]. -- Beland (talk) 01:36, 4 August 2020 (UTC)
@Beland: Thank you. Good grief, did that IP editor back in May 2019 ever take a hatchet to the former article, deleting 17,624 net characters! He only ever made three edits under his IP, but that was the most harmful. He completely altered the entire article, without even a discussion on the Talk page, as far as I can tell. I would have thought that kind of wholesale change would have been challenged, but it seems to have sailed right through. I don't know how I missed it.Jeff in CA (talk) 05:38, 4 August 2020 (UTC)

Your edit to Input/Output Control System

Your August 5 edit to Input/Output Control System had an explanation of "convert special characters (via WP:JWB)", but instead of changing "&#091;Article&#093;" to &lbrack;Article&rbrack;, you deleted the text. Was that deliberate, or is WP:JWB broken? Shmuel (Seymour J.) Metz Username:Chatul (talk) 11:55, 5 August 2020 (UTC)

@Chatul: In that case, I made a manual change to the edit to drop what looked like unnecessary text that contained the special characters. -- Beland (talk) 12:04, 5 August 2020 (UTC)