Wikipedia talk:Canonicalization

From Wikipedia, the free encyclopedia

Canonicalization? Is this a word? RickK 08:42, 10 Jan 2004 (UTC)

Apparently so - I had the same question. It seems to mean the act of making something canonical, and it is important for XML and more generally for data archiving. MS even provide a patch for File Permission Canonicalization Vulnerability - Microsoft Security Bulletin (MS00-057). I think the real word should be Canonisation (z if you prefer), as with Saints.--Henrygb 22:03, 19 Mar 2004 (UTC)
Wiki Canonization and Wikipedia:Canonicalization point to the same page.--Henrygb 22:40, 24 Mar 2004 (UTC)
The real word for creation of a saint is still canonisation (or z, if you prefer), as Henrygb noted; but canonicalization is a new word, coined by the programming community. Scientists often generate new words out of technical necessity; by intentionally avoiding the existing, similar words (such as "canonisation"), they free the word from potentially undesirable legacy definitions. An example is "stipe", used by mycologists to describe the stem of a mushroom.
  • Q: Why do they not simply use "stem"?
  • A: Because that term already has legacy definitions (sections of a plant that support other above-ground bodies, and contain xylem and phloem), which are not appropriate to this specific usage (fungi contain neither xylem nor phloem structures).
Thus, canonicalization is a valid, new word.--JoeMarfice 22:03, 19 Mar 2004 (UTC)


Illegal characters?

I can't see evidence that illegal characters are stripped. Try this li^nk. If they are indeed stripped, we should document which characters are stripped and under what circumstances. --Diomidis Spinellis 13:20, 31 March 2006 (UTC)


Handling of spaces

Looking at the Wikipedia dumps, it seems to me that underscores are converted into spaces, and not the other way round. --Diomidis Spinellis 13:20, 31 March 2006 (UTC)

I believe spaces and underscores are not differentiated (they are treated as the same character). For example, the internal name of an article will never have a space in it (spaces are converted to underscores) and the displayed name of an article will never have an underscore in it (underscores are converted to spaces).
– Andreas Blixt  13:41, 13 June 2006 (UTC)

Note that the underscore (_) is not equivalent to a space: putting it in the URL results in a search for a word containing actual underscores at those positions. Thus

http://www.google.com/search?q=site%3Awikipedia.org+%22interwiki_link%22

finds pages containing the term interwiki_link. From meta:Help:Searching#Google.
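
As a rough illustration of why the underscore survives in that search URL, the query string can be decoded with Python's standard `urllib.parse` module. This is only a sketch of generic URL query decoding, not of anything Google or MediaWiki does internally: in a query string `+` decodes to a space and `%22` to a double quote, while `_` is an ordinary character and is left untouched.

```python
from urllib.parse import urlsplit, parse_qs

# The search URL quoted above.
url = "http://www.google.com/search?q=site%3Awikipedia.org+%22interwiki_link%22"

# parse_qs percent-decodes each value and turns "+" into a space;
# the underscore is not special, so it stays an underscore.
query = parse_qs(urlsplit(url).query)["q"][0]
print(query)
```

The decoded search term still contains `interwiki_link` with a literal underscore, which is consistent with the point that a search engine sees the underscore as part of the word.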

However, Google searches the displayed text, not the raw text (am I mistaken?), so this does not tell us anything about canonicalisation. But perhaps there are other internal or external situations where the underscore and the space of the raw text are treated differently. --Eleassar my talk 19:37, 17 June 2006 (UTC)

In fact, if it is true that the internal and the external name of the article are written differently ... Suppose there are two articles with the same name, the difference being only in the underscore (Foo_bar and Foo bar). How does one differentiate between them without looking at the raw text if they are both written the same? --Eleassar my talk 19:57, 17 June 2006 (UTC)

I think you misunderstood what I meant. I was talking about actual Wikipedia articles in the database. Spore (video game) is internally stored (in the database) as Spore_(video_game). Now I can link to Spore_(video_game), and it'll still go to the same article. Why? Because when requesting a page from the database, Wikipedia's code will convert "Spore (video game)" into "Spore_(video_game)", then request the article. Because of this, spaces and underscores are treated the same. The case you are demonstrating is for URLs outside the Wikipedia URL space. Normally URLs do differentiate between spaces and underscores (with %20 being an encoded space, since an HTTP request may never contain the actual space character in the path). As for your second case, "Foo_bar" and "Foo bar" are exactly the same article. They cannot be different articles.
Addendum: As further demonstration, try the following URLs: http://en.wikipedia.org/wiki/Spore (video game) and http://en.wikipedia.org/wiki/Spore_(video_game)
– Andreas Blixt  20:12, 17 June 2006 (UTC)
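
The conversion described above can be sketched in a few lines of Python. This is a minimal illustration, not Wikipedia's actual code: the helper names `to_db_key` and `to_display_title` are hypothetical, and the only rules assumed are the ones stated in this thread (a percent-encoded space `%20` is decoded, spaces become underscores for the stored name, and underscores become spaces for the displayed name).

```python
from urllib.parse import unquote

def to_db_key(title: str) -> str:
    """Hypothetical helper: map a requested title to its stored form.

    Assumption: %20 is decoded to a space first, then every space
    is replaced by an underscore.
    """
    return unquote(title).replace(" ", "_")

def to_display_title(db_key: str) -> str:
    """Hypothetical helper: map a stored name back to the displayed name."""
    return db_key.replace("_", " ")

# All three spellings of the request resolve to the same stored name.
requests = ["Spore (video game)", "Spore%20(video%20game)", "Spore_(video_game)"]
keys = {to_db_key(r) for r in requests}
print(keys)
print(to_display_title("Spore_(video_game)"))
```

Under these assumptions "Foo_bar" and "Foo bar" also map to one stored name, which is why they cannot be two different articles.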


I have a question that perhaps others also have. Whenever I try to use an internal link with a space in it, the link always turns into one with the "&action=edit" ending. I have to type the space as an underscore to make it look and behave right. In other words, I'm disputing that [[word]] is the correct way to get word when two or more words are present in the title. I had trouble with Travellers Check when I noticed it. It seems that underscores are required if there is more than one word in the title. Is that true? 71.169.146.148 21:06, 29 March 2007 (UTC)

Merge?

Is there really anything of substance on this page that isn't covered by (or couldn't easily be added to) Help:Link? How about merging this page into that one?--Kotniski (talk) 12:02, 9 July 2009 (UTC)

No objections, so I've been bold and done it (revert if you're not happy).--Kotniski (talk) 10:01, 21 July 2009 (UTC)