User:Chocolateboy/Dashes


From Wikipedia talk:Manual of Style (dates and numbers):

(The file normalized.txt referenced below is a copy of the 20040727 cur table dump with talk pages removed; the script is available on request. Because of stack overflow issues in Perl's recursive regular expression engine, a few longer articles are also excluded from these statistics.)

Globally, spaced hyphens are at least 15 times more common than en dashes and em dashes combined:

grep '–' normalized.txt | perl -pe '$_ = join ($/, /–/g) . $/' | wc -l
> 14663
grep '—' normalized.txt | perl -pe '$_ = join ($/, /—/g) . $/' | wc -l
> 16526
grep ' - ' normalized.txt | perl -pe '$_ = join ($/, / - /g) . $/' | wc -l
> 494155
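The pipelines above count every occurrence of a pattern rather than every matching line. A minimal sketch of the same idea, assuming a grep that supports the -o extension (GNU and BSD grep do, but it is not in POSIX), is shown here. The helper name count_matches and the three-line sample file are hypothetical stand-ins, not part of the original measurement:

```shell
#!/bin/sh
# count_matches PATTERN FILE
# Count every non-overlapping occurrence of a literal string in FILE.
# Hypothetical helper; relies on the non-POSIX -o option of grep.
count_matches() {
    # -o prints each match on its own line, -F treats PATTERN literally,
    # and tr strips the padding some wc implementations add.
    grep -oF -- "$1" "$2" | wc -l | tr -d ' '
}

# Tiny made-up sample standing in for normalized.txt.
sample=$(mktemp)
printf '%s\n' 'a - b - c' 'd – e' 'f — g - h' > "$sample"

count_matches ' - ' "$sample"   # prints 3 (two on line 1, one on line 3)
count_matches '–' "$sample"     # prints 1
count_matches '—' "$sample"     # prints 1

rm -f "$sample"
```

Like the Perl /g match, grep -o reports non-overlapping matches, so the two approaches should agree on ordinary text.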

Likewise, hyphens are approximately 40 times more common than en dashes in date ranges (counted here as separators between adjacent wikilinks, spaced or unspaced):

grep '\]\] – \[\[' normalized.txt | perl -pe '$_ = join ($/, /\]\] – \[\[/g) . $/' | wc -l
> 2698
grep '\]\]–\[\[' normalized.txt | perl -pe '$_ = join ($/, /\]\]–\[\[/g) . $/' | wc -l
> 2599
grep '\]\]-\[\[' normalized.txt | perl -pe '$_ = join ($/, /\]\]-\[\[/g) . $/' | wc -l
> 59366
grep '\]\] - \[\[' normalized.txt | perl -pe '$_ = join ($/, /\]\] - \[\[/g) . $/' | wc -l
> 160911

As you can also see from those stats (which exclude some date ranges and include some non-date-ranges: patches welcome!), spaced hyphens are used roughly 2.7 times more often than unspaced hyphens in date ranges.
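The ratios quoted in this post follow directly from the reported counts. As a quick check, here is a small sketch that recomputes them with awk. The helper name ratio is hypothetical, and the numbers are copied from the output lines above:

```shell
#!/bin/sh
# ratio NUMERATOR DENOMINATOR
# Print NUMERATOR / DENOMINATOR rounded to one decimal place.
# Hypothetical helper for checking the figures above.
ratio() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f\n", a / b }'
}

# Spaced hyphens vs. en dashes and em dashes combined (global counts).
ratio 494155 $((14663 + 16526))                 # prints 15.8

# Hyphens vs. en dashes between adjacent wikilinks (date ranges).
ratio $((59366 + 160911)) $((2698 + 2599))      # prints 41.6

# Spaced vs. unspaced hyphens between adjacent wikilinks.
ratio 160911 59366                              # prints 2.7
```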

chocolateboy 23:39, 16 Sep 2004 (UTC)