A Modest Proposal to Save Five-Level Code

John Savard · March 14th 17, 03:16 PM posted to rec.radio.amateur.digital.misc

While searching the Web for information about versions of ITA 2 as modified for
different languages, I came across a web page which noted that, because 5-level
code is stateful (a garble can lead to a mistaken shift into FIGS case), it is
quite rightly something that ought to be relegated to the past.

In the original Murray code, on which ITA 2 was based, instead of a letters shift
and a figures shift, there were "letters space" and "figures space" codes, just as
there were in Emile Baudot's original five-level code (on which ITA 1 is based,
and which differs from what is commonly called "Baudot", ITA 2 or its slightly
incompatible Western Union variant).

Space codes alone would, of course, be limiting. Shift codes are needed, since one doesn't want
to be forced to insert a space when changing between letters and figures. But even
with shift codes, many RTTY operators adapted their teletypes so that the space
was a letters space by installing the "unshift on space" option.
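
To illustrate why unshift on space limits the damage from a garble, here is a
minimal Python sketch of an ITA 2 receiver. The tables follow the standard ITA 2
assignments with bit 1 as the least significant bit. The "_" placeholder for
non-printing and national-use positions, the function name decode, and the
sample message are assumptions made for this example only.

```python
# ITA 2 tables indexed by 5-bit code value (bit 1 = least significant bit).
# "_" is a placeholder (assumption of this sketch) for positions that print
# nothing here: NUL, the shifts, WRU, BEL and the national-use figures.
LTRS_TABLE = "_E\nA SIU\rDRJNFCKTZLWHYPQOBG_MXV_"
FIGS_TABLE = "_3\n- '87\r_4_,_:(5+)2_6019?__./=_"

LTRS, FIGS, SPACE = 0b11111, 0b11011, 0b00100


def decode(codes, unshift_on_space=False):
    """Decode a list of 5-bit codes, optionally returning to letters on space."""
    table = LTRS_TABLE
    out = []
    for code in codes:
        if code == LTRS:
            table = LTRS_TABLE
        elif code == FIGS:
            table = FIGS_TABLE
        else:
            ch = table[code]
            if ch != "_":
                out.append(ch)
            if code == SPACE and unshift_on_space:
                table = LTRS_TABLE  # the space doubles as a "letters space"
    return "".join(out)


# Letters-only sample message; no shift codes are needed to send it.
codes = [LTRS_TABLE.index(ch) for ch in "RIG IS OK"]

# A single flipped bit turns G (11010) into FIGS (11011).
garbled = list(codes)
garbled[2] ^= 0b00001

print(repr(decode(garbled)))                         # "RI 8' 9("
print(repr(decode(garbled, unshift_on_space=True)))  # 'RI IS OK'
```

Without the option, everything after the garble stays in figures case until
the next LTRS arrives; with it, the damage ends at the first space.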

In addition to printable characters, the figures case on a 5-level
teletypewriter includes two control characters - WRU and BEL.

Here's my suggestion for bringing 5-level code into the 21st Century, in stages.

a) Make unshift on space standard; change Space to Letters Space.

b) Change WRU to ESC - not an exact equivalent of ASCII Escape, but allowing
two-character control codes for things like WRU, BEL, BS (backspace) and so on.

And change BEL to Figures Space, so that one isn't forced to switch to letters
and then change back when introducing spaces into figures. (Note, though, that
because Figures Space is the figures shift of a letter rather than a code of
its own, unlike Letters Space, it doesn't serve as a reminder that one is in
figures shift.)

c) A first expansion of the character repertoire to include lower-case and more
figures characters can now be introduced.

However, what I propose will be different from, and simpler than, either ITU
recommendation S.2 (where a "superfluous" LTRS issued in letters case toggles
between upper and lower case) or ASCII over AMTOR (where the all-zeroes
character, used elsewhere as a third shift for languages like Russian or Greek,
switches to the alternate characters).

Since I propose to split not only the letters case into upper and lower case,
but also the figures case to make room for additional characters, more positions
could now be spared for control codes. So instead of using ESC sequences, I would
make the UC and LC shifts two additional control codes within the figures case.

d) And then a second expansion of the character repertoire, to allow the equivalent of a "third shift" for supporting a non-Latin alphabet, would also be provided for right from the start in the basic design.

However, the all-zeroes character would _not_ be used to shift into it. Instead, an additional two control codes would be taken from the figures case, SI and SO, similar to ASCII.

So I envisage the figures case as looking like this:

QWERTYUIOP - 1234567890 in lower case, and !@#$%&*() in upper case.

ASDFGHJKL - -' ESC n n n Fig Sp SI SO in lower case, and _" ESC n n n Fig Sp SI SO in upper case.

ZXCVBNM - ? / ; = ? , . in lower case, and ? ? : + ? in upper case.

The two unused ? positions and the three national use n positions would be somehow assigned to the four positions needed for additional ASCII characters. One possibility would be:

ASDFGHJKL - -' ESC ~[] Fig Sp SI SO in lower case, and _" ESC `{} Fig Sp SI SO in upper case.

ZXCVBNM - \ / ; = ? , . in lower case, and | ? : + ? in upper case.

taking as much inspiration as possible from ASCII over AMTOR.

e) But what about unshift on space? How can that be reconciled with having a
third shift language?

One extra character is available: 00000. So while the regular space character
becomes a letters space, this character could become the "third space".

However, *that* has a grievous flaw.

11111 is a _shift_ code, the letters shift. Although it does something, what it
does can be fully undone by a subsequent shift code, so it can still serve the
same function as DEL in ASCII - correcting errors by punching over the
character involved.

00000 is the code that is present on the blank leader of tape. So it shouldn't
"do something" irrevocable like advancing the carriage one space. It should be
allowed to perform the function of NUL in ASCII, which it could do back when it
was used as the third-language shift.
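
The tape argument can be sketched in a few lines of Python. Punching can only
add holes, so overpunching any code with all five holes always yields 11111,
while unpunched leader always reads as 00000. The function names and the
sample tape are assumptions made for this illustration.

```python
LTRS = 0b11111   # all five holes punched
BLANK = 0b00000  # no holes punched, as on blank leader


def overpunch(tape, position):
    """Return a copy of the tape with all five holes punched at one position."""
    corrected = list(tape)
    corrected[position] |= LTRS  # holes can be added but never removed
    return corrected


def printing_codes(tape):
    """Drop the two codes that must not print or move the carriage."""
    return [code for code in tape if code not in (LTRS, BLANK)]


# Two frames of leader, then H, I, and a mistakenly punched E.
tape = [BLANK, BLANK, 0b10100, 0b00110, 0b00001]
fixed = overpunch(tape, 4)

print(printing_codes(fixed) == [0b10100, 0b00110])
```

Whatever was punched by mistake, the correction always lands on 11111, which
is why that code has to remain harmless.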

But the idea of having both a "letters space" and a "third space" so that there
is a constant reminder of the state is useful and important - the vulnerability
of stateful five-level code is the very issue I'm trying to address.

Well, there _is_ another shift code already present.

So I propose to change the assignment of the FIGS shift code from its present
value to the all-zeroes code.

But instead of "third space" getting the existing FIGS shift code, since the
space is the most common character, it should get a code with only one bit set,
like the code for the regular space, here used for "letters space".

So I propose that "third space" should get the existing code for *carriage
return*, with carriage return getting the existing FIGS shift code.

It's unfortunate that the existing assignments of two characters are changed in
an incompatible manner, but this allows the revised code to be faithful to the
original rationale behind the design of the ITA 2 code.
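
The reassignment described above can be summarized in a short Python sketch.
The existing values are the standard ITA 2 codes (bit 1 as the least
significant bit); the dictionary names and the helper bits_set are assumptions
made for this illustration.

```python
# Existing ITA 2 assignments for the codes involved.
ITA2 = {
    "NUL": 0b00000,
    "SPACE": 0b00100,
    "CR": 0b01000,
    "FIGS": 0b11011,
    "LTRS": 0b11111,
}

# The same five code values after the proposed swap.
PROPOSED = {
    "FIGS": ITA2["NUL"],             # all-zeroes: blank leader reads as a shift
    "LETTERS SPACE": ITA2["SPACE"],  # unchanged code, now also unshifts
    "THIRD SPACE": ITA2["CR"],       # takes over the old carriage return code
    "CR": ITA2["FIGS"],              # takes over the old FIGS code
    "LTRS": ITA2["LTRS"],            # unchanged, still usable for rubout
}


def bits_set(code):
    """Number of holes punched for a 5-bit code."""
    return bin(code).count("1")


for name, code in PROPOSED.items():
    print(f"{name:14s} {code:05b}  bits set: {bits_set(code)}")
```

Both space codes end up with exactly one bit set, and the only codes that
blank leader or an overpunch can produce are shift codes.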

To be specific:

Since the upper case and lower case codes are _within_ the figures shift, the
letters shift, as well as SI and SO, must not affect the upper/lower case shift.

Figures shift, on the other hand, could always proceed to lower case within the
figures case, so as to go directly to the digits and the most common punctuation
marks.

SI goes to the "third shift" language, and SO returns from it to the Latin
alphabet. The characters for the figures case may also be different in the third
shift language, not just the ones in the letters case.
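
The shift rules above can be modelled as a small state machine. The following
Python sketch is one possible reading, not a definitive specification. It works
on symbolic token names rather than bit codes, since the positions of UC and LC
within the figures case are not fixed above. It also assumes a single
upper/lower flag that FIGS resets to lower, and that Letters Space and Third
Space select the Latin and third alphabets respectively while returning to
letters case. All names in the code are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ShiftState:
    alphabet: str = "latin"  # "latin" or "third"; changed by SO / SI
    case: str = "letters"    # "letters" or "figures"; changed by LTRS / FIGS
    upper: bool = False      # changed by UC / LC


def apply(state, token):
    """Update the receiver state for one shift or space token."""
    if token == "LTRS":
        state.case = "letters"       # leaves alphabet and upper/lower alone
    elif token == "FIGS":
        state.case = "figures"
        state.upper = False          # assumption: FIGS always lands in lower case
    elif token == "UC":
        state.upper = True
    elif token == "LC":
        state.upper = False
    elif token == "SI":
        state.alphabet = "third"     # leaves case and upper/lower alone
    elif token == "SO":
        state.alphabet = "latin"
    elif token == "LETTERS_SPACE":   # assumption: also selects the Latin alphabet
        state.alphabet, state.case = "latin", "letters"
    elif token == "THIRD_SPACE":     # assumption: also selects the third alphabet
        state.alphabet, state.case = "third", "letters"
    return state


def run(tokens):
    """Apply a sequence of tokens to a fresh receiver state."""
    state = ShiftState()
    for token in tokens:
        apply(state, token)
    return state


# Upper case survives the return to letters and the switch of alphabet.
print(run(["FIGS", "UC", "LTRS", "SI"]))
```

Under these assumptions every space re-establishes both the alphabet and the
letters case, so a garbled shift can corrupt at most the rest of one word.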

John Savard