by Richard F. Drushel (drushel@apk.net)
I used to archive past TWWMCA articles in an anonymous FTP directory at my ISP, APKnet. However, due to security concerns, APKnet shut off all anonymous FTP access in mid-August 2000. (Some FTP clients can be cracked by the unscrupulous.) So, the TWWMCA archive needed to move.
I could have put the archive on one of the machines under my control here at CWRU, some of which run FTP servers. However, the lab machines tend to get moved around, or replaced, or given different IP addresses and hostnames. So, I didn't view these as very stable choices. APKnet sysadmins suggested that I make the archive available by HTTP (which is an "anonymous" protocol as well). Indeed, even binary files are often made available nowadays by HTTP. I could just move the files to my webpage space and make a simple index of links to each article. However, in for a penny, in for a pound: if they were going to be available via HTTP to web browsers, I thought that they should at least be reformatted to be webpages, written in HTML, instead of just plaintext dumps.
In addition, just after I posted TWWMCA 0008.13 to USENET, a reader of comp.os.cpm commented that the articles might reach more people, or at least be more accessible, if they were archived as HTML. Why not, he suggested, post a pointer to new TWWMCA articles at the HTML archive, instead of posting the plaintext articles themselves?
The current subscribers to the Coleco ADAM mailing list will recall that I polled everyone to find out whether or not they wanted plaintext versions of TWWMCA to continue. The clear opinion was "whatever you do with the archive, keep posting them to the mailing list as plaintext". Since USENET readership is a secondary audience, I had to keep writing TWWMCA as plaintext.
I thus decided that the archive should be HTML, but the original articles should continue to be plaintext. This meant that I needed to convert the plaintext articles into HTML.
To do the conversion by hand would be a very tedious job: there were 31 existing articles. Letting some word processor do an HTML export was not an option for me, either, as the HTML spewed out by word processors is ugly, verbose, and just *bad*. However, somehow automating the conversion was very desirable. What to do?
Fortunately for me, the plaintext TWWMCA articles have adhered very closely to a consistent layout format. All paragraphs are single-spaced, with a blank line between them. Headings begin with Roman numerals, and there is always at least one blank line before and after each heading. Numbered lists begin with numerals in parentheses, like (1) and (2), and each item usually starts on a separate line. Even the title header and author line have the same format in each article.
Thus, I decided that I could write a filter program (in QuickBASIC 4.5 for MS-DOS) which could read the original plaintext articles, add the necessary HTML header and footer, classify the text as title header, section header, paragraph, or numbered list, and add the appropriate HTML tags. Of course, there might be other manual editing to do (such as to add real links to URLs, or to show pictures instead of just having a reference to a picture file), but this would be a major drudgery saver.
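The classification step can be sketched roughly as follows. This is a Python illustration, not the QuickBASIC source of the actual converter; the regular expressions and the function names (split_blocks, classify, tag_block) are assumptions based only on the layout rules described above, and the title header and author line are left out for brevity.

```python
import re

# Assumed patterns, derived from the layout conventions described above.
HEADING = re.compile(r"^[IVXLC]+\.\s")   # e.g. "II. Sample Numbered List."
LIST_ITEM = re.compile(r"^\(\d+\)\s*")   # e.g. "(1) to list reasons..."


def split_blocks(text):
    """Join each run of consecutive non-blank lines into one block."""
    blocks, current = [], []
    for line in text.splitlines():
        if line.strip():
            current.append(line.strip())
        elif current:
            blocks.append(" ".join(current))
            current = []
    if current:
        blocks.append(" ".join(current))
    return blocks


def classify(block):
    """Return 'heading', 'list_item', or 'paragraph' for one block."""
    if HEADING.match(block):
        return "heading"
    if LIST_ITEM.match(block):
        return "list_item"
    return "paragraph"


def tag_block(block):
    """Wrap a block in the HTML tags that match its class."""
    kind = classify(block)
    if kind == "heading":
        return "<H2>" + block + "</H2>"
    if kind == "list_item":
        # Drop the "(n)" marker; the list numbering replaces it.
        return "<LI><P>" + LIST_ITEM.sub("", block) + "</P></LI>"
    return "<P>" + block + "</P>"
```

A paragraph that happens to begin with something like "C. " would be misread as a heading by this simple pattern, so a real filter would want a stricter test (for example, requiring the heading to occupy a single line).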
I looked over a few of the original plaintext articles to see if there was anything important I would be missing with this simple, one-pass filter program. Three items became apparent:
(1) Most of the numbered list items did not have blank lines between them, making it harder to determine where items ended.
(2) Several articles had extensive sections of SmartBASIC or Z80 assembly code, whose formatting would be lost if blindly rendered as HTML (which ignores extra blank spaces in text, squeezing everything together).
(3) The characters <, >, and & appeared in the text. These are reserved characters in HTML, to be used in HTML tags only. There are special HTML escape sequences which must be used to reproduce these characters as literals.
After a little thought, I came up with workarounds for these three cases:
(1) Before running the filter program, manually add blank lines between numbered list items.
(2) Manually add, on separate lines before and after blocks of code, the HTML tags <PRE> and </PRE>, and write the filter program so that it does not alter any text it encounters between a <PRE> and a </PRE> tag. <PRE> is used to specify pre-formatted text, which web browsers will render in a monospaced font with exactly the same spacing as the original text, with no space squeezouts.
(3) Write the filter program so that it scans every input line of text for <, >, and &, and replaces all occurrences *which are not in <PRE> blocks* with the special HTML escape sequences &lt;, &gt;, and &amp;, respectively.
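The character replacement itself might look something like this in Python (again an illustration only, not the converter's QuickBASIC code; the function name escape_reserved is made up for the example). The one subtle point is the order of the replacements:

```python
def escape_reserved(line):
    """Replace HTML reserved characters with their escape sequences."""
    # "&" has to be handled first; otherwise the "&" that begins
    # "&lt;" and "&gt;" would itself be escaped on a later pass.
    line = line.replace("&", "&amp;")
    line = line.replace("<", "&lt;")
    line = line.replace(">", "&gt;")
    return line


print(escape_reserved("characters <, >, and &, which must be trapped."))
```

In the scheme described above, this would be called only for lines that are outside a <PRE> block. Python's standard library offers the same replacement as html.escape(line, quote=False).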
The converter program, TXT2HTM4.RFD, is available for download:
Note that the converter program also strips out leading and trailing spaces and TAB characters, *except* in <PRE> text blocks.
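A minimal Python sketch of that whitespace rule is shown here, assuming (as described above) that the <PRE> and </PRE> tags sit alone on their own lines. The function name strip_outside_pre and the in_pre flag are illustrative, not taken from TXT2HTM4.RFD.

```python
def strip_outside_pre(lines):
    """Strip leading/trailing spaces and TABs, except inside <PRE> blocks.

    `lines` is a list of input lines without their newline characters.
    """
    out = []
    in_pre = False
    for line in lines:
        marker = line.strip(" \t").upper()
        if marker == "<PRE>":
            in_pre = True
            out.append("<PRE>")
        elif marker == "</PRE>":
            in_pre = False
            out.append("</PRE>")
        elif in_pre:
            out.append(line)              # keep the original spacing
        else:
            out.append(line.strip(" \t"))
    return out
```

The same in_pre flag is a natural place to decide whether a line should also be scanned for reserved characters.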
Perhaps the best way to show how the automated conversion works is to give a "before" and "after" example. Here's a sample TWWMCA-format article in modified plaintext (i.e., with necessary blank lines and <PRE> and </PRE> tags added):
This Week With My Coleco ADAM 6211.03
by Richard F. Drushel (drushel@apk.net)

I. Administrivia.

This is a test article for the TWWMCA text-to-HTML converter program
that I wrote. This is only a test. Note the appearance of HTML reserved
characters <, >, and &, which must be trapped.

II. Sample Numbered List.

Frequently TWWMCA has numbered lists, such as:

(1) to list reasons why something will or won't work,

(2) to show steps in debugging something, or

(3) to give a list of programs or files. Note that the filter program
considers groups of single-spaced lines to be single entries in the
numbered list.

Note that the begin and end tags for ordered lists, <OL> and </OL>, must
be added manually to the filtered output file.

III. Sample Code Listing.

It's necessary to allow code listings to retain their original spacing.
Otherwise, it becomes completely unintelligible to humans (although the
assemblers and compilers can still deal with them).

<PRE>
;Z80 assembly code fragment
;on entry, B=number of table entry
;on exit, A=table entry value
START:
        LD      HL,ADDR1        ;point to base of table
        LD      DE,1            ;length of table entry
LOOP:
        ADD     HL,DE           ;offset one entry
        DJNZ    LOOP            ;keep going until we reach Bth entry
        LD      A,(HL)          ;get the entry
        RET                     ;all done
</PRE>

If the conversion works properly, it will look very nice.

IV. Next Time.

Who knows? Something that hopefully won't bore you to tears.

See you next week!

*Rich*
Now here's what it looks like after the text-to-HTML converter program gets through with it (lines have been wrapped to 80 columns):
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<HTML>
<HEAD>
<TITLE>This Week With My Coleco ADAM z</TITLE>
</HEAD>
<BODY>
<H1 ALIGN=CENTER>This Week With My Coleco ADAM 6211.03</H1>
<P ALIGN=CENTER>by <A HREF="mailto:drushel@apk.net">Richard F. Drushel
<I>(drushel@apk.net)</I></A></P>
<H2>I. Administrivia.</H2>
<P>This is a test article for the TWWMCA text-to-HTML converter program that I
wrote. This is only a test. Note the appearance of HTML reserved characters
&lt;, &gt;, and &amp;, which must be trapped. </P>
<H2>II. Sample Numbered List.</H2>
<P>Frequently TWWMCA has numbered lists, such as: </P>
<LI><P>to list reasons why something will or won't work, </P></LI>
<LI><P>to show steps in debugging something, or </P></LI>
<LI><P>to give a list of programs or files. Note that the filter program
considers groups of single-spaced lines to be single entries in the numbered
list. </P></LI>
<P>Note that the begin and end tags for ordered lists, &lt;OL&gt; and
&lt;/OL&gt;, must be added manually to the filtered output file. </P>
<H2>III. Sample Code Listing.</H2>
<P>It's necessary to allow code listings to retain their original spacing.
Otherwise, it becomes completely unintelligible to humans (although the
assemblers and compilers can still deal with them). </P><PRE>
;Z80 assembly code fragment
;on entry, B=number of table entry
;on exit, A=table entry value
START:
        LD      HL,ADDR1        ;point to base of table
        LD      DE,1            ;length of table entry
LOOP:
        ADD     HL,DE           ;offset one entry
        DJNZ    LOOP            ;keep going until we reach Bth entry
        LD      A,(HL)          ;get the entry
        RET                     ;all done
</PRE><P>If the conversion works properly, it will look very nice. </P>
<H2>IV. Next Time.</H2>
<P>Who knows? Something that hopefully won't bore you to tears. </P>
<P>See you next week! </P>
<P>*Rich* </P>
<HR>
<P><A HREF="wk.html">Next Article</A><BR>
<A HREF="wk.html">Previous Article</A><BR>
<A HREF="index.html"><I>TWWMCA</I> Archive Main Page</A></P>
<HR>
</BODY>
</HTML>
Now the manual tweaking begins:
the TWWMCA date must be added to the <TITLE> text (a "z" is left as a placeholder).
<OL> and </OL> tags must be added around the numbered list.
all occurrences of the string "TWWMCA" must be replaced with the HTML "<I>TWWMCA</I>" to make it italics (since it's a title).
the filenames for Next Article and Previous Article in the footer navigation bar must be added ("wk.html" is left as a placeholder).
any other desired text formatting must be added (e.g., <TT> tags (teletype text, monospaced) for EOS function call names like _READ_BLOCK).
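I make these changes by hand, but for the curious, some of them could be scripted. The Python sketch below is only an illustration under stated assumptions: the function name apply_tweaks and the article filenames in the usage example are hypothetical, and it relies on the placeholder text ("z" and "wk.html") appearing exactly as in the converter output above. The <OL> tags and other formatting would still need manual attention.

```python
import re


def apply_tweaks(html, issue, prev_file, next_file):
    """Fill in the placeholders left by the converter and italicize TWWMCA."""
    # Replace the "z" placeholder in the <TITLE> text with the issue date.
    html = html.replace("ADAM z</TITLE>", "ADAM " + issue + "</TITLE>", 1)

    # Italicize TWWMCA, skipping occurrences already wrapped in <I>
    # (the footer link is already italicized by the converter).
    html = re.sub(r"(?<!<I>)TWWMCA", "<I>TWWMCA</I>", html)

    # Fill in the two navigation placeholders.
    html = html.replace('<A HREF="wk.html">Next Article</A>',
                        '<A HREF="' + next_file + '">Next Article</A>')
    html = html.replace('<A HREF="wk.html">Previous Article</A>',
                        '<A HREF="' + prev_file + '">Previous Article</A>')
    return html


# Usage with made-up filenames:
page = ('<TITLE>This Week With My Coleco ADAM z</TITLE>\n'
        '<P>the TWWMCA converter</P>\n'
        '<A HREF="wk.html">Next Article</A><BR>\n'
        '<A HREF="wk.html">Previous Article</A><BR>\n'
        '<A HREF="index.html"><I>TWWMCA</I> Archive Main Page</A>')
print(apply_tweaks(page, "6211.03", "prev.html", "next.html"))
```

The negative lookbehind in the regular expression is what keeps the footer from ending up with doubled <I> tags.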
When these changes are made, the final HTML will be rendered as seen below:
This Week With My Coleco ADAM 6211.03
by Richard F. Drushel (drushel@apk.net)
I. Administrivia.
This is a test article for the TWWMCA text-to-HTML converter program that I wrote. This is only a test. Note the appearance of HTML reserved characters <, >, and &, which must be trapped.
II. Sample Numbered List.
Frequently TWWMCA has numbered lists, such as:
1. to list reasons why something will or won't work,
2. to show steps in debugging something, or
3. to give a list of programs or files. Note that the filter program considers groups of single-spaced lines to be single entries in the numbered list.
Note that the begin and end tags for ordered lists, <OL> and </OL>, must be added manually to the filtered output file.
III. Sample Code Listing.
It's necessary to allow code listings to retain their original spacing. Otherwise, it becomes completely unintelligible to humans (although the assemblers and compilers can still deal with them).
;Z80 assembly code fragment
;on entry, B=number of table entry
;on exit, A=table entry value
START:
        LD      HL,ADDR1        ;point to base of table
        LD      DE,1            ;length of table entry
LOOP:
        ADD     HL,DE           ;offset one entry
        DJNZ    LOOP            ;keep going until we reach Bth entry
        LD      A,(HL)          ;get the entry
        RET                     ;all done
If the conversion works properly, it will look very nice.
IV. Next Time.
Who knows? Something that hopefully won't bore you to tears.
See you next week!
*Rich*
Next Article
Previous Article
TWWMCA Archive Main Page
Since the TWWMCA articles have covered such a broad range of topics, some kind of content index is necessary, so that you can find articles on a particular topic. The easiest way to do this was to create a large, 2-column table in HTML, with links to the articles on the left, and brief topic summaries for each on the right. Of course, I had to read through all the articles again to create the topic summaries...
I have arranged the index in reverse chronological order, i.e., newest articles first. To add a new TWWMCA article, I just add another entry to the top of the table, being sure to update the *previous* article's navigator footer to point to the *new* article as the Next Article.
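For illustration, here is how such an index could be generated in Python rather than maintained by hand. Everything in this sketch is an assumption: the table markup, the function names build_index and neighbors, and the filenames and summaries in the usage example are all hypothetical.

```python
ROW = '<TR><TD><A HREF="{0}">TWWMCA {1}</A></TD><TD>{2}</TD></TR>'


def build_index(articles):
    """Build the 2-column index table, newest article first.

    `articles` is a list of (filename, issue, summary) tuples,
    ordered oldest first.
    """
    rows = [ROW.format(filename, issue, summary)
            for filename, issue, summary in reversed(articles)]
    return "\n".join(["<TABLE BORDER=1>"] + rows + ["</TABLE>"])


def neighbors(articles, i):
    """Return (previous_file, next_file) for article i; None at either end."""
    prev_file = articles[i - 1][0] if i > 0 else None
    next_file = articles[i + 1][0] if i + 1 < len(articles) else None
    return prev_file, next_file


# Usage with made-up filenames and summaries:
articles = [
    ("older.html", "0008.13", "summary of the older article"),
    ("newer.html", "0008.20", "summary of the newer article"),
]
print(build_index(articles))
print(neighbors(articles, 0))
```

Adding an article then means appending one tuple to the list; neighbors shows which existing page needs its Next Article link updated.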
I'm sure that there are ways to do this using HTML <FRAME> elements, but frames are overkill for me :-)
Guy Bona tells me that he finally had a chance to try out my suggested patch for SmartFiler under ADAMserve (see TWWMCA 0008.20)...and it works for him! He can now read and write his database floppies. He has asked that I now look into a similar fix for Recipe Filer and Address Book, which (since they all use the SmartFiler database engine) are likely to have exactly the same bug as SmartFiler. This investigation will be the topic of future TWWMCA articles.
See you next week!
*Rich*