Title: Export source code in any language to HTML
Question: Someone asked for an easy way to export source code in any language to HTML.
Answer:
First, I know that someone has already posted a solution to this problem, but I tried a different approach.
My goal was to get maximum flexibility from the tool, which meant writing my own parser.
Here is a little background so you can follow my reasoning.
In syntax highlighting, the idea is to split the language's "tags" into "categories" (by "tag" I mean a word, a symbol, or a number), and each category is rendered with its own font.
Now focus on tags. Different languages have different tags: for example, Delphi's "begin" corresponds to "{" in C-like languages. There are three main types of tags:
- Simple tags, such as a keyword
- Enclosing tags, such as comments enclosed between "{" and "}"
- Line tags, such as comments starting with "//", which close automatically at the end of the line.
The first step was to encapsulate these concepts in classes, so I defined three:
- TCodeDelimiter: defines enclosing tags and line tags
- TCodeCategory: a collection of TCodeDelimiter objects or a collection of simple tags (a TStringList), together with the category's font definition
- TCodeLanguage: a collection of TCodeCategory objects plus the default font for the language.
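To make the relationship between the three classes concrete, here is a minimal Python sketch that mirrors them. The names CodeDelimiter, CodeCategory, CodeLanguage, span_end and the field names are hypothetical stand-ins for the Delphi classes, and a font is reduced to a CSS style string. The assumption is that an empty close tag marks a line tag.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class CodeDelimiter:
    """Hypothetical mirror of TCodeDelimiter: an enclosing tag or a line tag."""
    open_tag: str
    close_tag: str = ""  # assumption: empty close_tag means a line tag (ends at end of line)

    def span_end(self, text: str, start: int) -> int:
        """Return the index just past the region that starts at `start`.

        Assumes text[start:] begins with open_tag.
        """
        search_from = start + len(self.open_tag)
        if self.close_tag:  # enclosing tag, e.g. "{" ... "}"
            pos = text.find(self.close_tag, search_from)
            return len(text) if pos == -1 else pos + len(self.close_tag)
        pos = text.find("\n", search_from)  # line tag, e.g. "//"
        return len(text) if pos == -1 else pos


@dataclass
class CodeCategory:
    """Hypothetical mirror of TCodeCategory: delimiters or simple tags, plus a font."""
    name: str
    font: str  # stands in for a Delphi font; here just a CSS style string
    delimiters: list[CodeDelimiter] = field(default_factory=list)
    simple_tags: list[str] = field(default_factory=list)  # plays the role of the TStringList


@dataclass
class CodeLanguage:
    """Hypothetical mirror of TCodeLanguage: categories plus a default font."""
    name: str
    default_font: str
    categories: list[CodeCategory] = field(default_factory=list)


# Example language definition (illustrative values only).
delphi = CodeLanguage(
    name="Delphi",
    default_font="font-family: monospace",
    categories=[
        CodeCategory("keyword", "font-weight: bold",
                     simple_tags=["begin", "end", "inherited"]),
        CodeCategory("comment", "color: green",
                     delimiters=[CodeDelimiter("{", "}"), CodeDelimiter("//")]),
    ],
)
```

For example, in `"x := 1; { note } y // end\nz"` the enclosing delimiter starting at index 8 covers `"{ note }"`, and the line delimiter starting at index 19 covers `"// end"` without the newline.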
Now the second step: what about the parser?
I split the problem into two phases: first, identify the tag at the current position, which is done with a simple loop; second, check whether that tag is defined to be highlighted.
In source code you meet four kinds of characters: letters, numbers, symbols, and spaces. The loop uses this grouping to identify tags: each continuous run of characters of the same kind is a tag.
For example, TMyClass.MyMethod is split into three tags:
- Tag 1: "TMyClass", all characters of the same kind (letters)
- Tag 2: "." (a symbol)
- Tag 3: "MyMethod", all characters of the same kind (letters)
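The grouping loop can be sketched in Python as follows. This is an illustration of the rule described above, not the original Delphi code; char_kind and split_tags are hypothetical names.

```python
def char_kind(ch: str) -> str:
    """Classify a character into one of the four kinds: letter, number, space, symbol."""
    if ch.isalpha():
        return "letter"
    if ch.isdigit():
        return "number"
    if ch.isspace():
        return "space"
    return "symbol"


def split_tags(text: str) -> list[str]:
    """Split text into tags: maximal runs of characters of the same kind."""
    tags: list[str] = []
    start = 0
    for i in range(1, len(text) + 1):
        # Close the current run at the end of the text or when the kind changes.
        if i == len(text) or char_kind(text[i]) != char_kind(text[start]):
            tags.append(text[start:i])
            start = i
    return tags


print(split_tags("TMyClass.MyMethod"))  # ['TMyClass', '.', 'MyMethod']
```

Applied literally, the rule keeps ":=" together as one symbol tag but splits an identifier such as "x1" into "x" and "1", and treats "_" as a symbol; a real implementation may want to refine the kinds for identifiers.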
To check whether a tag needs to be highlighted, I used a tree in which each character is a node whose children are all the characters that can follow it.
For example, the two keywords (or tags) "inherited" and "interface" generate the following tree:
      I
      |
      N
     / \
    H   T
    |   |
    E   E
    |   |
    R   R
    |   |
    I   F
    |   |
    T   A
    |   |
    E   C
    |   |
    D   E
Note that one tag can be nested in another: for example, "in" is nested in "INherited". To resolve this, each node has a boolean property that says whether a tag can end at that node, so a tag is highlighted only if, after navigating through the tree character by character, the last node reached is a "final" node.
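The character tree with a "final" flag on each node is what is usually called a trie. Here is a minimal Python sketch of it; TrieNode, TagTree, add and contains are hypothetical names, and the case-insensitive matching (everything normalized to upper case, matching the diagram) is an assumption suited to Delphi.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class TrieNode:
    children: dict[str, TrieNode] = field(default_factory=dict)
    final: bool = False  # True if a tag may end at this node


class TagTree:
    """Character tree of wanted tags (assumption: case-insensitive by default)."""

    def __init__(self, tags: list[str], case_sensitive: bool = False):
        self.root = TrieNode()
        self.case_sensitive = case_sensitive
        for tag in tags:
            self.add(tag)

    def _norm(self, s: str) -> str:
        return s if self.case_sensitive else s.upper()

    def add(self, tag: str) -> None:
        node = self.root
        for ch in self._norm(tag):
            # Reuse an existing child or create a new node for this character.
            node = node.children.setdefault(ch, TrieNode())
        node.final = True  # mark the end of a complete tag

    def contains(self, tag: str) -> bool:
        node = self.root
        for ch in self._norm(tag):
            node = node.children.get(ch)
            if node is None:
                return False  # path leaves the tree: not a wanted tag
        return node.final  # a prefix such as "INH" is not final


tree = TagTree(["inherited", "interface", "in"])
print(tree.contains("INHERITED"), tree.contains("in"), tree.contains("inh"))
```

With "in" added as its own keyword, the N node is marked final, so "in" matches while the prefix "inh" does not.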
Why do it this way? Once you can determine whether a given tag is one you want, you can do anything you want with it, and rendering it in HTML is easy.
You now have a routine that you can reuse whenever you face a similar problem.
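As an illustration of that last step, here is a minimal Python sketch that splits the source into runs of same-kind characters, looks each tag up, and wraps matches in a `<span>` whose class names the category. For brevity it assumes each category is a plain set of upper-case words instead of the tree, and it ignores enclosing and line tags; to_html and the CSS class scheme are hypothetical.

```python
import html
from itertools import groupby


def char_kind(ch: str) -> str:
    """Classify a character as letter, number, space or symbol."""
    if ch.isalpha():
        return "letter"
    if ch.isdigit():
        return "number"
    if ch.isspace():
        return "space"
    return "symbol"


def to_html(source: str, categories: dict[str, set[str]]) -> str:
    """Render source as HTML, wrapping recognized tags in <span class="category">.

    Assumption: category word sets hold upper-case words (case-insensitive match).
    """
    out = []
    # groupby yields consecutive runs of characters with the same kind, i.e. the tags.
    for _, group in groupby(source, key=char_kind):
        tag = "".join(group)
        category = next(
            (name for name, words in categories.items() if tag.upper() in words), None
        )
        text = html.escape(tag)  # escape <, >, & and quotes in the source text
        out.append(f'<span class="{category}">{text}</span>' if category else text)
    return "<pre>" + "".join(out) + "</pre>"


print(to_html("inherited Create;", {"keyword": {"INHERITED"}}))
```

Each category's font would then be expressed as a CSS rule for its class, for example `.keyword { font-weight: bold; }`.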
I hope you find this useful.