Mon Sep 15 12:20:31 2014 UTC
This software is a compiler for Minimal BASIC as specified by the ECMA-55 Minimal BASIC (81KB, updated) standard from the ECMA International organization (formerly known as the European Computer Manufacturers Association). A version of the ECMA-55 Minimal BASIC (83KB, updated) better suited to printing (uses backspace underline sequences to more correctly match the original scanned input) is also available. Both text files are in UTF-8 encoding. Finally, the original source for this is the ECMA-55 Minimal BASIC PDF but that is not easily searchable and is relatively huge (15 MB) since it is a picture-based scan and not text-based.
The target is AMD64/EM64T/x86-64 machines running a modern Linux distribution. This compiler creates assembly language output files. These are then assembled into object files and linked to create an executable. The assembly dialect used is that of GNU gas from GNU binutils, since that will be available on any modern general purpose x86-64 Linux distribution. No libc or libm is used by the generated code, which allows creating very small executables. To keep the generated code small and simple, generated code output of SIN(), COS(), TAN(), ATAN(), EXP(), POW, LOG(), RND, and RANDOMIZE is only emitted of those features are required by the BASIC program.
Version 1.6 has been released!
This includes bug fixes and a rewrite of the expression code generated to use assembly macros to make the code easier to read. Also, parseinput.c has comments now. The uninitialized variable detection for scalar numeric values is working again. There was an alignment issue, so sometimes if you were lucky it worked. Now it should work every time. See the NEWS file for more details.
You can download my presentation slides from my talk entitled ResurrectMinimalBASIC in PDF format.
You can try Version 1.6 today. If you are OK with mercurial then you can get a read-only copy of the repository with this command:
hg clone http://hg.code.sf.net/p/buraphakit/MinimalBASIC/
Emmanuel Roche from France kindly provided paper copies of the missing NBS test sources. All tests are now typed in and are part of the self-test 'check' and 'check32' targets. And yes, the new (to me) tests did find problems.
The complete status of the compiler's ability to pass the NBS tests can be seen in README.NBS. The complete status of the compiler's ability to pass the new HAM tests can be seen in README.HAM.
The license for the compiler itself is the GNU General Public License version 2 only, see the file included file COPYING for details. Parts of the runtime support code have difference licenses, but all are open source and free to use.
This author of the actual compiler software is John Gatewood Ham, and that code is available under the GPL version 2 license only.
The included runtime library assembly routines for SIN(), COS(), TAN(), ATAN(), LOG(), EXP(), and POW (used by ^ operator) are public domain from SLEEF-2.80 (tweaked), from Naoki Shibata.
The included runtime library assembly routines for RND, and RANDOMIZE are public domain from ISAAC-64 (tweaked), from Bob Jenkins.
The included runtime library assembly routines for floating point output are derived from David M. Gay's dtoa and g_fmt routines. That code is copyrighted by Lucent Technologies, but is open source and freely redistributable.
The file dumpregs.s is not part of the compiler, but is optionally used for debugging. John Gatewood Ham wrote dumpregs.s and places that particular file into the public domain.
The ECMA-55 standard was chosen over the "ANSI X3.60-1978 minimal BASIC" standard since it is free. ANSI, despite cancelling the standard, still keeps the 35 year old standard locked down and available only if you pay for it, which is a quite mean-spirited attempt to prevent any compliant free and open source implementations from being written. The same attitude exists with ISO for the "ISO 6373:1984 Data processing -- Programming languages -- Minimal BASIC" standard. This standard has many other names, such as "AS 2797-1985 Programming language - Minimal BASIC", and the only free one is ECMA-55, since all the other standards bodies are trying to kill BASIC forever.
Many products exist today that call themselves BASIC, but if it doesn't have line numbers, it's not really BASIC. If it has multi-line statements, it's not really BASIC. The modern un-BASICs have nice features, but they are not really BASIC at all, but instead seem to be more Pascal-like in nature. The nicest thing I can say is that they are derivatives of BASIC.
I could not find any Minimal BASIC implementation at all for modern machines. Few true compilers (compilers to bytecode do not count) for any BASIC with line numbers exist, and most of them do not compile to assembly. None of them (that I could find) were both FOSS and ready for 64bit. Thus this project was undertaken to fill the need for a FOSS Minimal BASIC compiler for modern 64bit Linux machines.
Well, obviously if you need a Minimal BASIC implementation on Linux this will be useful. However, I suspect mostly it will show how to generate some simple but effective code for 64bit Linux on AMD64/EM64T with a small enough system that a normal programmer can learn it and understand it. Projects like gcc and llvm are huge and learning how to modify those compilers is so hard most people just give up. This project is much simpler, which no need to learn LISP, any intermediate languages, etc. You just need to know C99 and AMD64 assembly language in the GNU gas dialect. That's still a lot, but it is a tiny fraction of what is required for production compilers.
With production compilers, especially ones with multiple front and back ends, it is easy to get overwhelmed by the amount of information required to understand it. With this Minimal BASIC compiler, that problem does not exist. Many compilers need many dependencies (cmake, or the GNU autotools, or flex/bison, etc.) but this compiler just needs make, gcc (or clang), and binutils to build. Most compilers today emit code that requires a dynamic linker. Even what gcc calls static is actually dynamic in most cases, since GNU libc wants to be able to dynamically load code to resolve things for NSS. Here the generated code doesn't link with glibc so this problem is avoided.
This project can serve as an example of how to produce programs that don't link to glibc but instead call the kernel directly. Most assembly examples for the GNU gcc that I could find are embedded asm in C or C++ programs; very little stand-alone example code exists (that I could find) for GNU as dialect beyond a few simple hello world programs. There were no good examples of dealing with floating point exceptions that I could find at the assembly level for GNU as either.
Now is this generated code great? No, there is a lot of room for improvement. I've aimed for correctness and simplicity, and have avoided all optimization for now. Exception handling for math operations is exceptionally ugly, but even the ugly code took me a long time to get working. Constructive feedback with examples of doing it in a way that is cleaner, but still gives the same results and doesn't require any external libraries, would be welcome.
My work machine while developing this compiler from August 2013 through August 2104 was a Core 2 Duo E4700 (Conroe), and my request for something newer was placed on hold. Since the processor I had available did not support those instructions, the original design didn't use them.
I am very pleased to report that my employer, Faculty of Informatics at Burapha University, provided me with a new Haswell refresh machine in September 2014. On September 13, 2014, I completed adding support for SSE4.1's PINSR[QD] & PINEXTR[QD] and now they are used when you supply the -4 option. The compiler still generates code for a Conroe by default, but when the -4 option is supplied the nicer SSE4.1 instruction sequences are used. Surprisingly, when testing on the i7-4790 the run time is essentially unchanged. However, reading the code is much nicer with the new instructions. The upcoming 1.7 version will include this work, but for now you have to pull from mercurial to get those changes.
I have already started planning on adding support for AVX, but will maintain the ability to run on ancient Conroe hardware for people who need that ability. As of Septeber 14, 2014, AVX detection is working (use the -A option) and it automatically also includes the SSE4.1 support since AVX is a superset of SSE4.X, but no AVX code (beyond the detection) is actually generated yet.
The overall structure is best described by the structure diagram. This compiler is simple and generates the assembly directly from the parse without using any explicit syntax tree or intermediate representation. While a peephole optimization pass over the assembly itself could be done, that has not been implemented since optimization was not a goal for this implementation.Here is a list of the files that make up the compiler and their purpose.
There is an included
script that you can modify that automates the compilation process. If you
want 32bit math for add, subtract, multiply, and divide you can use
instead. If you want wide output with more significant digits and support for larger exponents,
you can use
instead. You can optionally strip the generated executable with the
strip command and it will still work, but be slightly smaller.
If you want to know the low-level details, please keep reading. The next section describes how to use the compiler directly without any scripts.
vi WHATEVER.BASYou must use a text editor that creates 7bit ASCII files. Remember, this dialect of BASIC requires line numbers and has 72 columns lines. String variables are limited to 18 bytes.
ecma55 WHATEVER.BASIf you have trouble, try again with -v option and redirect the output to a file to get very verbose diagnostic information and read it like this:
ecma55 -v WHATEVER.BAS 2>&1 | tee WHATEVER.BAS.LOG | lessUse 'q' to exit the less output viewer. Another advantage of using the '-v' option if you have trouble is that it will leave the partially generated assembly file for you to examine.
as -n --divide --64 --size-check=error WHATEVER.BAS -o WHATEVER.oYou will get warnings about jumps - while I can specify a .d32 suffix to instructions, the warn in the assembler fires anyway. However, the code is generated correctly. If you decide to modify the assembly code before assembling, please remember this is AT&T syntax and both the opcodes and the order of operands are different from Intel syntax. Also, the compiler emits code for the large memory model, and the code is not RIP-relative (PIC). This is not what most compilers generate by default for x86-64; it is much simpler, but may run a bit more slowly.
ld -nostdlib -z defs -z nodefaultlib -z nodlopen -z noexecstack -Bstatic \ --no-omagic -m elf_x86_64 -o WHATEVER WHATEVER.oI do not support dynamic linking, and no system libraries are used, so this long line will carefully link the object file to create an executable that is able to run on 64bit Linux on AMD64 compatible processors (with SSE2 support, which is all of them I think) without any system libraries. If you are using dumpregs, you'll need to append dumpregs.o to the end of the ld command.