Hacker Newsnew | past | comments | ask | show | jobs | submitlogin
Explaining my fast 6502 code generator (pubby.games)
217 points by pubby on March 24, 2023 | hide | past | favorite | 49 comments


I may have misunderstood, but I believe step 1 (eliding loads) is simply a cache scheduling problem. The optimal solution is the greedy "furthest in the future" eviction policy.


That's an excellent point. I hadn't heart of furthest in the future, but it looks like it does solve step 1. Past that though, it doubt it can be used because each of the 6502's registers are different and don't support the same operations. It's a good idea though, and might work for some specific RISC architecture where all registers behave the same.


Is [1] a good way to learn about furthest in the future eviction?

[1]: https://blog.henrypoon.com/blog/2014/02/02/proof-of-the-fart...


I found that article quite confusing. I think these slides are clearer: https://courses.cs.washington.edu/courses/cse421/18au/lectur...

(provided you know about induction already.)


> my compiler generates faster code than GCC, LLVM, and every other compiler I compared it to

GCC and LLVM can target the 6502?!?


For LLVM, I don’t think it’s in the official repo, but yes:

https://github.com/llvm-mos/llvm-mos


I think it's possible LLVM-MOS gets into mainline LLVM eventually. The devs have done a good job getting all the kinks ironed out.

BTW, mysterymath, one of the most prominent LLVM-MOS contributors, left this comment on their current code gen. (I'm posting it here for visibility):

"In LLVM-MOS, we mainly struggle with its register allocator; the rest of the backend is really quite reasonable. The "Greedy register allocator" in LLVM is a just finely tuned priority allocator with nice live-range splitting. It works great for zero page cache locations, but it's just not tuned very well for tight register classes like those involving the processor's three architectural registers. I've half a mind to implement Hack's SSA-based register allocator in LLVM to use on A, X, and Y; this would clean up the oodles of spurious copies LLVM-MOS spits out in the worst cases."


GCC: Via an old, unofficial port: https://github.com/itszor/gcc-6502


I'd like to see this same technique applied to x86, and what the performance is like without the "illegal instructions" (omitting them from generation would probably be trivial). It's relatively well known that one of the ways Asm programmers can beat compilers is on instruction selection, and that's what this technique seems to excel at.


This does far more than I did back in my PET and then Amiga days, but one thing I did was write a multi-pass compiler. Each pass at first found ways to make the come better (usually smaller so it ran faster). Even simple code I wrote could see a 10-20% improvement.

Of-course, this is because the original code was quick and dirty. I wonder what improvement modern compilers could have added.


Geez, 25 years ago I was being boggled by how sophisticated was the code generated by the C compiler we shipped at SGI. For the MIPS architecture, as for most contemporary architectures, re-ordering memory accesses to minimize cache misses was FAR more important to execution speed than the specific selection of machine instructions.

Of course for early MIPS there were little things like, don't put a jump in the last word of a 4K page... :-(


6502 also had some "little things". E.g., it's jump indirect didn't work across page boundaries: http://www.6502.org/tutorials/6502opcodes.html#JMP


Excellent read. I am a pro compiler developer and I learned quite a few things from the article.


This is a Massalin superoptimizer.


From a quick skim of Massalin's paper, they seem similar in how they generate combinations of instructions and prune, but different in other areas. Superoptimizer spews out every combination of instruction (even invalid ones) in a search for true optimality, and uses boolean logic + emulation to determine equivalent code sequences to prune. 6502 algorithm only generates combinations it knows will work, and uses a symbolic approach to determine equivalence. Its results are not always optimal, and portions like the PBQP stuff is obviously different.


I think you'll be (very!) interested in e-graphs:

https://en.wikipedia.org/wiki/E-graph

Google fodder:

"Equality Saturation: A New Approach to Optimization"

"Denali: A Goal-directed Superoptimizer"

"Z3: An Efficient SMT Solver"

"Efficient E-matching for SMT Solvers"

"egg: Fast and Extensible Equality Saturation"

"Rewrite Rule Inference Using Equality Saturation"


It also only uses one possible ordering for the IR instructions?

Still a nice result. It looks like a dynamic programming solution to code generation.

Next to the game boy. ;-)


It’s probably possible to incorporate all possible orderings of instructions, if the set of already computed instructions is part of the state of the dynamic programming algorithm.


Does this code generator spill to zero-page? Do you have to do anything special for allocations there?


Since it combines register allocation with instruction spilling is innate - there isn't a special spilling phase. Spilled variables are assigned ram addresses after code generation occurs at link time, with the most frequently used going to zeropage.


Does it avoid ZP addresses with special purpose for the system when doing this?


Yeah it can reserve specific ZP addresses. I do this to implement the runtime. It can also be used to reserve hardware registers, but since my only target is the NES, I don't have to worry about that.


All the other compilers in the comparison are C compilers, right? Whereas this compiler is compiling its own home made language? So not sure how the comparison can be valid.


FWIW the custom language is very close to C, and the examples are pretty much a 1-1 transposition.

I agree with you though on a different note. It's dubious to compare compilers by benchmarking them, because tests are highly arbitrary and are won/lost based on single weak links. It's not really an exact science, but rather something you can start with to figure out how things are behaving. I mostly base my opinions by looking at the assembly code each compiler generates, but a single bar graph is a better presentation for articles.


It could be fun to compare with Action!--it's been known to compile a reasonably good 6502 assembly (considerably better than other compiled languages for that platform), https://en.wikipedia.org/wiki/Action!_(programming_language),

It has been released as open-source together with the binaries, https://atariwiki.org/wiki/Wiki.jsp?page=Action

You should be able to run these using an Atari 8-bit (XL/XE model) emulator, like Atari800, https://github.com/atari800/atari800/releases or Altirra, https://www.virtualdub.org/altirra.html


I think NESFab is a medium level programming language targeting the NES 6502. On NES, the Decimal flag has no effect.

GNU Compiler Collection no longer targets 6502, but an older version of the GNU C Compiler could. LLVM targeting 6502 is limited. There is also the cc65 cross compiler. [1]

LLVM [2] and GCC [3] are compiler and toolchain technologies. The name Low Level Virtual Machine (LLVM) is no longer officially an acronym, and the GNU C Compiler is now the GNU Compiler Collection (GCC).

[1] https://www.cc65.org/ [2] https://www.llvm.org/ [3] https://gcc.gnu.org/


No, not really. LLVM is not a C compiler, it consumes an intermediate representation. Clang is the C compiler which produces LLVM IR.

Likewise, GCC contains a C compiler but the machine specific parts like the codegen take an intermediate representation (GCC has several) as input.

Almost no compilers out there compile directly from C to machine code, there is one or more intermediate forms in between. C is not a good input language for an optimizing compiler.

It's a good and entirely valid comparison. It's the backend codegen that is being compared, the language frontend does not really play a part in it.


Can this make games for NES?


Bookmarked this to read about optimizers, because it looks great.

That said, I clicked on the link because it had "6502" in the title. And... this isn't very interesting as a retrocomputing activity. To be blunt: there's absolutely no way in hell a compiler architecture like that is ever going to be self-hosting in 64k of memory space.


I don't think of retro-computing as an activity limited solely to the hardware and software that existed at a given time, that's far too restrictive for me.

6502 is an architecture, not a specific machine, nor even the set of machines that supported that architecture in the past. The architecture is and remains interesting on its own! (6502 assembly remains a really good intro to assembly, for instance, even for people that have never even seen a C64.)

There is absolutely scope for people to do fun things with 6502 that could never run on a C64 or an Apple II. The vast majority of 6502 code today runs in emulation, so there's really no reason to artificially gimp it to ape the limitations of any given machine. (Unless emulating that specific machine of course!)


> The vast majority of 6502 code today runs in emulation

Emulating a 6502 isn’t that popular, so _if_ there’s even a single commercial product still using a 6502, I would guess that could easily run more 6502 cycles in total than are getting emulated.

Now, are there any? I wouldn’t know, but

- it seems you can still buy new (they’re RoHS compliant, so I don’t think they have stocks going back decades that they’re selling) 6502 CPUs (https://www.westerndesigncenter.com/wdc/HowToOrder.php)

- It was true during the Tamagotchi craze (https://hackaday.com/2013/05/24/tamagotchi-rom-dump-and-reve...: “it was a huge chore just to figure out what processor this uses. It turned out to be a 6502 core with a few other things built in”), but that’s over 20 years ago.

So, where, if anywhere, are these things being used?


> 6502 assembly remains a really good intro to assembly

It... really is not, though. It doesn't teach you macro assemblers as they exist in the modern world. It doesn't teach you interaction with the linker except in the simplest ways.

And while the instruction set is "simple" in the sense that it can be understood on a page of paper, lots of critically important ideas don't exist in a meaningful way. Modern techniques like register assignment aren't possible given the limitations and non-orthogonality of the CPU state. There's no "ABI" equivalent you can use to call a standard function, and even something as foundational as "passing arguments" to a function isn't directly expressible, because the "stack pointer" isn't actually a pointer. In point (heh) of fact, no pointers are pointers because the 6502 doesn't actually have an abstraction for a memory address!

Everyone should learn to hack on an Apple II (and good grief, not a Commodore unless you are trying to learn VIC-II or SID hardware), but not to teach themselves "assembly" as a skill they might apply elsewhere.

In fact it's almost the same argument used for why we teach new kids Python and not BASIC. The latter just isn't a useful tool for expressing the concepts you need to learn.


I was almost with you, because much as I love the 6502, it is kind of weird by modern "standards". But then: the Apple II? Like, wtf man. C'mon. The Apple II is shit. Is it just because you like that sweet double density disk throughput? Well, fair enough, but then why not go for the BBC Micro. 2 MHz, plus 80 columns. OK, so it only has 32 KB RAM... meanwhile, you are still American.

In summary, please tell us how many shades of mucky greyish brown you get with the Apple II. Because with the C64, you get 13.


> Well, fair enough, but then why not go for the BBC Micro

Because the Apple shipped four years earlier? The question isn't what machine is better. Obviously later hardware builds on the advances of its ancestors. The idea behind valuing history is understanding where the changes happened.

Acorn (and Commodore) shipped an excellent machine given the constraints of the time. Both devices were worth purchasing.

Apple shipped an absolutely groundbreaking work of genius that no one had forseen and that no one would duplicate for years. AAA games were shipping in 1988 (c.f. Ultima V and Prince of Persia) designed directly to a framebuffer design architected in the spring of 1977.

If you can't see the difference there and perceive which device is more important to understand and study, I don't know what to say.


You can get a Beeb with 64k. The Master even comes with 128k standard, if I’m not mistaken.


> It... really is not, though. It doesn't teach you macro assemblers as they exist in the modern world. It doesn't teach you interaction with the linker except in the simplest ways.

What does the processor/instruction set have to do with the available tooling?!

FYI back in the day much of Acorn's (BBC Micro = 6502) software was written using a very capable macro assembler (source: I worked there).


Yes! I learned 6502 assembly from the Apple ][ system manual. You got the op code tables, the schematics and Woz‘s monitor incl. listing and a disassembler. It motivated me to learn English, forced me to carefully calculate relative jumps and taught me to think first then code. Two of the three still useful skills.


Wow you really are into telling people what they should do.


> Wow you really are into telling people what they should do.

Saying "6502 is not a good platform for learning assembly" is simply not the same thing as saying "you should not learn assembly language with a 6502 assembler", and I don't understand how you're interpreting it that way.

It's just giving you my opinion and advice. By your own logic, you are now trying to censor my ability to give that opinion and advice. That's bad too, right? Why are you "telling me what to do"?

Chill. Opinions are OK. If you don't agree, that's fine. But a better response is a reasoned counter argument and not "don't say that".


If you were being honest you would say wasm is a good platform for learning assembly and JavaScript the perfect first programming language.


WASM isn’t really even an assembly language, despite the name.


Not interesting to you, but other people may differ! A cross-compiler targeting the 6502 is still interesting to some. As the article notes, these compiler techniques weren’t invented in the 70s because computers weren’t powerful enough to make good use of them. What’s wrong with exploring what the original hardware is capable of when freed from those restrictions?


It is my understanding the the GEOS system was developed on a more powerful machine than the C64/Atari computers it was used on. This let then have the entire code in memory and then processed for common functions. They could not do this on the the 64K-8 bit computers at the time.


There's nothing "wrong" with it. I'm just saying that if I'm going to read an article about fun 6502 activities, I want to see it hosted on an Apple II. If I want to read about compiler techniques, I'd prefer to see it targetting a more orthogonal (or less weird, anyway) architecture.


Okay, here's an article about calculating e to 116,000 digits on an Apple ][: <https://archive.org/download/Apple_2_Woz_e_Calc_1981/Apple_2...> (warning, PDF!). I found it interesting to read, even though I never owned an Apple ][, nor did I ever own a 6502-based computer (well, except for the Atari 2600). I was, however, able to apply the techniques to computers I did own (6809 and 8088 based).


Who in their right mind would self host 6502 development?


Plenty of teenagers in the 80s writing games for the various home computers at the time (Apple ][, Atari 400 & 800, Commodore 64).


That waa me. My 6502 "IDE" was a BBC Micro, with it's excellent BBC BASIC ROM that had inline 6502 assembler built-in to the BASIC language.

It was so good and accessible, and documented in the Advanced User Guide, that after learning BASIC starting age 10, I learned 6502 a few months later and was able to reverse engineer and modify other people's games not long after that.

I had the luxury of floppy drives which my contemporaries did not have

The floppies helped immensely with self-hosted development of larger programs. Imagine losing all your assembler work due to a crash running it because it was tempting to skip rhe time of several minutes saving to tape. Floppies were reasonably fast for saving before running things, but few people I knew had them.

It also gained a Z80 second processor (an add-on; and later a 68000), so rhe Z80 helped with development of 6502 code, as a RAM disk and editing scratchpad that survived 6502 crashes. And for running Wordstar, which was a decent editor.

Eventually I had someone else's ZX Spectrum hooked up to the BBC with a kind of home-made bit-banging serial port. That's when things started getting fancy, as my BBC's Z80 second processor's flavour of BBC BASIC could assemble Z80 code for the Spectrum, and run some Z80 binaries from other people's Spectrum software, with the BBC's 6502 providing display and sound emulation.


Yup, that was me too. The BBC Micro was a great 6502 dev environment (for the time).

Several years after I moved on from the Beep to a PC, I again had the opportunity to use my knowledge of 6502 assembly. A rally-driving friend asked me to look into changing the fuel curves in their Toyota's ECU. Turns out that many early 90s ECUs used 6502s. I whipped up a 6502 disassembler (in Turbo Pascal) and was able to reverse engineer the ROM dump and make the required adjustments. Fun times.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: