Author of the post here. Happy to answer any questions I can, and FYI, we (Sourcegraph) are liveblogging all of GopherCon at http://gophercon.sourcegraph.com. Let us know if you have any questions or find it useful!
Thanks for mentioning this. I initially included "(unofficial liveblog post)" in the HN title, but it apparently was edited out by a moderator. So, I added a clarification to the top of the post itself.
It's great that they are aiming for an automated conversion from C to Go. It's clear they aspire to convert their own code, which was written in a certain way. But I think it would be a huge boost to Go usage if they could eventually aim to transpile any C code into Go code.
A little dream of mine would be if in the future when Rust is stable Mozilla developed a transpiler from C++ to Rust. That would be brilliant.
By the way, all the other talks at GopherCon seem pretty interesting, I hope someone uploads videos of them soon.
It is unclear how one would compile arbitrary C code into useful Go. The stereotyped conventions of well-written compiler code allow for a more idiomatic translation than a general translator could ever aspire to.
C++ to Rust would be even crazier.
(Transpile is a silly word. It's "compile". Compiling already trans-es.)
I know it's crazy difficult to do it, that's why I was talking about dreams.
But maybe a set of guidelines about translatable C code plus a translator which is good enough for a variety of cases would make refactoring the resulting code such a manageable task that many projects could consider switching to Go directly.
You are technically correct in that compile already implies a translation, but we usually use that term referring to a translation into a lower level language and not another language at the same level. You can say transpiler or source-to-source compiler for those cases, and I think it's clearer and more accurate for the reader.
What part of his definition didn't you understand? He said a transpiler translates from a higher-level language to a lower-level language. We are talking about translating C -> Go and C++ -> Rust, both of which go in the opposite direction (i.e., low -> high).
> He said a transpiler translates from a higher-level language to a lower-level language
No, he said a compiler translates from a higher-level language to a lower-level language:
"You are technically correct in that compile already implies a translation, but we usually use that term [= compile] referring to a translation into a lower level language"
This should not have been down-voted. We are talking about translation from C -> Go and C++ -> Rust (which are both in the opposite direction of the definition given for 'transpiling').
I agree that your comment added value to the discussion, and gave a counter-vote.
However, arguably the term could make more sense there, if we ignore the earlier definition and instead assume "trans" is short for "transcendent", as in "climbing to a higher level".
If we then simplify that to "compile from one language to another of roughly equal or higher level", it becomes a useful word to indicate a specific subset of ways one can do compilation.
Of course, I'm completely pulling this out of my ass and I might piss off actual CompSci people who use very specific and strict definitions in their papers (kind of like how some get annoyed with the procedure/function mixup), so take this with a grain of salt.
> You are technically correct in that compile already implies a translation, but we usually use that term [ie compile] referring to a translation into a lower level language and not another language at the same level.
The way I understood what he wrote, he was making the same point as you. Presumably the people downvoting you had a similar interpretation of what he wrote.
Good automated translations involve matching and translating the project's idioms. This results in code that fits the target language's idioms. Automated translation only at the level of language constructs results in odd-looking generated code.
The use case is to bootstrap the conversion from C to Go, if one has made a decision to do so. You pick a cutoff time and say, “now we're going to do our work in Go”.
What comes out the other end, as Russ phrases it, is a C program in Go syntax. The next phase of work is to turn it into “Go” as a human might write it.
The translated program is not meant for deployment, really. It’s to give the humans a starting point.
I think the best approach for something like this is a decompiler for LLVM IR. I am not extremely familiar with LLVM IR, but I figure there are common patterns, or debug symbols clang produces, that could give you a decent idea of what the higher-level source looked like. If it isn't targeted too closely at clang's output, you could then transpile any LLVM-able language to Rust.
I think, but am not sure, that the problem with Rust being a target for these other languages (my pipe dream is to write a JVM in it w/ a JIT using the bootstrapped rustc lib) is that you have to use unsafe code all the time (unless there's 0 performance penalty for ARC) because the original code is not written with ownership/borrowing semantics.
I think there are probably lots of chunks of code that would be perfectly valid under Rust's ownership semantics and could therefore be transpiled safely. Figuring out which chunks are and aren't would probably be quite difficult though.
They want to go quite a step up from doing some global regex replaces, but in the end, step 3 is "cleaning up and documenting the code, and adding unit tests as appropriate". I don't think they aim to automate anything there.
It has been stated that there will be videos available, but I'd not expect them until next week sometime (when the organizers get home and have recuperated).
Automated rewrite FTW! This can help you avoid freezing a project while it is being ported. Also, if you have a code base with its own idioms, then those idioms can be matched and translated, which can produce cleaner target code.
I was also talking about projects and porting in general. Incidentally, I've been paid to do exactly what we're discussing. (Porting a production program using automated translation.) It works.
I'm reminded of the fortran to c compiler f2c that was used to produce pure c implementations of lots of libraries like LAPACK.
I'm curious how they will handle things like pointer arithmetic and memory safety in C vs Go. If they manage to do so in a performant way, I could see translating lots of numerical or computationally intensive code to Go so that it could be run in a shared cloud environment without worries about memory safety, and without having to resort to VMs for separation.
If you took arbitrary fortran, especially with i/o, the results were an unreadable mess - but they compiled and ran.
For a project I had (in the early 90s iirc), I had to extensively modify the fortran and make multiple versions before the generated C was readable. Still, much easier than a rewrite.
> There are currently 1032 goto statements in the Go compiler jumping to 241 labels.
Wow, that's really striking. I know that goto statements have their uses but for something written in the last couple of years to have over a thousand of them is very surprising (it might not be surprising at all for those who write C code all the time). I guess they're mostly just for error handlers?
As others have pointed out, this is actually idiomatic C. You have a portion of your function below a label which is "clean up everything and return this variable (usually error status)." Then at any point where you have an error, you goto that label and everything is released and the error gets returned. It's sort of a manual 'defer' pattern.
"goto considered harmful" is taken out of context and bemoaned mostly by people unfamiliar with idiomatic C, I think. (Edit: sorry, I mean just for C. For other languages, e.g. Go, there may be more appropriate patterns like the 'defer' keyword.)
Encoding state implicitly in control flow tends to give you nicer and more natural code. A big state machine implemented with gotos can be tough to read if it's also a parser (which may need to build a tree, recurse, etc.). A small hand-rolled lexer might be reasonable to implement with goto. Although I would prefer to enumerate the states and use a switch.
If you're writing a recursive descent parser for C while keeping minimal lookahead, I can think of a place or two where goto might really help. One is to deal with casts, compound literals and subexpressions. They all begin with a left-parenthesis, so you can't tell what you're dealing with until a bit further. Since these all appear in the parsing of simple expressions (identifiers, constants, unary operators), the code can go in one function and you can jump to the right place with goto as soon as you know what you are dealing with.
I did not look in the Go code but C programmers mostly use goto for error paths.
I dunno, a few of them seem hard to justify. The one on line 475 could just be an "else" branch, for example, and the ones in the loop on 327 could as easily be continues.
When the full video of the talk comes out, you will see that Russ agrees with you. He lists the types of uses of goto. Various are used to make loops or elses, and he says those should not have occurred.
I would be curious to hear the justification behind the implementation of angryrealloc, the "goto ok" is strictly equivalent to a "continue" as far as I know. Maybe it triggers an optimization with a certain compiler.
I use that style all the time. I like having the function name start in column 1, it can make searching easier.
I do wish that in C1x (for x > 1), they could find a way to let us declare multiple formal arguments of the same type without repeating the type:
float
my_graphics_hack(float x, y, z, r, g, b, u, v) { ...
instead of
float my_graphics_hack(float x, float y, float z, float r, float g, float b, float u, float v) { ...
which gets really tedious and obfuscates the fact that they are all the same type, especially when the typename is more complex than just "float". When the new syntax was added to ANSI C, it wasn't possible to do multiple variables, for reasons I've forgotten but which I think had to do with forward type references. It would be awfully nice to find a fix for this.
That's a poor justification: there are plenty of tools to index and search C(++) code, like ctags or cscope. And your example could use a vector or a struct; I have a hard time imagining when a function with such a prototype would be useful in real life.
Compound literals are not too pretty and not too well known. Plus, if you care about c89... So, do you wrap all your values in a struct because the prototype gets too long otherwise? What a chore.
Preferring that function names start at the first column to make searching easier is a perfectly good justification.
You, on the other hand, seemingly would impose your tools (which have their shortcomings) and workflow on people with no justification at all.
Just because a tool exists, doesn't mean it's good (or better than what people are accustomed to) or that everyone must use it.
Do you think everyone should use Vim on Linux too?
No, I think everyone should use Emacs on FreeBSD, but that's not my point.
Adding syntactic sugar to a language makes the language bigger and harder to completely understand, and it makes it easier to misuse a feature (C already has a lot of trickery with types, like when you declare a parameter as an array but it behaves like a pointer). I'm a strong believer that explicit is better than implicit, and that while there are many ways to do the same thing, some are better than others in practice. Of course, "in practice" can change from one project to the other; what matters is consistency. I applaud the choice of Go to standardize on a single coding style, for instance.
For the particular example of the parent: while I write a lot of C code for work and for fun, I have yet to encounter a situation where I saw a function taking 10 floats as arguments and thought "yup, that's completely the right way to do that". If you have an example of such code, I'd be more than willing to reconsider my position; otherwise we're just talking about the best way to tame a unicorn.
Agreed, but there are still times when I end up having several arguments of the same type, and I see no virtue in having to repeat the type for each formal parameter.
BSD/Plan 9 style is extremely useful because it allows grepping for functions. More specifically, it allows the creation of acme(1) addresses of the form /some/file.c:/^foo where foo is a function. In acme you can just B3-click on that string and it will open the file at the correct function...
It is also a pretty common style in the open source world.
Yep, that's pretty weird. It harks back to the early days of C. When I first saw the currently recommended form:
int main(int argc, char **argv) {
//body..
}
I thought it was nice but not obviously advantageous, but over the years I've come around to thinking it makes code easier to follow. Even many bash shell scripting guides now recommend a similar form:
Are you sure you aren't confusing it with K&R style with the argument names on a different line? That would indicate old code. Having the return type on a different line is incredibly common, virtually every C project with decent code uses it.
Yeah, there are a lot of them in e.g. Linux kernel code for jumping to end-of-function cleanup. I suspect that in the process of translation to Go most will be replaced by the defer keyword.
This might become a nice benchmark for the Go language; the same code base implemented in C and Go! It may not be fine-tuned for optimisation, neither in C nor in Go, but may still give a good ball-park estimate.
Indeed, but it would be much more useful and impressive if they rewrote the compiler in Go by hand. I'm disappointed that they are aiming for an automatic translation instead - some people are going to ask themselves whether Go actually isn't that much fun to program in. An independent reimplementation is better for correctness too, they can compare outputs and find bugs in both implementations instead of porting over old bugs and adding new bugs where the translation goes wrong.
I'm really thankful for the liveblogging, as I couldn't manage to get my body to the conference.
I understand the desire to promote the Sourcegraph app by doing the blogging, and I think it's effective. However, the blog is really annoying to browse, as every (prominent?) link points to Sourcegraph the app instead of the blog.
Just to clarify, because I think my original comment is unbalanced.
I'm REALLY thankful for sourcegraph's liveblogging. The above comment was a suggestion as I thought they might want to know that (at least for me), the navigation of the blog was confusing.
Thanks! We are having a fun time liveblogging and are glad you're finding it useful. We got complaints when the blog image did NOT link to Sourcegraph, too. :) It's almost the end of this conference, but next time we liveblog, we'll have 2 separate header images, or try something else to make it less confusing.
"They’re deciding to automatically convert the Go compiler written in C to Go, because writing from scratch would be too much hassle."
When transcribing a talk, there isn't any need to write "They're". Just use the same pronoun the presenter used, otherwise it stands out like a sore thumb.
The general way that self-hosting influences language design is that the compiler is often one of the first major projects to be built using the language. This does not give it more influence than other major projects, but if your goal is to have a language designed for use case X, it is generally best to have your early projects with it be for X. Additionally, self-hosting may encourage a language design that makes bootstrapping easier (such as a stricter divide between the stage-1 language and the general language).
I sort of had the general sense of that already. I guess I was hoping for something more specific about what language features are so useful when writing a compiler but get in the way for general problems.
The part I quoted above almost sounds like wiping sweat from one's brow after having dodged a bullet: "phew, the language is safe from influence by those gull-durn compiler-writers..."
To expand a bit on your last point, self-hosting helps answer the question "can I build Real Software (tm) with this language, or is it really difficult?" earlier in the lifetime of the language.
>A Union is like a struct, but you’re only supposed to use one value
>(they all occupy the same space in memory). It’s up to the programmer to know which variable to use.
> There’s a joke in some of the original C code:
> #define struct union /* Great space saver */
> This inspired a solution:
> #define union struct /* keeps code correct, just wastes some space */
Somewhere in Scotland, a sum type sheds a single tear.
It works when unions are used correctly. If they are used to deliberately subvert the type system (e.g., put in an int, get an array of char out) it doesn't work. But C has so many other ways to subvert the type system that there's no need to do that.
Probably about the only place you'll see them used like that is in the code produced by web2c in compiling TeX. Knuth used variant records a lot to get around Pascal's type safety, and they get translated to unions.
That code is not valid according to the C standard, so there is no guarantee it will work anywhere. In particular, many modern compilers have optimizations that would break that code. I would be a little surprised if modern web2c still uses unions this way and gets away with it.
The only standard-compliant way to, say, reinterpret an int's bits as a float is to use memmove (or memcpy):

    uint32_t i;
    float f;
    i = 0x80000000;
    memmove(&f, &i, sizeof f);
I finally managed to track down the definition of memoryword from texlive-20130530-source/texk/web2c/texmfmem.h. Here it is:
typedef union
{
#ifdef TeX
  glueratio gr;
  twohalves hh;
#else
  twohalves hhfield;
#endif
#ifdef XeTeX
  voidpointer ptr;
#endif
#ifdef WORDS_BIGENDIAN
  integer cint;
  fourquarters qqqq;
#else /* not WORDS_BIGENDIAN */
  struct
  {
#if defined (TeX) && !defined (SMALLTeX) || defined (MF) && !defined (SMALLMF) || defined (MP) && !defined (SMALLMP)
    halfword junk;
#endif /* big {TeX,MF,MP} */
    integer CINT;
  } u;
  struct
  {
#ifndef XeTeX
#if defined (TeX) && !defined (SMALLTeX) || defined (MF) && !defined (SMALLMF) || defined (MP) && !defined (SMALLMP)
    halfword junk;
#endif /* big {TeX,MF,MP} */
#endif
    fourquarters QQQQ;
  } v;
#endif /* not WORDS_BIGENDIAN */
} memoryword;
Once you sort through all the ifdefs and typedefs, it boils down to a union of a float (glueratio), an int32 (integer), a pair of int16s (twohalves), and a struct of 4 bytes (fourquarters). When TeX creates its memory dump file, it does it all in terms of the bytes in the fourquarters struct. (And I think there are other times that it does similar things.)
This may very well be undefined behavior, but it seems to work when compiled with GCC anyway. Thankfully gc isn't such crazy code as this.
C99 6.5.2.3.5 seems to suggest that non-overlapping union members could break conforming code:
``One special guarantee is made in order to simplify the use of unions: if a union contains several structures that share a common initial sequence (see below), and if the union object currently contains one of these structures, it is permitted to inspect the common initial part of any of them anywhere that a declaration of the complete type of the union is visible. Two structures share a common initial sequence if corresponding members have compatible types (and, for bit-fields, the same widths) for a sequence of one or more initial members.''
Which part of the C standard would forbid type punning a uint32 to a float32? http://www.open-std.org/jtc1/sc22/wg14/www/docs/dr_283.htm suggests that C89 explicitly allowed this, C99 at first glance did not, and was errata'ed to make it clear that this is still allowed. C11 seems to have the same verbiage.
union hack { int i; float f; };

int foo(int *i, float *f)
{
    *i = 3;
    *f = 42.0f;
    return *i;
}
Aliasing rules say foo need not read from that int pointer, may switch the order of the write to f and the write to i, and can assume that foo returns 3, so
union hack h;
foo(&h.i, &h.f);
might return 3 or something else. I think that last call introduces undefined behavior, but only because of the definition of foo, which the writer of that call might not even have the source for.
But of course, that is an "you shouldn't do that" edge case. One could also claim that the corrigendum doesn't apply because foo doesn't "use a member to access the contents of a union".
Sure, but that's an aliasing issue, not a type punning issue. I agree that the code you cite there is a violation of the aliasing rules, and will not work "as expected" on modern compilers unless one does the equivalent of gcc's -fno-strict-aliasing.
And I agree that the corrigendum doesn't apply in this case. Once you hand different-typed pointers to the same memory to people, whether via union or just casting pointers, the aliasing rules will up and bite you.
I've used unions like that as a way to work around the BSD socket API without horrible casting all over the place. Change the "union" to a "struct" and the code no longer works. Is it subverting the type system? Yes. Is casting subverting the type system? Yes. Pick your poison.