A Quick Comparison of Nim vs. Rust (arthurtw.github.io)
221 points by arthurtw on Jan 14, 2015 | hide | past | favorite | 90 comments


It feels like a shame to me that the one Python maxim that Nim has chosen to reject is "there should be one and preferably only one way to do it" - from this flows so much of the other goodness of the Python ecosystem. I worry that some of the metaprogramming magic that Nim allows will result in the loss of that feeling that I can view source on almost any 3rd party Python code and immediately feel like I can follow what's going on. Maybe community standards can help restrain this somewhat.

Also:

> mapWidth, mapwidth and map_width all map to the same name

Why oh why? Editors and IDEs will now need Nim-aware search!?


> "there should be one and preferably only one way to do it"

In my very humble opinion, this idiom is great at the start of a language, but then it handcuffs it; that is why we have the Python 2 vs. Python 3 issues.

I used to use Python for statistical work, and about 2 years ago I switched to R. R is VERY flexible and has seen a huge change over the last 5 or so years. It has changed for me especially in the last 6 months, with changes in which libraries I use. Using dplyr and other libraries, my code is night-and-day different. We now have piping with %>%, which changes my code completely and makes it MUCH more readable and quicker to write. I don't see anything like this happening quickly in Python.


When you write code that's intended to be maintained for years, a degree of resistance to shifting language idioms should be considered a feature. That's not to say that Python itself is the greatest language for long-term maintainability, but I personally have both years-old Python code and years-old R code, and any thoughts of doing maintenance on the latter fills me with trepidation.

But when it comes to short-term, one-off, or unimportant tasks, I agree that an emphasis on flexibility is welcome. There's a reason that TIMTOWTDI works so well for Perl in its original role as a shell scripting language.


On the other hand, it's quite convenient if you can't remember the exact name, and it saves you a look-up, e.g. quick_sort vs quicksort vs quickSort: either works.


That means that all three forms will wind up in any Nim project that gets large enough, though. Hardly a good thing.


If they all work, does it even make a difference? Besides, it would be trivial to write a refactoring tool that can auto-replace all instances with the preferred form.
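Such a tool would mostly be a matter of comparing identifiers the way Nim does. As a rough sketch (in Rust, since the thread compares both languages): Nim's documented rule is "partial case sensitivity" - the first character is compared exactly, and the rest is compared ignoring case and underscores. The function names here are made up for illustration.

```rust
// Sketch of Nim-style identifier equality: first character is
// case-sensitive, the remainder ignores case and underscores.
fn normalize(ident: &str) -> String {
    let mut chars = ident.chars();
    match chars.next() {
        None => String::new(),
        Some(first) => {
            let mut out = first.to_string();
            // Drop underscores and lowercase everything after the first char.
            out.extend(chars.filter(|&c| c != '_').flat_map(|c| c.to_lowercase()));
            out
        }
    }
}

fn same_ident(a: &str, b: &str) -> bool {
    normalize(a) == normalize(b)
}

fn main() {
    assert!(same_ident("mapWidth", "map_width"));
    assert!(same_ident("mapWidth", "mapwidth"));
    assert!(!same_ident("mapWidth", "MapWidth")); // first char differs
    println!("ok");
}
```

A "canonicalize" pass for a whole file would then be a rename of every identifier whose normalized form matches the preferred spelling.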


It has little benefit and requires new tooling to do simple refactoring like "rename".

> Besides it would be trivial to write a refactoring tool that can autoreplace all instances with the preferred form.

This should be part of the language if it's being so lax with identifier uniqueness. And if it's so trivial, you should write it so people can't complain anymore.


I don't get why people get so worked up over this. It's not a "little" benefit in my opinion. As for drawbacks, I can only see one, and that is that grepping for identifiers becomes more difficult.


> It's not a "little" benefit in my opinion

Could you explain it, then? An identifier refers to a single thing. I don't see having multiple ways to refer to that identifier as a win AT ALL—it may be a win for people who are too lazy to learn their own code base, but it makes code hard to read, hard to maintain, and hard to refactor.

Meanwhile, keeping identifiers unique makes them easy to index and easy to manipulate.


I don't understand why you think it makes code hard to read. In what situation would a code base use fooBar and foo_bar as two different identifiers meaning two completely different things? The idea behind this "style insensitivity" is that amyAtePizza has the same meaning as amy_ate_pizza. Why should it be distinguished in a programming language?


> In what situation would a code base use fooBar and foo_bar as two different identifiers meaning two completely different things?

You don't. It's a terrible idea to mix naming conventions.

> amyAtePizza has the same meaning as amy_ate_pizza.

WHY?! What possible benefit could it have? Why would you be mixing styles in the first place? Why can't you just remember which style hopefully your entire code base uses?


I wouldn't mix styles inside my own code base. But what if I am using somebody else's library which uses a different style? The benefit is that I can then use the style I have been using in my code base to call the functions in that library without mixing naming conventions.


I hope you never move code between different code bases, then.

Wouldn't it be a lot simpler to have a single style?


I think it will put more cognitive load on the humans who read the code. Instead of doing a quick visual comparison to see if identifiers are the same, programmers will need to actually comprehend the two tokens, and then transform them in their head to see if they are the same. This sounds minor, but I think the cognitive load can add up. Visually comparing two tokens is not quite conscious thought, while comprehension is. It's something you'll always need to think about when reading code.


You don't understand how having multiple highly-different forms for each name is harder to read? People read a token at a time whenever possible.

Nobody is arguing that you should be able to use multiple forms for different variables. They're arguing that you shouldn't use multiple forms at all.


When I present myself to people I use one form of my name. Some people I know use nicks. Others use part of my formal name. Due to me being a foreigner, essentially everybody I know uses a slightly different way to name me or pronounce the same name. However, there is consistency. Each person will use the same form for naming me.

This happens with Nim code too. Each programmer will have their convention, and will use it consistently. If you have a team, you establish a convention for the project: you need that anyway because people creating new stuff need a consistent way of naming things and you don't want to later rename everything after a heated discussion. So if you can't keep people on your team from using a project convention, you have a bigger problem than slightly different identifiers in a programming language.

Also, once you start using the different free naming conventions in Nim, you grow immune to them looking ugly. Just like when you know only one programming language everything else looks alien, but when you learn something else you expand your horizons and it stops looking so alien, and it stops being "harder to read". I jump from one convention to another without problem for each project as needed, and it doesn't bother me at all.

tl;dr it's not a big deal, don't let that stop you from learning a very nice language, you could regret it later


Well, it's harder to grep, for one (although far from impossible).


It's an unusual feature. Perhaps it calls for a 'go fmt' (nim fmt?) type tool that canonicalises names, to be automatically run on check-in.


I do not get it - what kind of "win" is it? For example, I know that PHP functions are case-insensitive, but we got to the point where we run linters to check if we used the same case as in declaration. Why not simply bake it in the compiler, especially when you HAVE a compiler?


Perhaps it's a reaction to the state of affairs in C++.

In Java and C# there are pretty widely adhered to coding standards. It's quite hard to find libraries that don't have consistent public APIs with these standards, and those that differ are usually subject to rumblings on their mailing lists and issue trackers.

In C++ there is no agreed upon standard. The standard library uses lower_train_case for all (public) types/members, and much 3rd party library code uses PascalCase and camelCase. The C++ standard library has fewer features than the Java/C#/Python standard libraries, and so you end up bringing more 3rd party code into non-trivial projects (in my experience).

If this feature existed in C++, a codebase could be much more internally consistent. However I'm still not sure I'd like this feature to be available. Certainly it's not possible to add it after-the-fact due to potential conflicts between members that differed by case.


I have a vague theory that the design of most programming languages can be understood as a reaction to the pain their designers experienced in their previous language. So, Java is largely C++ without the things that were painful in C++ (manual memory management, multiple inheritance, operator overloading). Nim's unusual handling of compound names is a reaction to something else that was painful in C++.

Hopefully they'll add a Smalltalk/Objective C style format one day.


Well thank god a minor inconvenience has been addressed!


It allows you to use your own style everywhere, instead of relying on the specific styles of the library you used (so you don't end up in your code with camelCase AND snake_case).

It's definitely unusual, but I think it works quite well.


This is largely a matter of taste, but I think go has it exactly right. Ruthlessly enforce only one way, and make that as mandatory as possible.

Having code which is formatted as uniformly as possible is a huge boon. Rules are better than norms, especially if there is zero cost of enforcing them.


This makes a lot of sense, and makes me appreciate the choice more (although my gut feeling was that it was a good idea).


`nimgrep` is a tool that comes with Nim to help with that. It's actually pretty useful, especially when dealing with wrappers over C libraries (I'm looking at you, SDL2...)


> It’s mysterious that Rust’s release version with -i ran slightly faster, though.

That's actually not surprising: a large fraction of time is spent in the map lookup, which in Rust is implemented as a B-tree, thus lookup time is (mildly) dependent on the map size. If keys are lowercased before inserting them, the map ends up having fewer elements.

The Nim version uses instead hash tables, whose lookup time is near-constant (that is, excluding memory hierarchy effects).
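The parent's point can be sketched with a toy word count (this is illustrative code, not the article's actual benchmark): case-folding keys before insertion merges duplicates, so the B-tree ends up with fewer entries, and every subsequent lookup walks a slightly smaller structure.

```rust
use std::collections::BTreeMap;

// Count words, optionally lowercasing keys first (like the article's -i flag).
fn count<'a, I: Iterator<Item = &'a str>>(words: I, fold_case: bool) -> BTreeMap<String, u32> {
    let mut map = BTreeMap::new();
    for w in words {
        let key = if fold_case { w.to_lowercase() } else { w.to_string() };
        *map.entry(key).or_insert(0) += 1;
    }
    map
}

fn main() {
    let text = ["The", "the", "THE", "quick", "Quick"];
    let sensitive = count(text.iter().copied(), false);
    let folded = count(text.iter().copied(), true);
    // Case-folding merges keys, so the tree holds fewer entries.
    assert_eq!(sensitive.len(), 5);
    assert_eq!(folded.len(), 2);
    println!("{} vs {} keys", sensitive.len(), folded.len());
}
```

With a hash map, by contrast, the element count barely affects per-lookup cost, which is the parent's second point.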


> That's actually not surprising: a large fraction of time is spent in the map lookup

This was not the case in my benchmarks: the Rust version spent 60% of its time doing regex matches, 25% allocating strings, and less than 6% manipulating the map.


This seems an odd choice to me (default to B-Tree instead of hash map). I'd expect a sorted map to be a special case, not a general one. Do you happen to know the rationale?

Not criticizing, just curious.


It's a poor choice on the author's part

http://doc.rust-lang.org/std/collections/

    Use a HashMap when:
    
    - You want to associate arbitrary keys with an arbitrary value.
    - You want a cache.
    - You want a map, with no extra functionality.
    
    Use a BTreeMap when:
    
    - You're interested in what the smallest or largest key-value pair is.
    - You want to find the largest or smallest key that is smaller or larger
      than something
    - You want to be able to get all of the entries in order on-demand.
    - You want a sorted map.


I chose it because I wanted to learn Rust by implementing my own BTreeMap struct. I admit it’s not a good choice performance-wise, and made the comparison with Nim less meaningful.

I’ve updated the code and article with HashMap. It runs about 6~7% faster than BTreeMap.


Slightly confusingly, the author's BTreeMap just appears to be a binary tree, while the standard library BTreeMap is a B-tree: http://en.wikipedia.org/wiki/B-tree


That should explain it. I’ve removed the line, as it’s no longer mysterious...


You're still comparing two different data structures. Is there a good hash table in Rust? If you use that instead of the B-tree, I would expect it to be at least as fast as Nim.

B-trees are especially bad for string keys, because comparisons are expensive.

EDIT: From Rust docs: "Currently, our implementation simply performs naive linear search. This provides excellent performance on small nodes of elements which are cheap to compare". (emphasis mine)


It turns out `collections::HashMap` runs about 6~7% faster than BTreeMap. I will update the article to include results of both data structures.


Also seems like you could benefit from the entry() api (available on both BTreeMap and HashMap):

http://doc.rust-lang.org/std/collections/struct.BTreeMap.htm...

I think the example used in the docs is your exact use case.

This bit:

  let found = match map.get_mut(..) {
    ..
  }
  if !found {
    ..
  }
can be replaced with

  match map.entry(word) {
      Occupied(mut view) => { *view.get_mut() += 1; }
      Vacant(view) => { view.insert(1); }
  }


Good point! I’ve updated the article accordingly. I learned the Entry thing a few months ago, but Rust’s BTreeMap did not support the entry API at that time, so my code did not use it. Then I totally forgot about it when writing this blog...


I like how articles like this help elevate Nim's status. To me, deep inside, Rust always felt like this grand high-stakes project, by wise people at this big experienced company Mozilla, and Nim felt like a hobby project that got out of hand. That's an entirely unfair judgment of course, but I bet more people feel that way. I like that Nim is starting to get the attention it deserves.


Yes, I think Nim deserves more attention, and that’s part of the reason I wrote this article.


Never even gave Nim a thought. Just read some and now I think I will try a little pet project with it. Really looks very interesting. I am wondering how the C, C++, Objective C and JavaScript backends from one compiler work. Personally I never want to touch JavaScript, but it might be interesting how this works.


The first benchmark is primarily a comparison of Nim's PEG package to Rust's libregex package. The two have very different algorithms, and libregex is optimized to avoid exponential blowup on pathological regexes. It's missing a fallback to the backtracking algorithm at present.

Using rust-pcre would probably mitigate this problem.


I was still pretty surprised that `regex` was getting killed. It turns out, I think, that `\w+` in Rust is Unicode friendly, but it's not in Nim. In cases where most matches fail, checking the full spectrum of Unicode "word" characters becomes pretty expensive (although it is at least doing a binary search on contiguous ranges of characters: https://github.com/rust-lang/regex/blob/master/src/vm.rs#L23...). When I switched `\w+` to `[a-zA-Z0-9_]+` in the Rust program, I saw a ~60% performance increase.

> and libregex is optimized to avoid exponential blowup on pathological regexes. It's missing a fallback to the backtracking algorithm at present.

Maybe. RE2/C++ doesn't do any backtracking AFAIK, but it appears near the top of just about any benchmark, I think.


Bingo. With `regex!(r"[a-zA-Z0-9_]+")`, Rust finally runs faster than Nim. I’ve updated the article.


I'd guess that pulling that try-catch out of the loop would make things go much faster. Nim doesn't use 0-overhead exceptions, so setjmp needs to be called each time the try-catch is entered.

You should also use the re module, PEGs is not nearly as optimized as PCRE.


I’ll try that later, though the Nim version is fast enough after using the -d:release flag.


I like Nim. I used it for a few side-projects as well. The only thing it definitely needs before feeling really solid is trait (interface, contracts or whatever you call it) support IMHO. Current alternative is "compiler does copy-paste for you", aka templates. See: https://github.com/Araq/Nim/blob/master/lib/pure/collections...


From what I have read, Nim has a much better cross-compilation story (i.e. it delegates to the available C cross-compilation toolchain instead of requiring the Nim compiler and toolchain to be compiled for the cross target).

Rust, as I understand, requires the Rust compiler to be compiled for the cross-target; is this correct?


The compiler itself does not need to be compiled for the cross-target; only the standard library does, just like C. Once that exists, running the compiler directly with `rustc --target=...` and using the dependency/build manager Cargo (`cargo build --target=...`) both work fine.

Of course, it is harder to obtain cross-compiled versions of the Rust standard library at the moment, so what you say is likely true.


LLVM has pretty good cross-compiler support, so I don't know.


Are you just using --opt:speed to compile the Nim examples? For maximum performance you should be using -d:release.


-d:release disables bounds checking, so the Rust examples would have to be modified for unchecked indexing for a fair comparison.


It disables a lot of other things too. The current comparison is unfair. Why do the examples need to be modified to disable bounds checking for Rust? If it's easier you can compile the Nim examples with -d:release and enable bounds checking by also supplying --boundChecks:on.


I’ve added additional results with Nim’s --boundChecks:on flag. Thank you two for the suggestions.


Oh, my bad. The numbers differ a lot (2~2.5x faster). I’ve updated the article.


Now it's more in line with my experience of Nim usually being quite a bit faster than Rust, not to mention much easier to write and less verbose (text munging code in Rust seemed to be about 15% calls to as_slice() and to_string()).


The as_slice situation was acknowledged to be deplorable, and is being vastly improved in the runup to 1.0.

As for speed, please file a performance bug! We love to know where we're missing out on optimizations.

https://github.com/rust-lang/rust/issues


I'm starting to really wish there was a GC-free (or at least GC-optional) version of Nim. Nim with Rust's memory semantics would just be the best of everything.


Well, that's kinda the point. If there were a better way to achieve GC-free semantics, Rust would have taken it.

As for optional GC, didn't D have something along those lines?


Like Nim you can disable the GC in D.

AFAIK, most libraries won't handle that situation gracefully as they don't expect it.


Optional GC is tricky, as you would need all libraries to be written this way.


This is one of the reasons why the Objective-C GC failed.

Mixing frameworks compiled in different modes just didn't work out, many times leading to crashes.


I believe you can turn off the GC for Nim. Then you're just limited to libraries that don't rely on the GC, and fully manual memory management.


This is more of an option than you might think. Nim plugs in to C libraries much easier than you would expect from an FFI. That means even if the stdlib depends on the GC heavily, you can eschew it for the C stdlib.

For example:

    proc printf(formatstr: cstring) {.header: "<stdio.h>", importc: "printf", varargs.}


I'd be interested in seeing a comparison of higher level language features rather than performance (which is generally a result of data structure choices, etc). Stuff like package managers, concurrency primitives, cross compiler support, memory management, and general syntax.


Wow, that's a pretty cool insight. Basically that means you could use "Nim without GC" as "a better C". I can see very few downsides. You could probably start using it in existing C projects much like you could start adding .scala files to an existing Java project.


As an example, that's exactly what I'm doing with my SDL game experiments; I'm using Nim as a "better C", effectively. It works rather brilliantly.


That isn't Rust's semantics though (as specified in the parent post): Rust's semantics are about ensuring safe manual memory management.


Why don't you run GC_fullCollect() at the end of Nim programs? You may unknowingly be comparing a program which actively runs destructors and frees memory with one that basically leaks everything (because GC does not happen to be triggered).


One thing they both lack that hinders wider adoption: good IDE support. For me, YouCompleteMe support is the only thing preventing me from switching for my side projects.


How can you base the choice of a language solely on that?


I wouldn't choose a language based solely on that, but when faced with a few equally good choices, I will absolutely go with the one that has a (to me) better development environment.



Nim needs to move its discussions to a mailing list if the authors want to get more serious developers onboard.

Polling a poorly implemented web forum speaks volumes about the kind of attitude you need to have to discuss, develop, or debug issues around the Nim toolchain.

This is something that Nim developers can instantly do to boost the attractiveness of the language.

Please, do this.


The web forum is better than 90% of the forums out there. It is very fast, looks great, and has syntax highlighting and fast search. How could you possibly think that a mailing list is better? What is this, 1981? I hate mailing lists. GitHub issues are where a lot of serious discussion goes on, and if you give GitHub your email and contribute or subscribe, you can get spammed with every single issue discussion like me. Anyway, I think the highest-bandwidth, most advanced option is sometimes IRC, which of course Nim has too.


> about the kind of attitude you need to have to discuss, develop or debug issues around the Nim toolchain

Well, it's the kind of attitude that let me contribute to the language, and fix bugs despite being rather new to it all, so I'll be honest and say that I think it's quite good. Aside from that, the IRC channel is always super busy and amazingly helpful. I think a mailing-list is a good thing, sure, but it's not the be-all-end-all, in my humble opinion.


Can you point me to these serious developers who would be using Nim or contributing to Nim if a Nim mailing list existed? If I have proof that these developers exist then I may actually implement it.


Here's a pure code comparison of Nim vs Rust: http://rosetta.alhur.es/compare/nimrod/rust/


That's partially outdated for Rust, but I guess Rust is too much of a moving target.

For example `from_str::<int>` should be `.parse()` today.


The code samples are from RosettaCode[1]. It is probably lacking a frequent contributor to write and fix the Rust examples.

[1]: http://rosettacode.org/wiki/Rosetta_Code


There's a repository where Rust versions are being worked on: https://github.com/Hoverbear/rust-rosetta


Can you benchmark this C++ code on your system for comparison? https://github.com/rjammala/C/blob/master/perf.cpp

I did not add the case sensitiveness option to it.


I really wouldn't consider it a reasonable comparison when one language has a compiler that is written in itself for the most part, and the other still only compiles to C.


Why should such a comparison be unreasonable? The fact that a compiler internally produces C/Assembly/Brainfuck/whatever code on the way to the final binary should be relevant only to computer science theorists, not to the majority of users out there (like me).

Also, I find one of your statements quite contradictory: by saying that Rust's compiler "is written in itself", you seem to imply that Nim is not. But this is not true: Nim's compiler is almost 100% pure Nim.

(Last but not least: to me it seems not true that Nim "only compiles to C", as it provides multiple backends. See here: http://nim-lang.org/backends.html .)


Perhaps I had the wrong impression, since I last read about Nim quite a while ago; I should check out this article now. Thanks for pointing this out!


Rust uses LLVM (a C++ library) as a backend; Nim uses Clang (a compiled C++ executable) as a backend. While there are differences, neither is clearly superior.

Generally, the LLVM route will be faster/more compact when compared to Clang. However, Nim can just as easily use gcc, or MSVC, or tcc, which Rust cannot. Especially with respect to tcc, this means a stand-alone all-inclusive Nim compiler executable can quite easily be constructed and will probably be ~1MB, whereas the comparable standalone Rust compiler rustc executable is likely to clock in at 20MB. And you know what? That makes no difference in today's world.

Edit: made it explicit that I am talking about the compiler itself (which was the subject of the parent post I was replying to).

I estimated that rustc will be 20MB based on my experience with linking LLVM statically in the past - I have not tried it recently, nor have I built rustc. I might be way off. kibwen says his Rust compiler clocks in at 8MB, so it appears I am way off (or maybe comparing an unstripped executable to a stripped one).

To be clear to child comment: I did NOT intend to claim that writing something in C will provide a 95% reduction of code size.


What would make a standalone Rust executable 20 MB in size? Last time I checked, Rust did not expect to carry libc with it, and can run entirely without a runtime.


For purposes of comparison, the Rust compiler (which is the most extensive Rust program on my computer at the moment) clocks in at 8MB.

EDIT: Servo, which is the largest Rust program in existence at the moment, is 21MB. Personally I doubt the above claim that writing this in C would reduce the binary size by 95%. I obviously don't have any statically-linked written-in-C web browsers on hand, but if I poke around in Chrome's install directory I see 85MB of DLLs (the entire directory is about 475MB, though surely not all of that is due to binaries).


I updated my comment (GP) to indicate I was talking entirely about compiler size, not executable product size.

Is the 8MB rust compiler statically or dynamically linked to LLVM? It's been a few years, but I never got a statically linked LLVM-using program below 30MB (unstripped, I think stripped was 16MB but I don't remember)


Go's compiler was entirely C until just recently... What does it matter? It's hard to write a compiler in a language that doesn't exist yet, and once you have the language, why bother with a rewrite?


Rust compiles to LLVM, which is a big chunk of C++ code.


It'd be good to see how these new languages compare to established ones that offer similar features - maybe OCaml and Haskell.



