Cosmopolitan Libc 1.0

iainmerrick · on May 17, 2021

I agree with some of the other comments here, that while this is a very impressive hack (and this is Hacker News!) the real world value seems dubious.

It looks like a very clever way of packaging an x86 fat binary for multiple platforms, without actually duplicating the code. To support ARM I assume it’ll need to be an actual fat binary, with both x86 and ARM code. At that point, unless I actually test the code myself on both architectures, how can I be confident it’s going to work properly?

If you’re using very simple C constructs and not doing anything fancy, it should work, but it’s not clear to me that this approach is preferable to e.g. a Python script. If you’re doing fancy stuff, it’s a bit more chancy as C isn’t memory-safe and has tons of undefined behavior and platform-specified weirdness.

Java is “run anywhere” because they’ve specified the JVM in massive detail and tried to ensure it actually works the same on all platforms. I don’t see how you can have that same confidence if you’re running machine code everywhere.

I guess I just don’t see the use case where this is compelling. If I write a handy Unix utility in C, I’ll just keep the source code around, and compile it as needed.

It might be handy if you need to move such a utility quickly from Unix to Windows, if you don’t have any dev tools set up. But I can’t think of a situation when I’ve needed that.

tstack · on May 17, 2021

I’m planning to use it in my terminal-based log file viewer (https://lnav.org) to deploy an agent (over ssh) to remote machines to monitor their log files. CosmoC allows me to build a single agent binary that will work on lots of target machines and package that into the main lnav binary. I’ll still be building lnav packages for different platforms, but the agent build and deployment is vastly simpler with CosmoC.

I’ll be using this feature in my day-to-day work because I interact with short-lived VMs running the software I’m developing. Being able to run lnav on my devbox and have it live-tailing logs on the VMs running the changes will be an improvement over the current workflow where I manually copy the files onto the devbox.

sime2009 · on May 17, 2021

I'm looking at Cosmo with similar applications in mind; just-in-time delivery and execution of a small binary in a "remote" environment like a server, container, VM, or maybe embedded system.

brabel · on May 17, 2021

Hey! Lots of people at work love your lnav tool! Thanks for making it!

Can't wait to try this feature.

1vuio0pswjnm7 · on May 17, 2021

"I guess I just don't see the use case where this is compelling. If I write a handy Unix utility in C, I'll just keep the source code around, and compile it as needed."

One use case is where the source and target are different operating systems, and either the target does not have a compiler installed or the source does not have a cross-compiler installed.

I run into this often. I do not wish to install Python or Java on every computer. I have small form factor computers where I need to conserve space. Also, I prefer the size and speed of C over Java and Python.

I think there may be a tad too much "marketing" and trying to be cute in the way this is presented but this work certainly has potential utility, whether or not it is used as intended. Even just reading through the hacks is educational.

Programming does not always have to be commercially-oriented to be useful or interesting.

iainmerrick · on May 17, 2021

Programming does not always have to be commercially-oriented to be useful or interesting.

Agreed!

Thanks for the response, that does sound like an area where this could be useful.

Although (speaking as somebody who doesn’t do a lot of embedded programming these days, so take this with a pinch of salt) I would have thought that if cross-compilation could be made much faster and easier, that might be a better solution overall. I really like what’s being done with Zig around cross-compilation.

smasher164 · on May 17, 2021

> One use case is where the source and target are different architectures, and either the target does not have a compiler installed or the source does not have a cross-compiler installed.

Do you mean different OSes? AFAIK, Cosmopolitan isn't cross-architecture.

coolreader18 · on May 17, 2021

IIRC it involves qemu on aarch64

smasher164 · on May 17, 2021

Gotcha. That still runs counter to the comment:

> I do not wish to install Python or Java on every computer. I have small form factor computers where I need to conserve space.

Now you have to install QEMU on every computer, in which case your options for portability dramatically increase.

jart · on May 17, 2021

You can always build headless blinkenlights https://justine.lol/blinkenlights/index.html (part of the cosmo codebase) for aarch64 or whatever and use that to simulate ape binaries. If you make -j12 MODE=tiny o/tiny/tool/build/tinyemu.com it's 191kb and that simulates the whole x86_64 ring 3 architecture up to ssse3, plus enough of ring 0 currently for the cosmo codebase to have metal unit tests. The APE blog post talks about the possibility of embedding something like that inside these x86 binaries. So if you feel really strongly about non-x86 as many of the people in this thread do, then 90% of the work has been done for you. I haven't taken it 100% of the way there since I personally don't need non-x86 support.

tgamblin · on May 17, 2021

> I guess I just don’t see the use case where this is compelling. If I write a handy Unix utility in C, I’ll just keep the source code around, and compile it as needed.

Isn't this exactly why you don't see the use case? You're willing to compile.

As someone working on a cross-platform, cross-language packaging tool (https://github.com/spack/spack), it's very appealing to not have to build for every OS and Linux distro. Currently we build binaries per-OS/per-distro. This would eliminate a couple dimensions from our combinatorial builds.

We still care a lot about non-x86_64 architectures, so that's still an issue, but the work here is great for distributors of binaries. It seems like it has the potential to replace more cumbersome techniques like manylinux (https://github.com/pypa/manylinux).

iainmerrick · on May 17, 2021

Currently we build binaries per-OS/per-distro.

What varies between Linux distros, for your purposes? Different libc? I would naively assume it’s all just ELF so it shouldn’t be that big a deal to make a portable Linux binary.

That seems separate from swizzling ELF/Mach-O/PE, specializing the binary on first run(!), etc, which is all super cool but something I’d be wary of relying on as a solid platform. Maybe I’m being too cautious, though!

Reading the Cosmopolitan docs, it has some really clever optimizations, and I think I’d be more excited about simply a small and fast libc over the flashier APE parts.

tgamblin · on May 17, 2021

See https://www.python.org/dev/peps/pep-0513/#key-causes-of-inte....

Software built with a newer glibc is not compatible with software built with an older glibc. You can't just build an EFL binary on an arbitrary distro.

Moreover, and probably more notably for cosmopolitan, binaries built with glibc aren't compatible with BSD libc or musl. IIUC, cosmopolitan works across the BSDs and musl-based Linux distros, as well.

jart · on May 17, 2021

Cosmopolitan is RHEL5 compatible so you can have the manylinux1 holy grail. I love you for posting here. You get it. It's all about reducing the dimensionality of your support vector. No one wants combinatorial explosions of toolchain triplets leading to broken builds and broken hearts.

smasher164 · on May 18, 2021

Would you not get the same effect by statically linking to MUSL? I could see some problems with NSS, but there are workarounds there as well with nsss.

BiteCode_dev · on May 17, 2021

It can be useful to provide a single installer that works on any system, downloading the machine specific rest of the program.

varajelle · on May 17, 2021

An installer without UI or network access?

GranPC · on May 17, 2021

It looks like the runtime has networking support. https://github.com/jart/cosmopolitan/tree/master/libc/sock

dingdingdang · on May 17, 2021

It looks like the runtime has embedded local-gui-view-via-system-browser-server support. https://justine.lol/redbean/index.html - more useful info: http://redbean.justine.lol/redbean.lua?x=hi%20there%00%C0%80...

jart · on May 17, 2021

There's also WIN32 GUI support too. See this demo program called apelife which was built with Cosmopolitan Libc. https://justine.lol/apelife/index.html It embeds a UNIX TUI that runs on both Windows and UNIX. It also embeds a WIN32 GUI too, in the same binary. There's also an example of how to build TUI panel applications in the examples/ folder. https://github.com/jart/cosmopolitan/blob/master/examples/pa... The canonical example of a Cosmopolitan TUI would be https://justine.lol/blinkenlights/index.html I highly recommend writing TUIs because VT100+XTERM is universal these days and termios was easy to polyfill across operating systems. These things literally run in the CMD.EXE DOS command box on Windows 10 without needing any special #ifdef or conditions, which is wild, and it even works on metal via the serial uart port.

    make -j8 o//tool/viz/deathstar.com
    qemu-system-x86_64 -m 16 -nographic -fda o//tool/viz/deathstar.com

You don't have to do anything special either. You just write your TUI program in the conventional UNIX style.

twic · on May 17, 2021

IIRC on non-x86 platforms, it fires up QEMU!

motiejus · on May 17, 2021

I am a bit worried about this:

> Please note that your APE binary will assimilate itself as a conventional resident of your platform after the first run, so it can be fast and efficient for subsequent executions.

I understand there may be no real way out, but this defeats part of the promise/purpose of PAE: assume I use a pae binary, know it's pae, and implicitly share it or copy it to a different machine. But once I have copied it from ~/bin/ it's not pae any more, it's optimized for my OS/arch!

Assuming the first-run optimization step is necessary, would it make sense to provide `--deoptimize` or `--paeize` flag to the same binary, so it's easy to return to the original, without recompiling it from source? Does it lose information when it optimizes? Can that information be tucked away (with an optional flag or env variable) for this step?

What happens if the binary is read-only?

Gaelan · on May 17, 2021

Iirc, APE starts with a (Windows) PE header, which is also a valid shebang-less shell script; when invoked on Unix, the shell script replaces the beginning of the file with an ELF/Mach-O header and invokes itself.

emidoots · on May 17, 2021

Super exciting to see Cosmopolitan progress further! For those wondering what it is, from the website[0]:

> Cosmopolitan makes C a build-once run-anywhere language, similar to Java, except it doesn't require interpreters or virtual machines be installed beforehand. Cosmo provides the same portability benefits as high-level languages like Go and Rust, but it doesn't invent a new language and you won't need to configure a CI system to build separate binaries for each operating system. What Cosmopolitan focuses on is fixing C by decoupling it from platforms, so it can be pleasant to use for writing small unix programs that are easily distributed to a much broader audience.

[0] https://justine.lol/cosmopolitan/index.html

KMag · on May 17, 2021

> Cosmopolitan Libc makes C a build-once run-anywhere language, like Java, except it doesn't need an interpreter or virtual machine.

... and it only supports x86 (without binary translation), right? It's great to see progress like this, but it's poor form to suggest it's build-once run-anywhere in the same sense that Java is. As far as I can tell, it's not trivial to run these binaries on a RPi.

girvo · on May 17, 2021

https://justine.lol/ape.html

There is actually a discussion on that on this page, I'll embed part of the relevant discussion here:

> It'll be nice to know that any normal PC program we write will "just work" on Raspberry Pi and Apple ARM. All we have to do embed an ARM build of the emulator above within our x86 executables, and have them morph and re-exec appropriately, similar to how Cosmopolitan is already doing doing with qemu-x86_64, except that this wouldn't need to be installed beforehand. The tradeoff is that, if we do this, binaries will only be 10x smaller than Go's Hello World, instead of 100x smaller. The other tradeoff is the GCC Runtime Exception forbids code morphing, but I already took care of that for you, by rewriting the GNU runtimes.

Not exactly what you meant, perhaps, but in the same ball-park

KMag · on May 17, 2021

This is great for x86, but for low-overhead multi-platform binaries, IBM came up with a pretty good solution decades ago with OS/400 TIMI (Technology Independent Machine Interface), which is basically Android Runtime's install-time native code generation, but suitable as a C/RPG/COBOL/CL compilation target and not mandating a garbage collector.

Given Apple has undergone 3 CPU architecture migrations by now and employs Chris Latner, I was hoping they'd move to something vaguely like a modernized version of TIMI (maybe based on LLVM bitcode) as the default XCode target for Apple Silicon. Rosetta has been good enough so far, but I can see a future where Apple starts really specializing cores (say ultra-low power cores for watches and glasses or an extreme form of big.LITTLE) to the point where it makes sense for them to have radically new instruction encodings.

skissane · on May 17, 2021

> This is great for x86, but for low-overhead multi-platform binaries, IBM came up with a pretty good solution decades ago with OS/400 TIMI (Technology Independent Machine Interface), which is basically Android Runtime's install-time native code generation, but suitable as a C/RPG/COBOL/CL compilation target and not mandating a garbage collector.

To be honest, I think the dream of portability which TIMI represented is mostly dead in recent IBM i versions. More and more functionality depends on the AIX compatibility environment, PASE, which doesn't run under TIMI, it is full of standard AIX XCOFF binaries containing POWER machine code. (Interspersed with calls to IBM i-specific APIs which allow PASE binaries to access services provided by code running inside and underneath TIMI.) Given the increasing use of PASE as time goes by, porting IBM i environments to something other than POWER (if IBM were ever inclined) has become closer to being as hard as porting AIX – which is to say, as hard as any other operating system. TIMI has evolved from a genuine source of portability (which greatly aided IBM in the CISC-to-RISC transition) into being little more than a historical vestige and form of backward-compatibility.

why_only_15 · on May 17, 2021

Apple hasn't employed Chris Lattner since January 2017: http://nondot.org/~sabre/Resume.html

snovv_crash · on May 17, 2021

iOS already supports bitcode deployments to the app store, I believe this is the direction they are taking to solve this problem.

KMag · on May 17, 2021

Very nice. I hope that in the future, native code will be the distribution format about as often as we write inline assembly. Sometimes you'll really need precise control over instructions, but for 99% of code, it's not worth the cost in giving up long-term technology improvements.

In particular, x86's total store ordering memory model causes some memory fences to disappear at the machine code level. The Aarch64 relaxed memory model allows for lower cache synchronization overhead, but code with correct memory fences compiled to x86 loses this information, requiring overly conservative binary translation/higher overhead TSO mode in Aarch64 binary translators. These days, hardware acquire/release/full flavors of memory fences better match the C++ and Java memory models, but some hardware has load/store/full flavors of memory fences. Binary translation across these flavors means changing all fences to full fences, or else some static analysis that's far beyond anything I'm aware existing at this time.

saagarjha · on May 18, 2021

Or just specialized hardware to do the TSO…

KMag · on May 21, 2021

One could, but then one would give up the advantages of a relaxed memory model.

masklinn · on May 17, 2021

Afaik bitcode remains architecture-dependent, so bitcode deployments is mostly do they can re-optimise, and a convenient component of app slicing (if they end up generating the final bundle anyway, having more visibility into what’s what is useful).

qznc · on May 17, 2021

With C/C++ the preprocessor makes architecture-dependent decisions. Bitcode has no chance to fix that.

Spivak · on May 17, 2021

So this is true but also not that relevant, bitcode is your architecture.

KMag · on May 17, 2021

I think re-targeting bitcode for a different architecture with the same pointer size and endinness is a pretty light weight operation, but I might be wrong. I'm pretty sure it's orders of magnitude lower overhead and less complicated than dynamically recompiling native code.

masklinn · on May 17, 2021

Oh for sure.

spijdar · on May 17, 2021

I've gotta second this. Maybe it's just being a grouchy pedant, but this "just" produces a polyglot x86_64 executable with a polyglot program header that rewrites its payload into the appropriate executable format for one of 4/5 supported OSes.

It's extremely impressive, and dare I say useful for several cases, but it isn't really like java unless you squint really hard and pretend virtualization/emulators like QEMU are akin to a JVM, which is the way I understand that claim is supposed to be taken.

I mean, the analogy works, and I respect why the author believes this is more practically useful than Java, but it's like saying (IMO) that Linux binaries are a universal standard because you can just virtualize or emulate a Linux kernel on the cheap, a la Docker for the Desktop or whatever it's called.

tomcam · on May 17, 2021

I respect your point. However, Java will always require that the user understand how to install a separate runtime.

barbarbar · on May 17, 2021

As well as keeping that JVM up to date.

sigjuice · on May 17, 2021

Also, how many of all the people complaining really actually ran the same Java program on multiple architectures in production?

sigjuice · on May 17, 2021

The 2-line intro says the format will run on a impressively long list of systems. Is it really that big of a deal that it doesn't specifically mention that it is x86-only? Please.

roca · on May 17, 2021

The problem is, the landing page says it's "similar to Java" right in the first sentence and in the second sentence "same portability benefits as high-level languages like Go and Rust", and later on "excellent performance on modern desktops and servers" and "compared with glibc, you should expect Cosmopolitan to be almost as fast".

The tricky part is, those claims are all true, they're just not all true at the same time. The portability claims are true (only if you ignore performance), the performance claims are true (only if you're on x86-64).

jart · on May 17, 2021

We learned yesterday on HN that Java 1.0 was only able to run on two systems: Windows and Solaris. https://github.com/ufuu/Java-JDK10 Cosmopolitan 1.0 runs on Windows, Linux, Mac, FreeBSD, OpenBSD, NetBSD, and boots from BIOS on bare metal all in a single 12kb file. https://justine.lol/cosmopolitan/howfat.html If we consider that Java today is on version 17, then I'd ask you to imagine where Cosmopolitan v17 will be in the future.

robertlagrant · on May 17, 2021

The possibilities are of course endless; I think the discussion is about what is claimed today.

sigjuice · on May 17, 2021

What's with all the nitpicking? I just don't get it. What is claimed today is entirely reasonable.

The landing page says SIMILAR to Java. It doesn't say runs identically on exactly all versions of every system that Java programs will run on.

Also, this is a C binary we are talking about. Fundamentally, a C binary will not run on multiple architectures (at least not without some sort of special translation mechanism like Rosetta or qemu user-mode emulation).

"Modern desktops and servers" is still pretty much x86-64 for most of the world. Yes, I know there are ARM servers and I'm typing this on an M1 Mac and I own nearly all kinds of Raspberry Pi's (even the minor revs of some models).

KMag · on May 17, 2021

> Fundamentally, a C binary will not run on multiple architectures (at least not without some sort of special translation mechanism like Rosetta or qemu user-mode emulation).

The C machine model doesn't dictate native compilation. You could (and in fact IBM OS/400 does[0]) compile C to a binary format that does run on multiple architectures.

Maybe instead of "C binary" you meant "C compiled to native binary", but that linguistic shortcut also encourages a mental shortcut that excludes several interesting design tradeoffs. In the IBM TIMI case, the equivalent of setting the execute bit on the binary triggers OS/400 to perform the native code generation and save that native code to disk, similar to some versions of Android Runtime, but without mandatory garbage collection.

[0] https://en.wikipedia.org/wiki/IBM_i#TIMI

sigjuice · on May 17, 2021

Yes, C does not dictate native compilation. In fact, C doesn't even dictate compilation. There are various C interpreters, aren't there?

KMag · on May 17, 2021

Yes, the MIT 6.270 robot competition in January 1998 definitely ran the robot using a C interpreter on an ARM board.

robertlagrant · on May 17, 2021

I wasn't making a point about the claimed error. I was saying "think of what can be done in the future" is not relevant.

tomcam · on May 17, 2021

Hang on. Didn’t the parent post do exactly that?

jart · on May 17, 2021

Some previous discussions on HN here:

- https://news.ycombinator.com/item?id=26271117

- https://news.ycombinator.com/item?id=25556286

- https://news.ycombinator.com/item?id=24256883

motiejus · on May 17, 2021

I have been working with Bazel in my personal monorepo (~6 months now), and the fact that Justine, someone I believe knows Bazel/Blaze well, uses GNU Make, makes me re-think my decision: perhaps I should re-consider it. I have quite a few issues with Bazel, which require significant engineering hours to overcome, whereas I could just use gnu make and solve it quickly.

Let's see if I can get some inspiration of how things can be built from this repo?

`o/$(MODE)/depend`, a makefile-looking file with the full dependency tree. It is compiled using tool/build/mkdeps.c. So we have a Makefile generator (Justine, you mentioned elsewhere in this thread you didn't want to invent a build system? :)) The Makefile generator is very specific to this project: parses C files and creates that tree.

It is damn fast and, so far, beautifully documented (at least the build parts I looked). Hell, even documentation lines in file preambles are 72 characters wide, and justified. Crazy. Do you manually justify those?

You are also vendoring statically-built GCC, and the folder with executables is <10MB. LLVM C/C++ toolchain is hundreds of megs, compressed.

I am certainly taking inspiration of being in tight control of the compiler toolchain, and beautiful documentation. Not sure I will write my Makefile generator, since my project is also not that big.

Thanks. Cosmopolitan is giving me much more to look at than an αcτµαlly pδrταblε εxεcµταblε.

_TwoFinger · on May 17, 2021

> Not sure I will write my Makefile generator, since my project is also not that big.

gcc has a family of command-line options starting with -M than can generate dependencies as a byproduct during compilation. The generated files are in Makefile format, and only need to be included in your main Makefile.

https://make.mad-scientist.net/papers/advanced-auto-dependen...

oaw-bct-ar-bamf · on May 17, 2021

I never understood how anyone can be fine with dynamic build dependencies?

Each and every build tool should be version pinned so that two build runs today and 4 years ago produce the same (hex neutral) executable.

Otherwise debugging issues becomes a nightmare...

gumby · on May 17, 2021

I doubt many would disagree in principle but that's very expensive. You need to maintain old machines too (well, old OSes) as the old compiler may not run the same on newer versions (file locations, shared libraries, etc).

Years ago I had a customer who wanted to freeze a toolset and wanted to be sure that any bug fix they requested changed only the lines of code relevant to the bug (they diffed the binaries and traced each change back to our source change to be sure). They paid an enormous premium for this capability.

devit · on May 18, 2021

All you need is to run the build in a VM (a container/chroot also works almost always since the Linux kernel maintainers are committed to not breaking userspace).

It's not done usually because it's also very useful to take advantage of new language and library features and all software is supposed to not break compatibility with new versions, so if there is an issue due to upgrading it's because an incompetent maintainer failed at their job.

tomcam · on May 17, 2021

I would totally love to read a blog post about this. One can imagine an embedded tools gig where this would be important for a long-term support. Did they also do something like buy multiple copies of A specific hardware configuration of the target computer for preservation?

gumby · on May 17, 2021

I don't remember what they did on the host (development) side. This was more than 25 years ago.

They were a major phone switch manufacturer (long since absorbed by someone else). Their original design was, IIRC, a Z8000. As those parts were EOLed they shifted to the 68K and wrote a Z8000 emulator for the 68K. They later shifted to the PPC and ported their Z8K emulator to the PPC. We supplied a frozen version of the GCC PPC cross compiler. They had their own frozen version of a Z8K toolchain I think.

Their SLA was something like "less than five minutes of downtime per decade" -- no rebooting, realtime performance, no other interruption -- and they believed their extreme conservatism helped them get there.

tomcam · on May 17, 2021

Hold on, I just realized something.

Does this mean their hardware may still be running on a Power PC that is emulating a Z8000 that is emulating a 68000?

gumby · on May 17, 2021

It would have been "running on a Power PC that is emulating a 68K that is emulating a Z8000". We used to joke about that.

But in fact they changed from "running on a 68K that is emulating a Z8000" to "running on a Power PC that is emulating a Z8000".

They ported their Z8K emulator from 68K to PPC which wasn't super hard. When the PPC was designed it was planned as an upgrade path from 68K series (remember it was a JV between Apple, Motorola and IBM; the first two, at least, had vested interests in making that transition as easy as possible).

This whole stack of emulation sounds crazy but given their needs, it wasn't.

tomcam · on May 17, 2021

Thanks for scratching my itch. Fun fun story with just the right ending.

tomcam · on May 17, 2021

> "less than five minutes of downtime per decade

Wow.

Well, if they were willing to pay…

Thanks very much for that wonderful piece of history.

oaw-bct-ar-bamf · on May 17, 2021

The web in general may be fine with this. But in the real-time domain (planes, cars, machinery, assembly lines...) you cannot allow uncertain behaviour caused by a shaky build environment

Spivak · on May 17, 2021

You say the web like this isn’t also true of basically every Linux disto in existence. The ability to build software against many different sets of libraries is what enables the Linux userspace to work.

jart · on May 17, 2021

> I never understood how anyone can be fine with dynamic build dependencies?

There's 132,059 lines of Makefile code that's generated to o/$(MODE)/depend e.g.

    o//libc/stubs/gcov.o: \
            libc/stubs/gcov.S \
            libc/macros.internal.h \
            libc/macros.internal.inc \
            libc/macros-cpp.internal.inc \
            ape/relocations.h

That much code can't be written by hand, and if you don't write that, then your build targets won't be invalidated correctly. You'll end up with a non-deterministic unreliable build, which is much worse than generating some unfancy make that causes the make process to bootstrap itself. Goal is to get those hex perfect reproducible binaries with minimal toil.

Also, when you write build configs, do you depend on system-provided tools and libraries? Such as some .so file or the python interpreter? The cosmopolitan mono repo doesn't do that. It currently only requires the make, sh, zip, mv, rm, touch and gzip commands. I'd ideally like to make it more hermetic but so far that hasn't been an issue, since the above tools are so stable.

lima · on May 17, 2021

> I have been working with Bazel in my personal monorepo (~6 months now), and the fact that Justine, someone I believe knows Bazel/Blaze well, uses GNU Make, makes me re-think my decision: perhaps I should re-consider it. I have quite a few issues with Bazel, which require significant engineering hours to overcome, whereas I could just use gnu make and solve it quickly.

It simply represents different tradeoffs of convenience vs. correctness. With Bazel, you get correctness but you pay a complexity price.

ithkuil · on May 17, 2021

Have you tried https://redo.readthedocs.io/ ?

numlock86 · on May 17, 2021

> Please note that your APE binary will assimilate itself as a conventional resident of your platform after the first run, so it can be fast and efficient for subsequent executions.

Is my understanding correct that the binary changes itself when first run on the target platform? That sounds like it will trigger a lot of red lights with many automated defensive mechanisms like anti virus.

GordonS · on May 17, 2021

Hmm, on Windows it's common to install binaries to %PROGRAMFILES%, precisely so that non-admins cannot modify them. And in general, users having write access to binaries is frowned upon pretty much everywhere.

What I'd like is 2 mechanisms:

First, the ability to disable this optimisation altogether.

Second, a means to run this optimisation as a distinct step, without executing the rest of the binary. For example, if the "--optimize" flag is used, do the optimisation and then exit.

jart · on May 17, 2021

On Windows APE doesn't need to modify itself because it's an MZ executable. On UNIX the MZ is the prologue to a shell script, which calls printf 'ELF HEADER' >$0; exec $0 to make it conform to the local convention. For subsequent executions it won't need to self-modify so you can simply prime the ape binary once before copying it to /usr/bin if you're an admin. Think of it like a zero-step install wizard that takes a few microseconds.

emidoots · on May 17, 2021

Yes, I think that's correct. However, see: https://github.com/jart/cosmopolitan/issues/90

jart · on May 17, 2021

Author here. The people who make virus scanners are going to have their heads explode when they learn about redbean's new StoreAsset() function https://justine.lol/redbean/index.html#StoreAsset where the executable edits itself as though it were MongoDB. It took a pretty heroic hack to work around ETXTBSY where the executable unmaps itself from memory after loading and then remaps itself so it can restore the original APE header and modify the ZIP central directory. https://github.com/jart/cosmopolitan/blob/b8f38cf55d84bac7d7... That said, I still upvote on VirusTotal each executable I publish, so you can verify they came from me, provided the executable is byte-for-byte identical to its original form. After you've modified it, it becomes yours, and you can upload your modified version to VirusTotal too and upvote that. In the future, I'd like to have smarter tools for verifying the authenticity of the non-zip bits of APE binaries, but that will come with time.

jcelerier · on May 17, 2021

How does that work with Apple's notarization which stores the signature and a hash of your program in Apple's servers ?

saagarjha · on May 18, 2021

Presumably com.apple.security.cs.disable-executable-page-protection or similar

gravypod · on May 17, 2021

As your repo grows do you think you'll eventually need to implement some blaze-like tooling? In previous treads you mentioned compiling the entire repo only takes a minute or two but I'm wondering if the reproducability guarantees would help an effort like this.

jart · on May 17, 2021

Author here. Believe it or not, I'm also the author of some of Blaze's coolest features, like its downloader, which reduced TensorFlow build flakes down from 10% to 0%. https://github.com/bazelbuild/bazel/commit/92887a08a55be96ec... The Cosmopolitan Makefile already has Blaze-like tooling. For example, it does strict dependency checking to verify the DEPS list for each static archive specifies all of its directly-reachable dependencies. https://github.com/jart/cosmopolitan/blob/b8f38cf55d84bac7d7... See example build config here: https://github.com/jart/cosmopolitan/blob/b8f38cf55d84bac7d7... That was actually one of the first features I wrote for the Cosmopolitan codebase, because it has ~67 static archive .a files, and making sure those all get linked in the correct order and are acyclic would have been nearly impossible without such a tool.

I'm sure in the future people will want a more Pythonic build syntax, since with GNU Make it's a bit easy to shoot oneself in the foot. The biggest issue contributors have had so far with the build is that we need to specify in the top-level Makefile a correctly topologically-ordered list of `include foo.mk` lines. Otherwise some pretty counterintuitive errors happen. But one grows used to it after a little pain and the repo feels like second nature. Ultimately, I just really didn't want to invent yet another build system. I'm pretty proud of the fact that (at least for now) I've managed to make GNU Make work so well for such a large repo. Google themselves actually used GNU Make for their codebase until around ~2005 so I'm hoping Cosmopolitan has got at least another decade of use in it.

rockwotj · on May 17, 2021

> I'm sure in the future people will want a more Pythonic build syntax

> Ultimately, I just really didn't want to invent yet another build system.

Is there a reason not to use Bazel initially or in the future when you want a Python syntax?

jart · on May 17, 2021

No comment.

barosl · on May 17, 2021

> https://justine.lol/redbean/index.html

> All you need to do is download the redbean.com program below, change the filename to .zip, add your content in a zip editing tool, and then change the extension back to .com.

> That performance is thanks to zip and gzip using the same compression format, which enables kernelspace copies.

Oh my. Having a web server executable which is a zip archive at the same time is a lovely idea. Have there been any other attempts similar to this?

bayindirh · on May 17, 2021

PoC||GTFO Magazine's almost every issue is multi format: https://www.alchemistowl.org/pocorgtfo/

From the latest issue:

Technical Note:The electronic edition of this magazine is valid as both PDF and ZIP. The PDF has been cryptographically signed with a factored private key for the TI 83+ graphing calculator.

twic · on May 17, 2021

JAR files are ZIP archives, so any Java web server packed into a single JAR is such a thing, to an extent. I don't know of any server which actually makes use of that fact to serve compressed content, but it wouldn't be hard to do.

barbarbar · on May 17, 2021

But that needs a JVM (and maintenance of that). Which is why redbean/cosmopolitan is so extremely clever.

jart · on May 17, 2021

You'll never need to worry about showing your users how to uninstall the Ask Toolbar that came bundled with the Cosmopolitan Virtual Machine that needs to be installed before your users can run your APE binaries, because there isn't one. This project grants your software autonomy without any toilsome tradeoffs. Enjoy!

adsharma · on May 17, 2021

I think there are a number of messages here:

  * Java is slowing our tools down
  * amalgamation sqlite style is showing the benefits of tightly written, standalone good old C.
  * By extension: monorepos are introducing complexity, since they depend on JVM for blaze/bazel.
  * x86 is pervasive in our industry

People might also want to consider the cost/benefit trade-off for binary vs source compatibility. If you can code in a programming language that works across platforms, can be readily transpiled to one of the supported statically typed languages with a robust, small and fast toolchain, you have any number of packagers who can quickly make binaries for your platform of interest that makes it convenient to install.

You get the benefit of better static analysis vs good old C.

tomcam · on May 17, 2021

You have to love this comment partway through the Linux/BSD deployment instructions, which read slightly more like the Borg manual than compilation docs:

> Please note that your APE binary will assimilate itself as a conventional resident of your platform after the first run...

emidoots · on May 17, 2021

I wonder if anyone has tried using this with Zig (or Rust) yet? I think it'd be quite cool to have this as an out-of-the-box option one can just flip

girvo · on May 17, 2021

It should be possible to get it working with Nim easily too, interestingly. I'll have a play with it tonight

planetis · on May 17, 2021

already done: https://github.com/Yardanico/cosmonim

girvo · on May 17, 2021

Hah, of course it is! And it’s about as easy as I expected too. I got a PoC working nicely, but this looks a lot nicer than my work, with its asyncserv patch

nicoburns · on May 17, 2021

Wouldn't it be nice if operating systems just collaborated on a shared executable format (that would just wrap the platform native ones) rather than having to hack around this with self-modifying code.

delusional · on May 17, 2021

The executable format is not the hard part of cross platform executable. It's pretty neat that Cosmopolitan has managed to include a cross platform loader within the executable itself, but the real magic is a cross platform standard library.

smallstepforman · on May 17, 2021

Well, we cannot get the various Linux distros to agree on base systems, even though they use the same kernel. There are at least 4 flavours of BSD. The irony is that poor little Haiku as a single unified system ends up having a larger community than a lot of the perceived larger, yet segregated distros.

oblio · on May 17, 2021

They'd never admit it explicitly, but OSes are actually fighting each other. Sometimes in a friendlier fashion, sometimes not.

No way they'd agree on a common binary format.

saagarjha · on May 17, 2021

Nit: XNU's version numbers number in the thousands, it's Darwin that is in the teens/early twenties. Darwin 15.6 came out in 2016, not 2018.

3v1n0 · on May 17, 2021

How can this work in sandbox distributions? For example in a snap or flatpak package where a binary is in a read only mode and with limited permissions?

lifthrasiir · on May 17, 2021

There is a compile option named APE_NO_MODIFY_SELF which writes and launches a copy of itself into the tmpdir instead of self-modifying (with a huge caveat).

nonameiguess · on May 17, 2021

Her comment here acknowledges this, but this will cause problems when /tmp is mounted noexec, which is fairly common for servers: https://github.com/jart/cosmopolitan/blob/da8a08fd58324a87f6...

I suppose this level of portability is more a feature if you're shipping to PCs anyway, though. If you're deploying to servers, you know the arch and OS ahead of time and there is no obvious downside I can think of to just targeting it directly.

mikepurvis · on May 17, 2021

Well, or if you could do it during commissioning, like have a local Ansible step that pre-flavours the executable for the target, then pushes it over.

stabbles · on May 17, 2021

Is there any way to use cosmopolitan with shared libraries? In some cases licensing doesn't allow one to make a single executable.

SXX · on May 17, 2021

In what case licensing is a problem? For LGPL at least you can just provide object files and this will satisfy all requirements since user will be able to link any library version.

tomcam · on May 17, 2021

Going from strength to strength in one ridiculously small package. It is thrilling to watch this runtime evolve.

gigatexal · on May 17, 2021

I had no idea Windows NT stood for New Technology. Also, this looks really, really interesting. This is a surely a project I will watch as until now I'd not known about it.

YesThatTom2 · on May 17, 2021

What do they mean by BIOS/UEFI support? Bare metal?

hvis · on May 17, 2021

> GNU/Systemd

That's funny.

nabla9 · on May 17, 2021

Changes...

.. Add Fabrice Bellard's JavaScript engine to third party

.. Add SQLite to third party

dingdingdang · on May 17, 2021

Along with https://justine.lol/redbean/index.html (also in cosmo distro now) this means full offline gui apps can be written to run across all major plaforms. Very impressive in my opinion :)

sunmountain · on May 17, 2021

Just awesome (tm).

1vuio0pswjnm7 · on May 17, 2021

Only major downside of all this amazing portability work I can see is that it seems to be owned by an online advertising services corporation (the author's employer).

jart · on May 17, 2021

Author here. I haven't worked for them for years. If you're concerned about ownership see https://justine.lol/cosmopolitan/license.html

1vuio0pswjnm7 · on May 17, 2021

When it says "Support Vector" does that just mean the version that is being used to test or does that mean, e.g., this will not work on pre-2018/2020 BSD. Is there a GNU make dependancy and if so, could it be eliminated so other makes would work as well.

nabla9 · on May 17, 2021

Why you think so?

It says in the license:

ISC License (same as MIT or BSD with unnecessary text removed)