OCaml Planet
The OCaml Planet aggregates various blogs from the OCaml community. If you would like to be added, read the Planet syndication HOWTO.




MirageOS 4 Released! — Tarides, Mar 29, 2022
Tarides is delighted to announce that MirageOS 4 is finally released! As core contributors to the project, we are proud to have been part of the journey to 4.0.
What is MirageOS? MirageOS is a library operating system that constructs unikernels for fast and secure network applications that work across a variety of cloud computing and mobile platforms. The goal of MirageOS is to give the individual control of their own data and take back control of their privacy.
It achieves these goals in several ways, from securely deploying static website hosting with Let’s Encrypt certificate provisioning and a secure SMTP stack, to ensuring data privacy with decentralised communication infrastructures like Matrix, OpenVPN Servers, and TLS tunnels, as well as using DNS(SEC) Servers for better authentication.
Over the years since its first release in 2013, the Mirage ecosystem has grown to include hundreds of libraries and serve millions of daily users, along with several major commercial users that rely on MirageOS to keep their code secure. Examples of this include Docker Desktop’s VPNkit, the Citrix Hypervisor, as well as Robur, Nitrokey, and Tarides itself!
What’s in the New Release?
The new release focuses on better integration with existing ecosystems. For example, it is now much easier to integrate with existing OCaml libraries, as MirageOS 4 now uses dune to build unikernels.
There has also been a major change in how MirageOS compiles projects with the introduction of a new tool called opam-monorepo, which separates package management from building the resulting source code. The opam plugin can create a lock file for project dependencies, download and extract dependency sources locally, and even set up a Dune workspace, which then enables dune build to build everything simultaneously.
The new release also adds systematic support for cross-compilation to all supported unikernel targets, meaning that libraries that use C stubs can now have those stubs seamlessly cross-compiled to a desired target.
To find out more about the new release, please read the official release post on Mirage.io.
Keep an eye on mirage.io's blog over the next two weeks for more posts on the exciting new things that come with MirageOS 4.0, starting with “Introduction to Build Contexts in MirageOS 4.0” tomorrow!



PhD Position at CEA LIST - LSL — Frama-C, Mar 28, 2022




All your metrics belong to influx — Hannes Mehnert (hannes), Mar 08, 2022
Introduction to monitoring
At robur we use a range of MirageOS unikernels. Recently, we worked on improving their operations story. One part is shipping binaries using our reproducible builds infrastructure. Another part is observing what is going on once they are deployed.
I first came into contact with monitoring - collecting and graphing metrics - through MRTG and munin, and the simple network management protocol SNMP. From the whole-system perspective, I find it crucial that the monitoring part of a system does not add pressure. This favours a push-based design, where reporting is done at the discretion of the system being monitored.
The rise of monitoring systems where graphs are generated dynamically (such as Grafana) and can be programmed by the operator (with a query language) is very neat: it allows metrics to be put in relation after they have been recorded. Thus, if there's a thesis about why something went berserk, you can graph the collected data from the past and prove or disprove it.
Monitoring a MirageOS unikernel
From the operational perspective, security must be taken into account: either the data should be authenticated and integrity-protected, or it should be transmitted on a private network. We chose the latter: there's a private network interface used only for monitoring, and access to that network is only granted to the unikernels and the metrics collector.
For MirageOS unikernels, we use the metrics library, whose design shares an idea with logs: work is only performed if a reporter is registered. We use the Influx line protocol via TCP to report via Telegraf to InfluxDB. But due to the design of metrics, other reporters can be developed and used -- Prometheus, SNMP, your-other-favourite are all possible.
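For reference, the Influx line protocol is a simple text format: a measurement name, optional comma-separated tags, one or more fields, and an optional nanosecond timestamp. A made-up example line (the tag and field names are illustrative, not what any particular unikernel emits):
memory,vm=my.unikernel live_words=42i 1646732132000000000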
Apart from monitoring metrics, we use the same network interface for logging via syslog. Since the logs library separates the log message generation (in the OCaml libraries) from the reporting, we developed logs-syslog, which registers a log reporter sending each log message to a syslog sink.
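To illustrate that separation, here is a minimal sketch of the Logs API with the stock formatter-based reporter from the logs.fmt library (the source name "my-lib" is made up); in a unikernel, the logs-syslog reporter would be installed instead:
(* Libraries create a source and log against it; where the messages
   end up is decided by whichever reporter the application installs. *)
let src = Logs.Src.create "my-lib" ~doc:"Example log source"
module Log = (val Logs.src_log src : Logs.LOG)

let () =
  (* Swap this for the logs-syslog reporter in a unikernel. *)
  Logs.set_reporter (Logs_fmt.reporter ());
  Logs.set_level (Some Logs.Info);
  Log.info (fun m -> m "hello from %s" "my-lib")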
We developed a small library for metrics reporting of a MirageOS unikernel in the monitoring-experiments package, which also allows one to dynamically adjust the log level and to disable or enable metrics sources.
Required components
Install the packages providing telegraf, influxdb, and grafana from your operating system.
Set up telegraf to contain a socket listener:
[[inputs.socket_listener]]
service_address = "tcp://192.168.42.14:8094"
keep_alive_period = "5m"
data_format = "influx"
Use a unikernel that reports to Influx (below the heading "Unikernels (with metrics reported to Influx)" on builds.robur.coop) and provide --monitor=192.168.42.14 as a boot parameter. Conventionally, these unikernels expect a second network interface (on the "management" bridge) where telegraf (and a syslog sink) are running. You'll need to pass --net=management and --arg='--management-ipv4=192.168.42.x/24' to albatross-client-local.
Albatross provides an albatross-influx daemon that reports information from the host system about the unikernels to Influx. Start it with --influx=192.168.42.14.
Adding monitoring to your unikernel
If you want to extend your own unikernel with metrics, follow along these lines.
An example is the dns-primary-git unikernel, where on the branch future we have a single commit ahead of main that adds monitoring. The difference is in the unikernel configuration and the main entry point. See the binary builds in contrast to the non-monitoring builds.
In config, four new command line arguments are added: --monitor=IP, --monitor-adjust=PORT, --syslog=IP, and --name=STRING. In addition, the package monitoring-experiments is required, and a second network interface management_stack using the prefix management is required and passed to the unikernel. Since the syslog reporter requires a console (to report when logging fails), a console is also passed to the unikernel. Each reported metric includes a tag vm=<name> that can be used to distinguish several unikernels reporting to the same InfluxDB.
Command line arguments:
let doc = Key.Arg.info ~doc:"The fingerprint of the TLS certificate." [ "tls-cert-fingerprint" ] in
Key.(create "tls_cert_fingerprint" Arg.(opt (some string) None doc))
+let monitor =
+ let doc = Key.Arg.info ~doc:"monitor host IP" ["monitor"] in
+ Key.(create "monitor" Arg.(opt (some ip_address) None doc))
+
+let monitor_adjust =
+ let doc = Key.Arg.info ~doc:"adjust monitoring (log level, ..)" ["monitor-adjust"] in
+ Key.(create "monitor_adjust" Arg.(opt (some int) None doc))
+
+let syslog =
+ let doc = Key.Arg.info ~doc:"syslog host IP" ["syslog"] in
+ Key.(create "syslog" Arg.(opt (some ip_address) None doc))
+
+let name =
+ let doc = Key.Arg.info ~doc:"Name of the unikernel" ["name"] in
+ Key.(create "name" Arg.(opt string "ns.nqsb.io" doc))
+
let mimic_impl random stackv4v6 mclock pclock time =
let tcpv4v6 = tcpv4v6_of_stackv4v6 $ stackv4v6 in
let mhappy_eyeballs = mimic_happy_eyeballs $ random $ time $ mclock $ pclock $ stackv4v6 in
Requiring monitoring-experiments, registering command line arguments:
package ~min:"3.7.0" ~max:"3.8.0" "git-mirage";
package ~min:"3.7.0" "git-paf";
package ~min:"0.0.8" ~sublibs:["mirage"] "paf";
+ package "monitoring-experiments";
+ package ~sublibs:["mirage"] ~min:"0.3.0" "logs-syslog";
] in
foreign
- ~keys:[Key.abstract remote_k ; Key.abstract axfr]
+ ~keys:[
+ Key.abstract remote_k ; Key.abstract axfr ;
+ Key.abstract name ; Key.abstract monitor ; Key.abstract monitor_adjust ; Key.abstract syslog
+ ]
~packages
Added console and a second network stack to foreign:
"Unikernel.Main"
- (random @-> pclock @-> mclock @-> time @-> stackv4v6 @-> mimic @-> job)
+ (console @-> random @-> pclock @-> mclock @-> time @-> stackv4v6 @-> mimic @-> stackv4v6 @-> job)
+
Passing a console implementation (default_console) and a second network stack (with the management prefix) to register:
+let management_stack = generic_stackv4v6 ~group:"management" (netif ~group:"management" "management")
let () =
register "primary-git"
- [dns_handler $ default_random $ default_posix_clock $ default_monotonic_clock $
- default_time $ net $ mimic_impl]
+ [dns_handler $ default_console $ default_random $ default_posix_clock $ default_monotonic_clock $
+ default_time $ net $ mimic_impl $ management_stack]
Now, in the unikernel module the functor changes (console and second network stack added):
@@ -4,17 +4,48 @@
open Lwt.Infix
-module Main (R : Mirage_random.S) (P : Mirage_clock.PCLOCK) (M : Mirage_clock.MCLOCK) (T : Mirage_time.S) (S : Mirage_stack.V4V6) (_ : sig end) = struct
+module Main (C : Mirage_console.S) (R : Mirage_random.S) (P : Mirage_clock.PCLOCK) (M : Mirage_clock.MCLOCK) (T : Mirage_time.S) (S : Mirage_stack.V4V6) (_ : sig end) (Management : Mirage_stack.V4V6) = struct
module Store = Irmin_mirage_git.Mem.KV(Irmin.Contents.String)
module Sync = Irmin.Sync(Store)
And in the start function, the command line arguments are processed and used to set up syslog and metrics monitoring to the specified addresses. Also, a TCP listener waits for monitoring and logging adjustments if --monitor-adjust was provided:
module D = Dns_server_mirage.Make(P)(M)(T)(S)
+ module Monitoring = Monitoring_experiments.Make(T)(Management)
+ module Syslog = Logs_syslog_mirage.Udp(C)(P)(Management)
- let start _rng _pclock _mclock _time s ctx =
+ let start c _rng _pclock _mclock _time s ctx management =
+ let hostname = Key_gen.name () in
+ (match Key_gen.syslog () with
+ | None -> Logs.warn (fun m -> m "no syslog specified, dumping on stdout")
+ | Some ip -> Logs.set_reporter (Syslog.create c management ip ~hostname ()));
+ (match Key_gen.monitor () with
+ | None -> Logs.warn (fun m -> m "no monitor specified, not outputting statistics")
+ | Some ip -> Monitoring.create ~hostname ?listen_port:(Key_gen.monitor_adjust ()) ip management);
connect_store ctx >>= fun (store, upstream) ->
load_git None store upstream >>= function
| Error (`Msg msg) ->
Once you have compiled the unikernel (or downloaded a binary with monitoring), start that unikernel by passing --net:service=tap0 and --net:management=tap10 (or whichever your tap interfaces are), and as unikernel arguments --ipv4=<my-ip-address> and --management-ipv4=192.168.42.2/24 for IPv4 configuration, plus --monitor=192.168.42.14, --syslog=192.168.42.10, --name=my.unikernel, and --monitor-adjust=12345.
With this, your unikernel will report metrics using the Influx protocol to 192.168.42.14 on port 8094 (every 10 seconds), and syslog messages via UDP to 192.168.42.10 (port 514). You should see your InfluxDB getting filled and your syslog server receiving messages.
When you configure Grafana to use InfluxDB, you'll be able to see the data in the data sources.
Please reach out to us (at team AT robur DOT coop) if you have feedback and suggestions.



Secure Virtual Messages in a Bottle with SCoP — Tarides, Mar 08, 2022
People love to receive mail, especially from loved ones. It’s heartwarming to read each word as their thoughts touch our deepest feelings. Now imagine someone else reading those private sentiments, like a postal worker. Imagine how violated they’d feel if their postal carrier handed them an open letter with a knowing smile. Of course, people trust that postal employees won’t read their personal correspondence; however, they regularly risk their privacy when sending emails, images, and messages.
Around 300 billion emails traverse the Internet every single day. They travel through portals with questionable security, and the messages often contain private or sensitive data. Most online communication services are composed of multiple components with complex interactions. If anything goes wrong, it results in critical security incidents. This leaves an unlocked door for malicious hackers to breach private information for profit or just for fun. Since it takes considerable technical skills and reliable infrastructure to operate a secure email service, most Internet users must rely on third-party operators. In practice, there are only a few large companies that can handle communications with the proper security levels. Unfortunately for regular people, these companies profit from mining their personal data. Due to this global challenge, Tarides focused their efforts to address these issues and find solutions to protect both personal and professional data.
An Innovative Solution
Our work resulted in the project "Secure-by-Design Communications Protocols" (SCoP), a secure, easily deployable solution to preserve users' privacy. In essence, SCoP puts your messages in a secure, virtual ‘bottle’ to protect them from invasive actions. This bottle represents a secure architecture using type-safe languages and unikernels for both email and instant messaging. We mould unikernels (specialised applications that run on a VM) into refined meshes linked by TLS-firm communication pipes, as depicted in the image below.
The SCoP virtual bottle creates a trustworthy information flow where dedicated unikernels ensure security for communication from origin to destination. We carefully design every component of SCoP as independent libraries, using modern development techniques to avoid commonly reported threats and flaws. The OCaml-based development enables this safe online environment, which eliminates many exploited security pitfalls. Moreover, SCoP is energy-efficient thanks to its lightweight, low-latency design components.
We mostly focused on the sender’s side, securing the message inside the SCoP bottle. For instant messages, we created a capsule with a Matrix client library, and for emails we based our bottle on the SMTP protocol and Mr. MIME. For further protection, we developed the bottle’s ‘cork’ with the Hamlet email corpus.
The SCoP Processes
First, we generated Hamlet, a collection of emails to test our parser implementation against existing projects and to ensure equivalence between the encoder and decoder. After we successfully parsed and encoded one million emails, we used Hamlet to stress-test our SMTP stack.
Secondly, we created an SMTP extension mechanism and support for SPF, including an implementation of DMARC, a security framework built on top of DKIM and SPF. We completed four components: SPF, DKIM, SMTP, and Mr. MIME, which can generate a correctly-signed email, signatures, and the DKIM field containing the signatures.
In essence, we designed the SMTP sender bottle with a mesh of unikernels connected via secured communication pipes. The SMTP Submission Server unikernel receives the sender’s authentication credentials against the secured database maintained by Irmin. After it confirms the credentials, it sends the email for sealing (via a TLS pipe) to the DKIM signer. Then the DKIM signer unikernel, responsible for handling IP addresses, communicates via the nsupdate protocol with the Primary DNS Server. The DKIM signer places the sender’s and receiver’s addresses on the email, seals it with the DKIM signature, and sends it to the SMTP relay for distribution. The SMTP relay unikernel communicates with the DNS resolver unikernel to locate the receiver by the DNS name, then it coordinates this location with the Irmin database to verify the authorization according to the SPF protocol. After all these checks have passed, the signed and sealed email is secured in the SCoP bottle and launched through Cyberspace.
Next, we developed the Matrix protocol’s client library, and we used it to enable notifications from the CI system, testing all the new OCaml packages. We also designed an initial PoC for a Matrix server-side daemon.
We made significant progress in deploying DNSSEC, a set of security extensions over DNS. While we completed our first investigation into the DNSSEC prototype, we also discovered several issues, so we addressed those as lessons learned.
Finally, we completed the SCoP bottle with the email receiver, which Spamtacus (the Bayesian spam filter) guards against spam intruders. Furthermore, the OCaml-Matrix server represents our solution to take care of the instant communication in the Matrix federation.
A Secure-by-Design SMTP Stack
We researched state-of-the-art email spam filtering methods and identified machine learning as the main trend. We followed this path and equipped our email architecture with a spam-filter unikernel, which uses a Bayesian method for supervised learning of spam and acts as a proxy for internet communication in the SMTP receiver. This spam filter works in two stages: preparation, where the unikernel is trained to detect spam, and operation, where the unikernel integrates into the SMTP receiver unikernel architecture to filter spam emails. Our spam-filter unikernel can also be used independently as an individual anti-spam tool to help enforce GDPR rules and protect the user’s privacy by preventing spam-induced attacks, such as phishing.
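As a refresher on the underlying method (a textbook naive Bayes formulation, not necessarily the exact model used here): the filter learns per-word likelihoods from labelled mail and scores a message with words $w_1, \dots, w_n$ as
$$P(\text{spam} \mid w_1, \dots, w_n) = \frac{P(\text{spam}) \prod_{i=1}^{n} P(w_i \mid \text{spam})}{P(\text{spam}) \prod_{i=1}^{n} P(w_i \mid \text{spam}) + P(\text{ham}) \prod_{i=1}^{n} P(w_i \mid \text{ham})},$$
flagging the message as spam when the score exceeds a chosen threshold.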
We integrated our spam filter into a unikernel positioned at the beginning of the SMTP receiver stack. This acts as a first line of defence in an eventual attack targeting the receiver in order to maintain functionality. The spam-filter unikernel can be extended to act as an antivirus by analysing the email attachment for certain features known to characterise malware. We’ve already set the premises for the antivirus by using a prototype analysis of the email attachments. Moreover, the spam-filter unikernel can contribute with a list of frequent spammers to the firewall, which we plan to add into the SMTP receiver as the next step in our development of SCoP.
How the Technology Works
DKIM, SPF, and DMARC are three communication protocols meant to ensure email security by verification of sender identity. The latest RFC standards for DKIM, SPF, and DMARC are RFC8463, RFC7208, and RFC7489, respectively.
DKIM provides a signer protocol and the associated verifier protocol. The DKIM signer allows the sender to communicate which email it considers legitimate. Our implementation of the DKIM verifier is associated with the SMTP receiver; it follows the RFC8463 standard and supports the ED25519 signing algorithm, i.e., the elliptic curve cryptography generated from the formally verified specification in the fiat project from MIT.
SPF is an open standard that specifies a method to identify legitimate mail sources using DNS records, so email recipients can consult a list of IP addresses to verify that the emails they receive are from an authorised domain. Hence, SPF functions based on the blacklisting principle in order to control and prevent sender fraud. Our implementation of the SPF verifier follows the RFC7208 standard.
DMARC (Domain-based Message Authentication, Reporting, and Conformance) enables a sender to indicate that their messages comply with SPF and DKIM, and applies clear instructions for the recipient to follow if an email does not pass SPF or DKIM authentications (reject, junk, etc.). As such, DMARC is used to create domain reputation lists, which can help determine the actual email source and mitigate spoofing attacks. Our implementation of the DMARC verifier is integrated in the SMTP receiver and follows the RFC7489 standard.
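For a concrete picture, these policies are published as DNS TXT records. Hypothetical records for a domain example.com might look like the following (the selector, addresses, and key are placeholders):
example.com. TXT "v=spf1 mx ip4:192.0.2.0/24 -all"
selector._domainkey.example.com. TXT "v=DKIM1; k=ed25519; p=<base64-public-key>"
_dmarc.example.com. TXT "v=DMARC1; p=quarantine; rua=mailto:dmarc-reports@example.com"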
Our secure-by-design SMTP stack contains the DKIM/SPF/DMARC verifier unikernel on the receiver side. This unikernel verifies the email sender’s DNS characteristics via a TLS communication pipe, and in case the DNS verification passes, the spam-labelled email goes to the SMTP relay to be dispatched to the email client. However, in case the DNS verification doesn’t pass, we can use the result to construct a DNS reputation list to improve the SMTP security via a blacklisting firewall.
Matrix Server
The Matrix server in our OCaml Matrix implementation manages clients who are registered to rooms that contain events. These represent client actions, such as sending a message. Our implementation follows the Matrix specification standard. From here, we extracted the parts describing the subset of the Matrix components we chose to implement for our OCaml Matrix server MVP. The OCaml implementation environment provides secure-by-design properties and avoids various vulnerabilities, like the buffer overflow recently discovered that produces considerable information disclosure in other Matrix implementations, e.g., Element.
The Matrix clients are user applications that connect to a Matrix server via the client-server API. We implemented an OCaml-CI client, which communicates with the Matrix servers via the client-server API and tested the integration of the OCaml-CI communication with both Synapse and our OCaml Matrix server. Please note that our OCaml Matrix server supports a client authentication mechanism based on user name identification and password, according to the Matrix specification for authentication mechanisms.
The DAPSI Initiative
Much of the SCoP project was possible thanks to the DAPSI initiative. They gave Tarides the incentive to further explore an open and secure infrastructure for communication protocols, especially emails. First, DAPSI supported our team by providing necessary financing, but their contribution to our project’s prosperity runs much deeper than funding. DAPSI facilitated multiple coaching sessions that helped broaden our horizons and established reachable goals. Notably, their business coaching enabled us to identify solutions for our market strategy. Their technical coaching and training offered access to data portability experts and GDPR regulations, which opened our perspective to novel trends and procedures. Additionally, DAPSI helped raise our visibility by organising public communications, and DAPSI’s feedback revealed insights on how to better exploit our project’s potential and what corners of the cyber-ecosystem to prioritise. We are deeply grateful to DAPSI for their support and backing, and we’re thrilled to have passed Phase 2!
Up Next for SCoP
We’re excited to further develop this project. We’ll be experimenting with deploying unikernels on smaller chipsets, such as IoT devices. We’d also like to research secure data porting in other domains such as journalism, law, or banking.
Of course we’ll be maintaining each of the SCoP components in order to follow the latest available standards and state-of-the-art technology, including periodical security analyses of our code-base and mitigation for newly discovered vulnerabilities.
As in all of our work at Tarides, we strive to benefit the entire OCaml community and beyond. Please find more information on SCoP through our blog posts: DAPSI Initiative and DAPSI Phase 1.
Hide



Research internships in our Tools and Compilers group — Jane Street, Mar 04, 2022
We are excited to announce research internships in our Tools and Compilers group.




Segfault Systems Joins Tarides — Tarides, Mar 01, 2022
We are delighted to announce that Segfault Systems, a spinout from IIT-Madras, is joining Tarides. Tarides has worked closely with Segfault Systems over the last couple of years, most notably on the award-winning Multicore OCaml project and the upstreaming plans for OCaml 5.0. This alliance furthers the goals of Tarides, bringing the compiler and benchmarking expertise of the Segfault team directly into the Tarides organisation.
KC Sivaramakrishnan, CEO & CTO of Segfault Systems says that “Segfault Systems was founded to secure the foundations of scalable systems programming in OCaml. We have successfully incorporated cutting-edge research on concurrent and parallel programming into OCaml. This addresses the long-standing need of OCaml developers to utilise the widely available multicore processing on modern machines. Tarides is at the forefront of OCaml developer tooling and platform support, and we are excited to join the team to make OCaml the best tool for industrial-strength concurrent and parallel programming.”
“We’re thrilled to have the Segfault Systems team join Tarides,” says Thomas Gazagnaire, CTO of Tarides. “They have been integral to the success of the Multicore OCaml project, which has combined cutting edge research and engineering with consistent communication, promoting Multicore OCaml as an upstream candidate to the core developer team, as well as publishing monthly reports for the wider community. We look forward to working with our new partners to make OCaml the tool of choice for developers.”
All of Segfault Systems’ existing responsibilities and open-source commitments will migrate over to Tarides, where work will continue towards the three main objectives in 2022:
- Releasing OCaml 5.0 with support for domains and effect handlers
- Supporting the ecosystem to migrate the OCaml community over to OCaml 5.0
- Improving developer productivity for OCaml 5.0 by releasing the best platform tools
OCaml 5.0
The next major release of OCaml, version 5.0, will feature primitive support for parallel and concurrent programming through domains and effect handlers. The goal is to ensure that the fine balance that OCaml has struck between ease of use, correctness and performance over the past 25 years continues into the future with these additional features.
Domains enable shared-memory parallel programming allowing OCaml programs to run on multiple cores: with domains, OCaml programs will scale better by exploiting multicore processing. Effect handlers are a mechanism for concurrent programming: with the introduction of effect handlers, simple direct-style OCaml code will be flexible, easy to develop, debug and maintain (no more monads for concurrency!). These features will benefit the entire ecosystem and community, and we expect them to attract many new users to the language.
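As a flavour of the two features, here is a minimal, self-contained sketch against the OCaml 5 standard library (the Ask effect and the numbers are made up for illustration):
(* Domains: run a computation on another core and join its result. *)
let parallel_sum () =
  let d = Domain.spawn (fun () -> List.fold_left (+) 0 [1; 2; 3]) in
  let here = List.fold_left (+) 0 [4; 5; 6] in
  here + Domain.join d

(* Effect handlers: direct-style code performs an effect; a handler
   decides how to resume the suspended computation. *)
type _ Effect.t += Ask : int Effect.t

let with_answer f =
  let open Effect.Deep in
  try_with f ()
    { effc = (fun (type a) (eff : a Effect.t) ->
        match eff with
        | Ask -> Some (fun (k : (a, _) continuation) -> continue k 42)
        | _ -> None) }

let () =
  Printf.printf "sum: %d, asked: %d\n"
    (parallel_sum ())
    (with_answer (fun () -> Effect.perform Ask + 1))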
As part of the Multicore OCaml project, the team developed Sandmark, a suite of sequential and parallel benchmarks together with the infrastructure necessary to carefully run the programs and analyse the results. Sandmark has been instrumental in assessing and tuning the scalability of parallel OCaml programs and ensuring that OCaml 5.0 does not introduce performance regressions for existing sequential programs compared to OCaml 4.
Scalability of compute intensive OCaml programs
Sandmark is now run as a nightly service monitoring the performance of OCaml 5 as it is being developed. Development will continue to make it even easier to use and more practical by fully integrating it with current-bench (the continuous benchmarking system based on OCurrent). Get in touch if you want to know more.
Ecosystem
At Tarides we want all OCaml users to benefit from the new features that OCaml 5.0 will bring, and this means ensuring that the ecosystem is fully prepared. We aim to develop and maintain a robust set of libraries that work with domains and effects, together with a diverse parallel benchmarking and performance profiling suite to use with OCaml 5 applications. The first version of Eio, the effects-based direct-style IO stack for OCaml 5.0, has been released, generating lots of interesting discussion within the community. Eio not only makes it easier to develop, debug and maintain applications utilising asynchronous IO, but is also able to take advantage of multiple cores when available.
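As a taste of that direct style, here is a minimal sketch using Eio's fiber API (module names as in the early Eio releases; the printed strings are illustrative):
open Eio.Std

let () =
  Eio_main.run @@ fun _env ->
  (* Two fibers run concurrently; no monadic bind is needed,
     just plain function calls. *)
  Fiber.both
    (fun () -> traceln "fiber one")
    (fun () -> traceln "fiber two")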
HTTP server performance using 24 cores
HTTP server scaling maintaining a constant load of 1.5 million requests per second
The early results are quite promising. An HTTP server based on Eio is able to serve 1M+ requests/sec on 24 cores, outperforming Go's net/http and closely matching Rust's hyper performance. Eio is still heavily under development; expect even better numbers for its stable release planned later this year.
The next step is to iterate on the design in collaboration with the community and our partners. Get in touch if you have performance-sensitive applications that you'd like to port to Eio, so we can discuss how the design can meet your needs.
OCaml Platform
In collaboration with community members and commercial funders, Tarides has been developing and defining the OCaml platform tool suite for the last four years. The goal of the platform is to provide OCaml developers with easy access to high-quality, practical development tools to build any and every project. We will continue to develop and maintain these tools, and make them available for OCaml 5. Reach out to us if you have specific feature requests to make your developer teams more efficient.
This alliance brings the headcount of Tarides up to 60+ people, all working towards making OCaml the best language for any and every project. Join us!



OCaml from the Very Beginning now free in HTML and PDF formats — OCaml Book, Feb 18, 2022
Thanks to a grant from the OCaml Software Foundation (ocaml-sf.org), I am able to release my 2013 book OCaml from the Very Beginning free-of-charge in the existing PDF format, and in a new HTML format:
https://johnwhitington.net/ocamlfromtheverybeginning/
The book continues to be available in paperback and Kindle formats from Amazon.
The book has been updated for OCaml 4.14 in all formats.




Permanent Computer Scientist Position at CEA LIST - LSL — Frama-C, Feb 01, 2022




OCaml Labs Joins Tarides — Tarides, Jan 27, 2022
Today I am incredibly delighted to announce that OCaml Labs, a spinout from the University of Cambridge, is joining Tarides. After successfully collaborating on many OCaml projects over the last four years, this alliance will combine the expertise of both groups and enable us to bring OCaml, one of the most advanced programming languages in the world, into mainstream use. Combining forces will accelerate OCaml development and its broader adoption. Furthermore, it will bring the security, portability, and performance of OCaml to a large spectrum of use-cases: from academic endeavours, such as formal methods and existing threats within cyber-security, to real-world applications for climate change, sustainable agriculture, and even space exploration!
All of OCaml Labs’ existing responsibilities and open-source commitments will migrate over to Tarides, and thanks to how closely the teams already work, business will continue without interruption to continuity or delivery. Gemma Gordon will step up as CEO of Tarides, and I will continue to lead the technological vision and strategy as CTO. As Prof. Anil Madhavapeddy, founder of OCaml Labs and scientific advisor of Tarides, points out, “The cutting edge research we conducted at the University over the past decade has now migrated into mainline OCaml, and so the ongoing curation and development will now happen on a commercially supported basis. I’m excited to continue collaborating on research with Tarides from the University of Cambridge.” Tarides will continue the work started at OCaml Labs and invest in the growth, health, and development of OCaml alongside its wider use-cases.
I am honoured to have the incredible OCaml Labs team - the team who carefully designed and crafted Multicore OCaml - join Tarides. We share a similar view that a common plague affects the ever-growing software industry, namely poor software quality and the omnipresence of bugs. However, this is not a fatal flaw. Tools developed by OCaml Labs over the years do not compromise on quality, and they allow dev teams to automatically fix at least 70% of security bugs and 0-day security exploits. Consequently, OCaml is a simple yet powerful language that can respond to the many challenges developers face today. Since Tarides’s inception, we have envisioned a future where all OCaml applications are easily deployable as specialised, secure, and energy-efficient MirageOS unikernels. This alliance is a step further in that direction. Since OCaml is the language used to develop MirageOS, Tarides has continuously developed and maintained parts of the OCaml ecosystem since its creation. Our alliance with OCaml Labs makes this more evident. The MirageOS ecosystem critically depends on OCaml, and the OCaml ecosystem benefits from innovations coming from the MirageOS project. Tarides is therefore fully committed to making the synergy between OCaml and MirageOS a success.
Several exciting projects related to Multicore OCaml are coming to a head this year. The OCaml 5.0 release will support multicore and effects handlers, influencing every aspect of the language and its ecosystem. The update will significantly improve both performance and user experience whilst maintaining existing features that make OCaml the language of choice for building, for instance, verification software tools. Using the teams’ combined experience and zest for innovation, Tarides is looking to the future of the OCaml language and community with excitement. We will continue to push the boundaries of exploration whilst focusing on what's good for the community. Therefore, this alliance will complement the commercial offering of Tarides and contribute to Tarides' mission: empowering developers, communities and organisations to adopt OCaml as their primary programming experience by providing training, expertise, and development services around the OCaml language.
“We are thrilled to be part of an organisation innovating in many areas around operating systems, distributed systems, and security with the Irmin distributed store and the MirageOS unikernel projects,” says Gemma Gordon, CEO of Tarides. “I am incredibly proud of the people OCaml Labs has collaborated with. We have been able to build a sustainable open-source community, with people from various backgrounds all collaborating together. It used to be that people would have to volunteer their time on OCaml, or work in academic research. We have created an additional funded path, one that has increased the diversity and innovation of our community. I’m excited to continue to be part of a group that brings the best minds together to solve the many problems the software industry faces today. I look forward to building a flourishing and sustainable commercial business with existing Tarides partners as well as developing new collaborative opportunities.”
This alliance brings the headcount of Tarides up to 60+ people, all working towards making OCaml the best language for any and every project. Join our team: https://tarides.com/company
OCaml Labs
OCaml Labs has been at the forefront of innovation in OCaml for nearly a decade. It was founded at the University of Cambridge by Prof. Anil Madhavapeddy in 2012 and developed into a spin-out consultancy company in 2016. OCaml Labs' mission was to push OCaml and functional programming forward as a platform, making it a more effective tool for all users (including large-scale industrial deployments) while at the same time growing the appeal of the language to broaden its applicability and popularity.
OCaml Labs has been instrumental in developing and maintaining the OCaml platform for OCaml usage at an industrial scale. OCaml Labs contributed to the development and maintenance of the opam package management ecosystem and of the OCaml community website, https://ocaml.org/, first launched in 2012. These sites act as hubs for the OCaml community to showcase the state-of-the-art and facilitate innovation. A new and improved version of the site was released in beta this month. In addition, OCaml Labs' most significant (and technically complex) project, OCaml Multicore, will finally come to fruition this year. Work on this project began in 2014, followed by award-winning papers and presentations in 2020 and the announcement in late 2021 that Multicore will become part of the mainline OCaml compiler.
Tarides
Tarides is a tech start-up founded in Paris in 2018 by pioneers of programming languages and cloud computing. They develop a software infrastructure platform to deploy secure, distributed applications with strict resource constraints and low-latency performance requirements. This platform builds upon innovative and open-source projects such as MirageOS and Irmin and underpins mission-critical deployments such as Tezos, Citrix XenServer, or Docker for Desktop. In addition, Tarides uses unikernel technologies and applies the research done in programming languages to real-world systems to build safe and performant applications.
Tarides was part of the Founder program of Station F in 2018. In addition, it was selected in France for the Concours d’Innovation i-Lab, organised by the French Ministry of Higher Education, Research and Innovation in partnership with Bpifrance, a national contest that rewards company creation and innovative technologies. Tarides also received an award at FIC 2020, the leading European cybersecurity event.



How Jane Street Pairs Interns to Projects and Teams During the Software Engineering Internship — Jane Street, Jan 14, 2022
Software engineering intern candidates often ask how team placement works and how much input incoming interns have over their teams and projects. We know team placement is an important factor for many students when deciding which internship to accept. We’ve spent considerable time and thought on this process in recent years and hope to demystify the experience with this post. 1
1. This process is used in New York and London. Due to its smaller size, Hong Kong’s process is slightly different.




Is every projective setoid isomorphic to a type? — Andrej Bauer, Jan 11, 2022
Jacques Carette asked on Twitter for a reference to the fact that countable choice holds in setoids. I then spent a day formalizing facts about the axiom of choice in setoids in Agda. I noticed something interesting that is worth blogging about.
We are going to work in pure Martin-Löf type theory and the straightforward propositions-as-types interpretation of logic, so no univalence, propositional truncation and other goodies are available. Our primary objects of interest are setoids, and Agda's setoids in particular. The content of the post has been formalized in this gist. I am not going to bother to reproduce here the careful tracking of universe levels that the formalization carries out (because it must).
In general, a type, set, or an object $X$ of some sort is said to satisfy choice when every total relation $R \subseteq X \times Y$ has a choice function: $$(\forall x \in X . \exists y \in Y . R(x,y)) \Rightarrow \exists f : X \to Y . \forall x \in X . R(x, f\,x). \tag{AC}$$ In Agda this is transliterated for a setoid $A$ as follows:
satisfies-choice : ∀ c' ℓ' r → Set (c ⊔ ℓ ⊔ suc c' ⊔ suc ℓ' ⊔ suc r)
satisfies-choice c' ℓ' r = ∀ (B : Setoid c' ℓ') (R : SetoidRelation r A B) →
(∀ x → Σ (Setoid.Carrier B) (rel R x)) → Σ (A ⟶ B) (λ f → ∀ x → rel R x (f ⟨$⟩ x))
Note the long arrow in A ⟶ B
which denotes setoid maps, i.e., the choice map $f$ must respect the setoid equivalence relations $\sim_A$ and $\sim_B$.
A category theorist would instead prefer to say that $A$ satisfies choice if every epi $e : B \to A$ splits: $$\forall B . \forall e : B \to A . \text{$e$ epi} \Rightarrow \exists s : A \to B . e \circ s = \mathrm{id}_A. \tag{PR}$$ Such objects are known as projective. The Agda code for this is
surjective : ∀ {c₁ ℓ₁ c₂ ℓ₂} {A : Setoid c₁ ℓ₁} {B : Setoid c₂ ℓ₂} → A ⟶ B → Set (c₁ ⊔ c₂ ⊔ ℓ₂)
surjective {B = B} f = ∀ y → Σ _ (λ x → Setoid._≈_ B (f ⟨$⟩ x) y)
split : ∀ {c₁ ℓ₁ c₂ ℓ₂} {A : Setoid c₁ ℓ₁} {B : Setoid c₂ ℓ₂} → A ⟶ B → Set (c₁ ⊔ ℓ₁ ⊔ c₂ ⊔ ℓ₂)
split {A = A} {B = B} f = Σ (B ⟶ A) (λ g → ∀ y → Setoid._≈_ B (f ⟨$⟩ (g ⟨$⟩ y)) y)
projective : ∀ c' ℓ' → Set (c ⊔ ℓ ⊔ suc c' ⊔ suc ℓ')
projective c' ℓ' = ∀ (B : Setoid c' ℓ') (f : B ⟶ A) → surjective f → split f
(If anyone can advise me how to avoid the ugly Setoid._≈_ B above using just what is available in the standard library, please do. I know how to introduce my own notation, but why should I?)
Actually, the above code uses surjectivity in place of being epimorphic, so we should verify that the two notions coincide in setoids, which is done in Epimorphism.agda. The human proof goes as follows, where we write $=_A$ or just $=$ for the equivalence relation on a setoid $A$.
Theorem: A setoid morphism $f : A \to B$ is epi if, and only if, $\Pi (y : B) . \Sigma (x : A) . f \, x =_B y$.
Proof. (⇒) I wrote up the proof on MathOverflow. That one works for toposes, but is easy to transliterate to setoids, just replace the subobject classifier $\Omega$ with the setoid of propositions $(\mathrm{Type}, {\leftrightarrow})$.
(⇐) Suppose $\sigma : \Pi (y : B) . \Sigma (x : A) . f \, x =_B y$ and $g \circ f = h \circ f$ for some $g, h : B \to C$. Given any $y : B$ we have $$g(y) =_C g(f(\mathrm{fst}(\sigma\, y))) =_C h(f(\mathrm{fst}(\sigma\, y))) =_C h(y).$$ QED.
Every type $T$ may be construed as a setoid $\Delta T = (T, \mathrm{Id}_T)$, which is setoid in Agda.
Say that a setoid $A$ has canonical elements when there is a map $c : A \to A$ such that $x =_A y$ implies $\mathrm{Id}_A(c\,x , c\,y)$, and $c\, x =_A x$ for all $x : A$. In other words, the map $c$ takes each element to a canonical representative of its equivalence class. In Agda:
record canonical-elements : Set (c ⊔ ℓ) where
field
canon : Carrier → Carrier
canon-≈ : ∀ x → canon x ≈ x
canon-≡ : ∀ x y → x ≈ y → canon x ≡ canon y
Based on my experience with realizability models, I always thought that the following were equivalent:
- $A$ satisfies choice (AC)
- $A$ is projective (PR)
- $A$ is isomorphic to some $\Delta T$
- $A$ has canonical elements.
But there is a snag! The implication (2 ⇒ 3) seemingly requires extra conditions that I do not know how to get rid of. Before discussing these, let me just point out that SetoidChoice.agda formalizes (1 ⇔ 2) and (3 ⇒ 4 ⇒ 1) unconditionally. In particular any $\Delta T$ is projective.
The implication (2 ⇒ 3) I could prove under the additional assumption that the underlying type of $A$ is an h-set. Let us take a closer look. Suppose $(A, {=_A})$ is a projective setoid. How could we get a type $T$ such that $A \cong \Delta T$? The following construction suggests itself. The setoid map
\begin{align}
r &: (A, \mathrm{Id}_A) \to (A, {=_A}) \notag \\\
r &: x \mapsto x \notag
\end{align}
is surjective, therefore epi. Because $A$ is projective, the map splits, so we have a setoid morphism $s : (A, {=_A}) \to (A, \mathrm{Id}_A)$ such that $r \circ s = \mathrm{id}$. The endomap $s \circ r : A \to A$ is a choice of canonical representatives of equivalence classes of $(A, {=_A})$, so we expect $(A, {=_A})$ to be isomorphic to $\Delta T$ where $$T = \Sigma (x : A) . \mathrm{Id}_A(s (r \, x), x).$$ The mediating isomorphisms are
\begin{align}
i &: A \to T & j &: T \to A \notag \\\
i &: x \mapsto (s (r \, x), \zeta \, x) & j &: (x, \xi) \mapsto x \notag
\end{align}
where $\zeta \, x : \mathrm{Id}(s (r (s (r \, x))), s (r \, x))$ is constructed from the proof that $s$ splits $r$. This almost works! It is easy to verify that $j (i \, x) =_A x$, but then I got stuck on showing that $\mathrm{Id}_T(i (j (x, \xi)), (x, \xi))$, which amounts to inhabiting $$ \mathrm{Id}_T((x, \zeta x), (x, \xi)). \tag{1} $$ There is no a priori reason why $\zeta x$ and $\xi$ would be equal. If $A$ is an h-set then we are done because they will be equal by fiat. But what to do in general? I do not know, and I leave you with an open problem: is every projective setoid isomorphic to a type?
Egbert Rijke and I spent one tea-time thinking about producing a counter-example by using circles and other HoTT gadgets, but we failed. Just a word of warning: in HoTT/UF the map $1 \to S^1$ from the unit type to the circle is onto (in the HoTT sense) but $\Delta 1 \to \Delta S^1$ is not epi in setoids, because that would split $1 \to S^1$.
Here is an obvious try: use the propositional truncation and define $$ T = \Sigma (x : A) . \|\mathrm{Id}_A(s (r \, x), x) \|. $$ Now (1) does not pose a problem anymore. However, in order for $\Delta T$ to be isomorphic to $(A, {=_A})$ we will need to know that $x =_A y$ is an h-proposition for all $x, y : A$.
This is as far as I wish to descend into the setoid hell.



Two new doctors! — Andrej Bauer, Jan 11, 2022
Within a month two of my students defended their theses: Dr. Anja Petković Komel just before Christmas, and Dr. Philipp Haselwarter just yesterday. I am very proud of them. Congratulations!
Philipp's thesis An Effective Metatheory for Type Theory has three parts:
- A formulation and a study of the notion of finitary type theories and standard type theories. These are closely related to the general type theories that were developed with Peter Lumsdaine, but are tailored for implementation.
- A formulation and the study of context-free finitary type theories, which are type theories without explicit contexts. Instead, the variables are annotated with their types. Philipp shows that one can pass between the two versions of type theory.
- A novel effectful meta-language for proof assistants, the Andromeda meta-language (AML), which uses algebraic effects and handlers to allow flexible interaction between a generic proof assistant and the user.
Anja's thesis Meta-analysis of type theories with an application to the design of formal proofs also has three parts:
- A formulation and a study of transformations of finitary type theories with an associated category of finitary type theories.
- A user-extensible equality checking algorithm for standard type theories which specializes to several existing equality checking algorithms for specific type theories.
- A general elaboration theorem in which the transformations of type theories are used to prove that every finitary type theory (not necessarily fully annotated) can be elaborated to a standard type theory (a fully annotated one).
In addition, Philipp has done a great amount of work on implementing context-free type theories and the effective meta-language in Andromeda 2, and Anja implemented the generic equality checking algorithm. In the final push to get the theses out, the implementation suffered a little bit and is lagging behind. I hope we can bring it up to speed and make it usable. Anja has ideas on how to implement transformations of type theories in a proof assistant.
Of course, I am very happy with the particular results, but I am even happier with the fact that Philipp and Anja made an important step in the development of type theory as a branch of mathematics and computer science: they did not study a particular type theory or a narrow family of them, as has hitherto been the norm, but dependent type theories in general. Their theses contain interesting non-trivial meta-theorems that apply to large classes of type theories, and can no doubt be generalized even further. There is lots of low-hanging fruit out there.



Magic-trace: Diagnosing tricky performance issues easily with Intel Processor Trace — Jane Street, Jan 11, 2022
Intel Processor Trace is a hardware technology that can record all program execution flow along with timing information accurate to around 30ns. As far as I can tell almost nobody uses it, seemingly because capturing the data is tricky and, without any visualization tools, you’re forced to read enormous text dumps.




Monorobot: a Slack bot for monorepos — Ahrefs, Dec 09, 2021
Monorobot: a notification bot for monorepos

A few years ago, we decided to move most of our code into a monorepo. Many advocates have highlighted its upsides, which include better cross-project coordination and simpler dependency management.
But one problem remained: none of the available GitHub integrations for Slack work nicely with monorepos. Slack is vital for day-to-day communication among Ahrefs’ globally distributed team, so a Slack integration was a must-have feature. We needed a service that could map activity from our various subprojects to their corresponding Slack channels — something existing solutions didn’t offer.
That’s why we built our own integration: Monorobot, a notification bot for monorepos. We’ve improved it iteratively since the monorepo transition, incorporating real time feedback from our engineers over time. Today, Monorobot is an active member of Ahrefs’ Slack workspace, dutifully routing GitHub activity notifications to different channels based on the relevance of each activity.
And now we’re open-sourcing Monorobot, for anybody to use in their monorepo setup! The package is available via OPAM:
opam install monorobot
Read on for more details about the motivation, an overview of the main features, and what’s in the pipeline.
Existing Slack integrations lack monorepo support
Within a monorepo, multiple projects have their code located in separate, nested directories. Correspondingly, each project’s Slack channel is only interested in activity from that part of the overall repository. The issue with most GitHub-to-Slack integrations is that once you subscribe a Slack channel to a GitHub repository, the channel receives all activity from that repository.
Suppose we operate various camel-related services, and we’re planning to launch a new camel ride-sharing app called Camel Ride. The directory structure could look like this:
monorepo/
| frontend/
| | camelride_ui/
| | | mobile/
| | | web/
| | cameldance/
| backend/
| | camelride/
| | | routing/
| | | pricing/
| | camelfood/
As you can see, both the frontend/ and backend/ directories contain code for our fictitious ride-sharing service, along with code from other projects.
If we were to connect the GitHub for Slack integration to this repository, notifications for activity from all projects would be sent to the same channel. Even if I were only interested in activity from the Camel Ride project, I’d need to sift through notifications from the other, unrelated projects. Imagine the volume of notifications this would create for a larger monorepo with dozens of projects. What a mess!
Enter Monorobot

Monorobot enables more granular control over where notifications from the same repository get routed, depending on the type of activity. This routing behavior can be defined in a configuration file named .monorobot.json, which should be committed to the root of the monorepo. Once you create a GitHub webhook from the repository to a running instance of Monorobot, it will use the configuration file to route notifications to relevant channels based on the webhook event payload:
- For pushed commits, it checks the path prefixes of the files modified in the commits.
- For activity related to PRs and issues, it checks their labels.
- For status updates on pushed commits (e.g., CI builds), it uses the same path prefix logic as pushed commits.
Additionally, Monorobot supports unfurling GitHub links shared in Slack.
Path prefix routing for commit push notifications
Continuing with our example, suppose we want to route all commit activity related to the Camel Ride project to a Slack channel called #camelride. Our configuration file might look like this:
{
...,
"prefix_rules": {
"rules": [
{
"match": [
"frontend/camelride_ui/",
"backend/camelride/"
],
"ignore": [
"frontend/camelride_ui/images",
],
"channel": "camelride"
}
]
}
}
Each rule “matches” a file path to a channel. Whenever someone pushes a commit touching files with either of these prefixes, the #camelride channel will be notified. All other commits will be ignored.
If a file prefix appears in a rule's optional ignore field, the rule won't be matched even if the prefix is also in the match field. In the above snippet, the Camel Ride frontend team has decided to silence notifications for activity in the images/ subdirectory.
Now, let’s say the project’s price optimization team is growing, and they’ve decided to create their own separate Slack channel called #camelride-pricing. We can simply commit an update to the .monorobot.json file, and Monorobot will detect the configuration change:
{
...,
"prefix_rules": {
"rules": [
{
"match": [
"frontend/camelride_ui/",
"backend/camelride/"
],
"ignore": [
"frontend/camelride_ui/images",
],
"channel": "camelride"
},
{
"match": [
"backend/camelride/pricing/"
],
"channel": "camelride-pricing"
}
]
}
}
Since Monorobot will match the rule with the longest matched prefix, only commits related to the price optimization aspect of Camel Ride will notify #camelride-pricing, and all other general Camel Ride commits will notify #camelride.
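To make that concrete, here is a minimal OCaml sketch of longest-prefix matching; the type and function names are illustrative, not Monorobot's actual internals:
(* Pick the channel whose rule has the longest prefix matching the file path. *)
type rule = {
  prefixes : string list;  (* the "match" field *)
  ignored : string list;   (* the "ignore" field *)
  channel : string;
}

let matching_channel (rules : rule list) (path : string) : string option =
  let starts p = String.starts_with ~prefix:p path in
  let best_len r =
    (* Length of the longest matching prefix for this rule, or -1 if none *)
    List.fold_left
      (fun acc p -> if starts p then max acc (String.length p) else acc)
      (-1) r.prefixes
  in
  rules
  |> List.filter (fun r -> best_len r >= 0 && not (List.exists starts r.ignored))
  |> List.fold_left
       (fun acc r ->
          match acc with
          | Some (len, _) when len >= best_len r -> acc
          | _ -> Some (best_len r, r.channel))
       None
  |> Option.map snd
With the configuration above, a commit touching backend/camelride/pricing/model.ml would route to #camelride-pricing, because that rule's matching prefix is longer than #camelride's.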
There are additional configuration options for prefix rules (and for label rules discussed in the next section) that aren’t mentioned here. Visit the repository for the full details.
Label-based routing for PRs and issue notifications
For activity related to pull requests and issues (opening, closing, merging, commenting, and reviewing), Monorobot uses labels to determine routing. The format is largely the same as for path prefix routing:
{
...,
"label_rules": {
"default_channel": "notifications",
"rules": [
{
"match": [
"Camel Ride"
],
"channel": "camelride"
},
{
"match": [
"Price Optimization"
],
"channel": "camelride-pricing"
}
]
}
}
Here, all PRs and issues with the “Camel Ride” label will have activity sent to #camelride; those with the “Price Optimization” label to #camelride-pricing; and those with both labels to both channels.
The default_channel field provides an option to fall back on a channel if no rule is matched; this option is available for prefix rules as well.
Status notifications
Monorobot also supports build status notifications for CI pipelines. When it receives a status update for a pushed commit, it routes it to the relevant channel(s) by applying the prefix rules to the commit associated with the build. Further filtering based on status (e.g., ignoring canceled builds, and only notifying for a successful build when preceded by a failed one) is also possible.
Link unfurling
Finally, Monorobot can unfurl links to GitHub repositories shared on Slack (including private ones, if a personal access token is provided). This applies to commit, issue, and pull request URLs.
What’s next
Monorobot is actively used at Ahrefs today, but there are lots of promising future directions it could take. Here, we list a few.
Unifying GitHub and Slack identities
It would be useful to allow GitHub user IDs to be mapped to Slack ones. This would enable more personalized features for Monorobot, such as direct messaging a user when their review is requested or when a CI build fails on a feature branch they authored.
Consolidating notifications
Sometimes it makes sense for multiple GitHub webhook events to be grouped and delivered as a single Slack notification. For example, pull request reviews generate discrete webhook events for each review comment, but it would make more sense to pool them together, so as not to spam a channel with many notifications.
Better status notifications
A CI build failure can have multiple potential causes:
- A bad commit
- A previous bad commit that has yet to be fixed
- An issue with the pipeline itself (this is out of scope for Monorobot)
It can be quite tricky to distinguish between the first two causes from the GitHub webhook event alone. Cause 1 is handled well by our current approach of using path prefix routing on the commit associated with the build. But with cause 2, that same approach doesn't always send the build failure notification to the channel where it is actually relevant. In that case, the originator of the initial bad commit won't be nagged about subsequent failures, and Slack channels with no relevance to the cause of the failure will be polluted with unnecessary notifications.
How can we best determine whether a status notification is “relevant” to a Slack channel? This is still an open question, but one possible direction is to track build state per build step rather than per status, and route notifications based on that. For example, if an overall build fails due to a backend build step failure, then it could be sent to a channel where the frontend team won’t be notified.
Wrapping up
The overall goal of Monorobot is to make Slack notifications more relevant for all teams in a large monorepo environment, using the information available from GitHub webhook events. We’ve had fairly positive results with our own internal usage, and now we hope others find it useful as well.
Monorobot is written in OCaml. We welcome your feedback and contributions on GitHub!
P.S. If anyone does make an actual ride sharing service for camels, do let us know…
Thanks to Feihong, Igor, and Louis for feedback on this post.
Monorobot: a Slack bot for monorepos was originally published in ahrefs on Medium, where people are continuing the conversation by highlighting and responding to this story.



opam 2.1.2 release — OCaml Platform (Kate Deplaix - OCaml Labs, David Allsopp - OCaml Labs, Raja Boujbel - OCamlPro, Louis Gesbert - OCamlPro), Dec 08, 2021
We are pleased to announce the minor release of opam 2.1.2.
This opam release consists of backported fixes, including:
- Fallback on dnf if yum does not exist on RHEL-based systems (#4825)
- Use --no-depexts in CLI 2.0 mode. This further improves the use of opam 2.1 as a drop-in replacement for opam 2.0 in CI, for example with setup-ocaml in GitHub Actions. (#4908)
Opam installation instructions (unchanged):
- From binaries: run
  bash -c "sh <(curl -fsSL https://raw.githubusercontent.com/ocaml/opam/master/shell/install.sh) --version 2.1.2"
  or download manually from the GitHub "Releases" page to your PATH. In this case, don't forget to run opam init --reinit -ni to enable sandboxing if you had version 2.0.0~rc manually installed or to update your sandbox script.
- From source, using opam: opam update; opam install opam-devel (then copy the opam binary to your PATH as explained, and don't forget to run opam init --reinit -ni to enable sandboxing if you had version 2.0.0~rc manually installed or to update your sandbox script).
- From source, manually: see the instructions in the README.
We hope you enjoy this new minor version, and remain open to bug reports and suggestions.



Release of Frama-C 24.0 (Chromium) — Frama-C, Nov 30, 2021




'Signals and Threads' Podcast: What is an Operating System? — Tarides, Nov 23, 2021
November has become MirageOS month! Between the upcoming official MirageOS 4.0 release, making custom Christmas Tree garlands with MirageOS on a Raspberry Pi, and now this "What is an Operating System?" podcast (featuring Tarides advisor and core MirageOS maintainer Anil Madhavapeddy), it truly is MirageOS month!
MirageOS can do much more than program a Raspberry Pi for Christmas decor. From agricultural monitoring to smart buildings, its applications cover a wide range of needs. For example, it can also be used in critical pieces of infrastructure where security is of paramount importance, such as the Tezos blockchain. Combined with Irmin, it allows developers to build secure-by-design, offline-first systems and invert the current cloud-centric model for designing applications to securely connect physical spaces with extremely low latency and high bandwidth, using local-area computation capabilities.
You can read the entire transcript here and find links to multiple places to listen through podcast apps.



The proposal for a proof assistants StackExchange site — Andrej Bauer, Nov 19, 2021
Proof assistant communities have grown quite a bit lately. They have active Zulip chats: Lean, Coq, Agda, Isabelle. These are good for discussions, but less so for knowledge accumulation and organization, and they are not indexed by the search engines.
I have therefore created a proposal for a new "Proof assistants" StackExchange site. I believe that such a site would complement the various Zulips dedicated to specific proof assistants very well. If you favor the idea, please support it by visiting the proposal and
- becoming a follower (you have to be a registered user with a verified email account),
- asking sample questions, and
- upvoting good sample questions.
To pass the first stage and proceed to the next, we need 60 followers and 40 questions with at least 10 votes.



Tarides & Hyper: Partners in Agricultural Innovation — Tarides, Nov 18, 2021
We are thrilled to announce a partnership between Tarides and Hyper, a technology provider in the agritech space who’s building an "operating system for high-performing farms." Indoor and vertical farms are becoming tech businesses that require scalable, flexible, and easy-to-use tools to facilitate data analysis and thereby increase productivity. According to the State of Indoor Farming 2020 Report, “40% of vertical and indoor farms are implementing data analytics and control automation to increase yield and lower cost of production.”
Hyper’s product offers a developer-friendly platform for modern farms to integrate analytics and automation without worrying about hardware, scaling, or maintenance. There is a natural synergy between Hyper's product and Tarides’s mission to bring robust and scalable functional systems to the industry. Both teams will be working on technology to help solve some big problems the world faces.
First, some context about how agriculture affects our environment:
The world’s population is growing, but the size of our planet is not.
Agriculture is currently responsible for:
- 75% of the world’s deforestation [1]
- 50% of the world’s habitable land use
- 70% of the world’s freshwater withdrawals
- 26% of greenhouse gas emissions [2]
It’s an important time for indoor farming: it already produces twenty times (20x) more yield per area while using 95% less water and zero pesticides. Plus, indoor farming reduces the carbon footprint by cutting food miles in half [3]. Vertical farming complements traditional farming by growing fresh produce for cities in a sustainable way.
Hyper's mission is to simplify the integration of sensors and controllers for data collection and automation. Their roadmap is focused on implementing real-time computation of metrics for environmental data gathered from farms, prototyping computer vision models for crop analysis, and automated crop traceability infrastructure. With their data platform, growers can optimise yield and reduce operating costs by getting access to crop growth metrics and climate automation profiles without a dedicated engineering team.
Hyper's founders are experienced engineers and OCaml hackers who have worked on IoT and data analytics products in the past in the retail, biotech, and multimedia industries. Utilizing our MirageOS ecosystem, Hyper's sensor networks and cameras are continuously collecting millions of data points across large-scale farming operations to ensure consistent crop quality, detect issues early, and automate climate control. Since the technology is offline-first, it enables Hyper to collect this data securely, without the risk of breach or loss, as is sometimes the case in cloud computing.
As a technological partner, Tarides will help Hyper build a scalable IoT platform that leverages the OCaml ecosystem. Our support will help Hyper bring the product to the market faster while also contributing to the open-source IoT ecosystem for MirageOS and related projects. Hyper’s IoT platform will be fully open-source next year and will focus on implementing better support for MQTT, CoAP, and other protocols.
A new version of MirageOS will be released in November, so this exciting announcement couldn't come at a better time! Hyper’s use of the MirageOS ecosystem to build its data intelligence product is yet another real-world example that displays the power and efficiency of MirageOS.
It’s truly exciting technology because we’re at a point in history where even the most adamant climate change critics have begun to admit the climate is indeed changing. There are things we can each do to contribute, so Tarides is proud to collaborate with a company that’s actively seeking solutions to help reduce the amount of greenhouse gas emissions while increasing productivity.
Hyper currently has production deployments in vertical and indoor farms in the UK and East Africa and is planning on scaling the operations in the coming months.




opam releases: 2.0.10, 2.1.1, & opam depext 1.2! — OCaml Platform (David Allsopp - OCamlLabs, Raja Boujbel - OCamlPro, Louis Gesbert - OCamlPro), Nov 15, 2021
Feedback on this post is welcomed on Discuss!
We are pleased to announce several minor releases: opam 2.0.10, opam 2.1.1, and opam-depext 1.2.
The opam releases consist of backported fixes, while opam-depext has been adapted to be compatible with opam 2.1, to allow for workflows which need to maintain compatibility with opam 2.0. With opam 2.1.1, if you export OPAMCLI=2.0 into your environment, then workflows expecting opam 2.0 should behave even more equivalently.
opam-depext 1.2
Previous versions of opam-depext were made unavailable when using opam 2.1, since depext handling is done directly by opam 2.1 and later. This is still the recommended way, but it left workflows which wanted to maintain compatibility with opam 2.0 without a single command to install depexts. You can now run OPAMCLI=2.0 opam depext -y package1 package2 and expect this to work correctly with any version of opam 2. If you don't specify OPAMCLI=2.0, then the plugin will remind you that you should be using the integrated depext support! Calling opam depext this way with opam 2.1 and later still exposes the same double-solving problem that opam 2.0 has, but if for some reason the solver returns a different solution at opam install time, the correct depexts will still be installed.
For opam 2.0, some useful depext handling changes have been back-ported from opam 2.1.x to the plugin:
- yum-based distributions: force not to upgrade (#137)
- Archlinux: always upgrade all the installed packages when installing a new package (#138)
opam 2.1.1
opam 2.1.1 includes both the fixes in opam 2.0.10.
General fixes:
- Restore support for switch creation with "positional" package arguments and the --packages option for CLI version 2.0, e.g. OPAMCLI=2.0 opam switch create . 4.12.0+options --packages=ocaml-option-flambda. In opam 2.1 and later, this syntax remains an error (#4843)
- Fix opam switch set-invariant: default repositories were loaded instead of the switch's repositories selection (#4869)
- Run the sandbox check in a temporary directory (#4783)
Integrated depext support has a few updates:
- Homebrew now has support for casks and full-names (#4800)
- Archlinux now handles virtual package detection (#4833, partially addressing #4759)
- Disable the detection of available packages on RHEL-based distributions. This fixes an issue on RHEL-based distributions where the yum list command, used to detect available and installed packages, would wait for user input without showing any output and/or fail in some cases (#4791)
And finally, two regressions have been dealt with:
- Regression: avoid calling Unix.environment on load (as a toplevel expression). This regression affected opam's libraries, rather than the binary itself (#4789)
- Regression: handle empty environment variable updates (#4840)
A few issues with the compilation of opam from sources have been fixed as well (e.g. mingw-w64 with g++ 11.2 now works).
opam 2.0.10
Two subtle fixes are included in opam 2.0.10. These actually affect the ocaml package. Both of these are Heisenbugs: investigating what's going wrong on your system may well have fixed them, and they were both found on Windows!
$(opam env --revert) is the reverse of the more familiar $(opam env), but it's effectively called by opam whenever you change switch. It has been wrong since 2.0.0 for the case where several values are added to an environment variable in one setenv update. For example, if a package included a setenv field of the form [PATH += "dir1:dir2"], then this would not be reverted, but [[PATH += "dir1"] [PATH += "dir2"]] would be reverted. As it happens, this bug affects the ocaml package, but it was masked by another setenv update in the same package.
The other fix is also to do with setenv. It can be seen immediately after creating a switch, but before any additional packages are installed, as this Dockerfile shows:
FROM ocaml/opam@sha256:244b948376767fe91e2cd5caca3b422b2f8d332f105ef2c8e14fcc9a20b66e25
RUN sudo apt-get install -y ocaml-nox
RUN opam --version
RUN opam switch create show-issue ocaml-system
RUN eval $(opam env) ; echo $CAML_LD_LIBRARY_PATH
RUN opam install conf-which
RUN eval $(opam env) ; echo $CAML_LD_LIBRARY_PATH
Immediately after switch creation, $CAML_LD_LIBRARY_PATH was set to /home/opam/.opam/show-issue/lib/stublibs:, rather than /home/opam/.opam/show-issue/lib/stublibs:/usr/local/lib/ocaml/4.08.1/stublibs:/usr/lib/ocaml/stublibs
Opam installation instructions (unchanged):
- From binaries: run
  bash -c "sh <(curl -fsSL https://raw.githubusercontent.com/ocaml/opam/master/shell/install.sh) --version 2.1.1"
  or download manually from the GitHub "Releases" page to your PATH. In this case, don't forget to run opam init --reinit -ni to enable sandboxing if you had version 2.0.0~rc manually installed or to update your sandbox script.
- From source, using opam: opam update; opam install opam-devel (then copy the opam binary to your PATH as explained, and don't forget to run opam init --reinit -ni to enable sandboxing if you had version 2.0.0~rc manually installed or to update your sandbox script).
- From source, manually: see the instructions in the README.
We hope you enjoy this new minor version, and remain open to bug reports and suggestions.



MirageOS Workshop: Working with the Raspberry Pi 4 — Tarides, Nov 11, 2021
Earlier this week, Romain Calascibetta hosted an in-house MirageOS workshop for employees, both locally and remotely around the world. This interactive workshop taught participants how to build an operating system on a Raspberry Pi 4 using MirageOS. They got to create their own OS and play with projects, like one they dubbed GuirlandeOS, for which they programmed an LED garland to trim their Christmas tree, creating their own customized light show! There will be a dedicated blog post about GuirlandeOS soon.
That’s one fun example of what can be done with MirageOS and Raspberry Pi, but the possibilities are endless. For example, one could use this dynamic pair to create solar-powered websites (something we’ll hear more about next week).
The spirit of MirageOS is that anyone can integrate it, even if they don't work at Tarides. Although the workshop was only for employees, MirageOS is for everyone! Since it's autonomous from Tarides, we encourage you to play with MirageOS and see what you can create.
Romain has opened the contents of his informative workshop to the world! Follow along with his slides, which will walk you through the MirageOS toolchain to create your very own projects. You can read more about the Raspberry Pi process in this repo.
Early next week, we’ll continue with MirageOS month by showcasing a podcast where Anil Madhavapeddy talks about those solar-powered websites and more!
Happy hacking!



MirageOS 4.0 Preview Live Presentation — Tarides, Nov 09, 2021
The official release of MirageOS 4.0 quickly approaches! Learn about some general MirageOS concepts and get a sneak peek at the forthcoming changes in MirageOS 4.0 during a LIVE presentation today at 15h CET.
Lucas Pluvinage will lead you through a live-streaming presentation to acquaint you with MirageOS 4.0. You’ll learn what kinds of problems MirageOS can solve and about Functoria, the compilation model. Then Lucas will also discuss the switch to the Dune build system and how that enables cross-compilation, not to mention the creation of new compilation targets, such as the Raspberry Pi!
Watch the live presentation today at 15h CET. Just go to: https://meet.google.com/iqy-urht-rcn, or dial (FR) +33 1 87 40 43 45 with the PIN: 288 878 885#. You can also find other phone numbers here: https://tel.meet/iqy-urht-rcn?pin=3736259978366 if you're not in France.
This presentation will be a great introduction to Romain Calascibetta's forthcoming MirageOS workshop tomorrow. Check back on this blog and follow us on Twitter to find out more about how to watch and participate in this informative workshop tomorrow!
Until then, check out Lucas's presentation today at 15h CET.



Beta release of Frama-C 24.0-beta (Chromium) — Frama-C, Nov 04, 2021




Isolating Xwayland in a VM — Thomas Leonard, Oct 30, 2021
In my last post, Qubes-lite with KVM and Wayland, I described setting up a Qubes-inspired Linux system that runs applications in virtual machines. A Wayland proxy running in each VM connects its applications to the host Wayland compositor over virtwl, allowing them to appear on the desktop alongside normal host applications. In this post, I extend this to support X11 applications using Xwayland.
Table of Contents
- Overview
- Introduction to X11
- Running Xwayland
- The X11 protocol
- Initialising the window manager
- Windows
- Performance
- Pointer events
- Keyboard events
- Pointer cursor
- Selections
- Drag-and-drop
- Bonus features
- Conclusions
(This post also appeared on Hacker News.)
Overview
A graphical desktop typically allows running multiple applications on a single display (e.g. by showing each application in a separate window). Client applications connect to a server process (usually on the same machine) and ask it to display their windows.
Until recently, this service was an X server, and applications would communicate with it using the X11 protocol. However, on newer systems the display is managed by a Wayland compositor, using the Wayland protocol.
Many older applications haven't been updated yet. Xwayland can be used to allow unmodified X11 applications to run in a Wayland desktop environment. However, setting this up wasn't as easy as I'd hoped. Ideally, Xwayland would completely isolate the Wayland compositor from needing to know anything about X11:
Fantasy Xwayland architecture
However, it doesn't work like this. Xwayland handles X11 drawing operations, but it doesn't handle lots of other details, including window management (e.g. telling the Wayland compositor what the window title should be), copy-and-paste, and selections. Instead, the Wayland compositor is supposed to connect back to Xwayland over the X11 protocol and act as an X11 window manager to provide the missing features:
Actual Xwayland architecture
This is a problem for several reasons:
- It means that every Wayland compositor has to implement not only the new Wayland protocol, but also the old X11 protocol.
- The compositor is part of the trusted computing base (it sees all your keystrokes and window contents) and this adds a whole load of legacy code that you'd need to audit to have confidence in it.
- It doesn't work when running applications in VMs, because each VM needs its own Xwayland service and existing compositors can only manage one.
Because Wayland (unlike X11) doesn't allow applications to mess with other applications' windows, we can't have a third-party application act as the X11 window manager. It wouldn't have any way to ask the compositor to put Xwayland's surfaces into a window frame, because Xwayland is a separate application.
There is another way to do it, however. As I mentioned in the last post, I already had to write a Wayland proxy (wayland-proxy-virtwl) to run in each VM and relay Wayland messages over virtwl, so I decided to extend it to handle Xwayland too. As a bonus, the proxy can also be used even without VMs, avoiding the need for any X11 support in Wayland compositors at all. In fact, I found that doing this avoided several bugs in Sway's built-in Xwayland support.
Sommelier already has support for this, but it doesn't work for the applications I want to use.
For example, popup menus appear in the center of the screen, text selections don't work, and it generally crashes after a few seconds (often with the error xdg_surface has never been configured). So instead I'd been using ssh -Y vm from the host to forward X11 connections to the host's Xwayland, managed by Sway. That works, but it's not at all secure.
Introduction to X11
Unlike Wayland, where applications are mostly unaware of each other, X is much more collaborative.
The X server maintains a tree of windows (rectangles) and the applications manipulate it.
The root of the tree is called the root window and fills the screen.
You can see the tree using the xwininfo command, like this:
$ xwininfo -tree -root
xwininfo: Window id: 0x47 (the root window) (has no name)
Root window id: 0x47 (the root window) (has no name)
Parent window id: 0x0 (none)
9 children:
0x800112 "~/Projects/wayland/wayland-proxy-virtwl": ("ROX-Filer" "ROX-Filer") 2184x2076+0+0 +0+0
1 child:
0x800113 (has no name): () 1x1+-1+-1 +-1+-1
0x800123 (has no name): () 1x1+-1+-1 +-1+-1
0x800003 "ROX-Filer": () 10x10+-100+-100 +-100+-100
0x800001 "ROX-Filer": ("ROX-Filer" "ROX-Filer") 10x10+10+10 +10+10
1 child:
0x800002 (has no name): () 1x1+-1+-1 +9+9
0x600002 "main.ml (~/Projects/wayland/wayland-proxy-virtwl) - GVIM1": ("gvim" "Gvim") 1648x1012+0+0 +0+0
1 child:
0x600003 (has no name): () 1x1+-1+-1 +-1+-1
0x600007 (has no name): () 1x1+-1+-1 +-1+-1
0x600001 "Vim": ("gvim" "Gvim") 10x10+10+10 +10+10
0x200002 (has no name): () 1x1+0+0 +0+0
0x200001 (has no name): () 1x1+0+0 +0+0
This tree shows the windows of two X11 applications, ROX-Filer and GVim, as well as various invisible utility windows (mostly 1x1 or 10x10 pixels in size).
Applications can create, move, resize and destroy windows, draw into them, and request events from them.
The X server also allows arbitrary data to be attached to windows in properties.
You can see a window's properties with xprop. Here are some of the properties on the GVim window:
$ xprop -id 0x600002
WM_HINTS(WM_HINTS):
Client accepts input or input focus: True
Initial state is Normal State.
window id # of group leader: 0x600001
_NET_WM_WINDOW_TYPE(ATOM) = _NET_WM_WINDOW_TYPE_NORMAL
WM_NORMAL_HINTS(WM_SIZE_HINTS):
program specified minimum size: 188 by 59
program specified base size: 188 by 59
window gravity: NorthWest
WM_CLASS(STRING) = "gvim", "Gvim"
WM_NAME(STRING) = "main.ml (~/Projects/wayland/wayland-proxy-virtwl) - GVIM1"
...
The X server itself doesn't know anything about e.g. window title bars. Instead, a window manager process connects and handles that. A window manager is just another X11 application. It asks to be notified when an application tries to show ("map") a window inside the root, and when that happens it typically creates a slightly larger window (with room for the title bar, etc) and moves the other application's window inside that.
This design gives X a lot of flexibility. All kinds of window managers have been implemented, without needing to change the X server itself. However, it is very bad for security. For example:
- Open an xterm.
- Use xwininfo to find its window ID (you need the nested child window, not the top-level one).
- Run xev -id 0x80001b -event keyboard in another window (using the ID you got above).
- Use sudo or similar inside xterm and enter a password.
As you type the password into xterm, you should see the characters being captured by xev.
An X application can easily spy on another application, send it synthetic events, etc.
Running Xwayland
Xwayland is a version of the xorg X server that treats Wayland as its display hardware.
If you run it as e.g. Xwayland :1 then it opens a single Wayland window corresponding to the X root window, and you can use it as a nested desktop.
This isn't very useful, because these windows don't fit in with the rest of your desktop.
Instead, it is normally used in rootless mode, where each child of the X root window may have its own Wayland window.
$ WAYLAND_DEBUG=1 Xwayland :1 -rootless
[3991465.523] -> wl_display@1.get_registry(new id wl_registry@2)
[3991465.531] -> wl_display@1.sync(new id wl_callback@3)
...
When run this way, however, no windows actually appear.
If we run DISPLAY=:1 xterm then we see Xwayland creating some buffers, but no surfaces:
[4076460.506] -> wl_shm@4.create_pool(new id wl_shm_pool@15, fd 9, 540)
[4076460.520] -> wl_shm_pool@15.create_buffer(new id wl_buffer@24, 0, 9, 15, 36, 0)
[4076460.526] -> wl_shm_pool@15.destroy()
...
We need to run Xwayland as Xwayland :1 -rootless -wm FD, where FD is a socket we will use to speak the X11 protocol and act as a window manager.
It's a little hard to find information about Xwayland's rootless mode, because "rootless" has two separate common meanings in xorg:
- Running xorg without root privileges.
- Using xorg's miext/rootless extension to display application windows on some other desktop.
After a while, it became clear that Xwayland's rootless mode isn't either of these, but a third xorg feature also called "rootless".
The X11 protocol
libxcb provides C bindings to the X11 protocol, but I wanted to program in OCaml. Luckily, the X11 protocol is well documented, and generating the messages directly didn't look any harder than binding libxcb, so I wrote a little OCaml library to do this (ocaml-x11).
At first, I hard-coded the messages. For example, here's the code to delete a property on a window:
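It looked something like this: a sketch in which send_request (a hypothetical helper that adds the fixed request header) stands in for the transport code, while the cstruct definition is what generates the names discussed below.
[%%cstruct
type req = {
  window : uint32_t;    (* the window owning the property *)
  property : uint32_t;  (* the atom of the property to delete *)
} [@@little_endian]]

let delete_property t ~window ~property =
  let body = Cstruct.create sizeof_req in
  set_req_window body window;
  set_req_property body property;
  send_request t ~opcode:19 body  (* DeleteProperty is core opcode 19 *)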
I'm using the cstruct syntax extension to let me define the exact layout of the message body.
Here, it generates sizeof_req, set_req_window and set_req_property automatically.
After a bit, I discovered that there are XML files in xcbproto describing the X11 protocol. This provides a Python library for parsing the XML, which you can use by writing a Python script for your language of choice. For example, this glorious 3394 line Python script generates the C bindings. After studying this script carefully, I decided that hard-coding everything wasn't so bad after all.
I ended up having to implement more messages than I expected, including some surprising ones like OpenFont (see x11.mli for the final list). My implementation came to 1754 lines of OCaml, which is quite a bit shorter than the Python generator script, so I guess I still came out ahead!
In the X11 protocol, client applications send requests and the server sends replies, errors and events. Most requests don't produce replies, but can produce errors. Replies and errors are returned immediately, so if you see a response to a later request, you know all previous ones succeeded. If you care about whether a request succeeded, you may need to send a dummy message that generates a reply after it. Since message sequence numbers are 16-bit, after sending 0xffff consecutive requests without replies, you should send a dummy one with a reply to resynchronise (but window management involves lots of round-trips, so this isn't likely to be a problem for us). Events can be sent by the server at any time.
Unlike Wayland, which is very regular, X11 has various quirks.
For example, every event has a sequence number at offset 2, except for KeymapNotify.
Initialising the window manager
Using Xwayland -wm FD actually prevents any client applications from connecting at all at first, because Xwayland then waits for the window manager to be ready before accepting any client connections. To fix that, we need to claim ownership of the WM_S0 selection. A "selection" is something that can be owned by only one application at a time. Selections were originally used to track ownership of the currently-selected text, and later also used for the clipboard. WM_S0 means "Window Manager for Screen 0" (Xwayland only has one screen).
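A sketch of that step, using the let* from Lwt.Syntax explained below; the X11.Atom and X11.Selection helper names, and the wm_window argument, are illustrative:
let claim_wm_selection t ~wm_window =
  let* wm_s0 = X11.Atom.intern t "WM_S0" in
  (* Owning this selection tells Xwayland that a window manager is present *)
  X11.Selection.set_owner t wm_s0 ~owner:(Some wm_window) ~timestamp:`Current_time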
Instead of passing things like WM_S0 as strings in each request, X11 requires us to first intern the string. This returns a unique 32-bit ID for it, which we use in future messages. Because intern may require a round-trip to the server, it returns a promise, and so we use let* instead of let to wait for that to resolve before continuing. let* is defined in the Lwt.Syntax module, as an alternative to the more traditional >>= notation.
This lets our clients connect. However, Xwayland still isn't creating any Wayland surfaces. By reading the Sommelier code and stepping through Xwayland with a debugger, I found that I needed to enable the Composite extension.
Composite was originally intended to speed up redraw operations, by having the server keep a copy of every top-level window's pixels (even when obscured), so that when you move a window it can draw it right away without asking the application for help. The application's drawing operations go to the window's buffer, and then the buffer is copied to the screen, either automatically by the X server or manually by the window manager. Xwayland reuses this mechanism, by turning each window buffer into a Wayland surface. We just need to turn that on:
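Assuming a binding to the Composite extension's RedirectSubwindows request, that's roughly:
let enable_composite t ~root =
  (* Keep a buffer for every child of the root window; Manual means the
     window manager, not the server, decides how buffers reach the screen *)
  X11.Composite.redirect_subwindows t ~window:root ~update:`Manual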
This says that every child of the root window should use this system. Finally, we see Xwayland creating Wayland surfaces:
-> wl_compositor@5.create_surface id:+28
Now we just need to make them appear on the screen!
Windows
As usual for Wayland, we need to create a role object and attach it to the surface. This tells Wayland whether the surface is a window or a dialog, for example, and lets us set the title, etc.
But first we have a problem: we need to know which X11 window corresponds to each Wayland surface.
For example, we need the title, which is stored in a property on the X11 window.
Xwayland does this by sending the new window a ClientMessage event of type WL_SURFACE_ID containing the Wayland ID. We don't get this message by default, but it seems that selecting SubstructureRedirect on the root does the trick.
SubstructureRedirect is used by window managers to intercept attempts by other applications to change the children of the root window. When an application asks the server to e.g. map a window, the server just forwards the request to the window manager. Operations performed by the window manager itself do not get redirected, so it can just perform the same request the client wanted, or make any changes it requires. In our case, we don't actually need to modify the request, so we just re-perform the original map operation:
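In outline, with hypothetical helper names:
let init_wm t ~root =
  (* Ask for map requests on the root's children to be redirected to us *)
  X11.Window.change_attributes t root ~event_mask:[`SubstructureRedirect]

let on_map_request t ~window =
  (* Nothing to adjust, so re-perform the client's original request *)
  X11.Window.map t window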
Having two separate connections to Xwayland is quite annoying, because messages can arrive in any order. We might get the X11 ClientMessage first and need to wait for the Wayland create_surface, or we might get the create_surface first and need to wait for the ClientMessage. An added complication is that not all Wayland surfaces correspond to X11 windows. For example, Xwayland also creates surfaces representing cursor shapes, and these don't have X11 windows. However, when we get the ClientMessage we can be sure that a Wayland message is on the way, so I just pause the X11 event handling until that has arrived:
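Sketched with hypothetical unpaired/wanted tables on the proxy state:
(* Called from the X11 event loop on a WL_SURFACE_ID ClientMessage.
   Blocking here pauses further X11 event handling, but Wayland events
   still flow, so the matching create_surface will arrive eventually. *)
let on_wl_surface_id t ~window ~surface_id =
  let* surface =
    match Hashtbl.find_opt t.unpaired surface_id with
    | Some surface -> Lwt.return surface
    | None ->
      let promise, resolver = Lwt.wait () in
      Hashtbl.add t.wanted surface_id resolver;  (* resolved by create_surface *)
      promise
  in
  pair t ~window surface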
Another complication is that Wayland doesn't allow you to attach a buffer to a surface until the window has been "configured". Doing so is a protocol error, and Sway will disconnect us if we try! But Xwayland likes to attach the buffer immediately after creating the surface.
To avoid this, I use a queue:
- Xwayland asks to create a surface.
- We forward this to Sway, add its ID to the unpaired map, and create a queue for further events.
- Xwayland asks us to attach a buffer, etc. We just queue these up.
- We get the ClientMessage over the X11 connection and create a role for the new surface.
- Sway sends us a configure event, confirming it's ready for the buffer.
- We forward the queued events.
However, this creates a new problem: if the surface isn't a window then the events will be queued forever. To fix that, when we get a create_surface we also do a round-trip on the X11 connection. If the window is still unpaired when that returns, then we know that no ClientMessage is coming, and we flush the queue.
X applications like to create dummy windows for various purposes (e.g. receiving clipboard data), and we need to avoid showing those. They're normally set as override_redirect so the window manager doesn't handle them, but Xwayland redirects them anyway (it needs to, because otherwise e.g. tooltips wouldn't appear at all). I'm trying various heuristics to detect this, e.g. that override-redirect windows with a size of 1x1 shouldn't be shown.
If Sway asks us to close a window, we need to relay that to the X application using the WM_DELETE_WINDOW protocol, if it supports that:
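In outline (the X11.* helper names are illustrative):
let close_window t window =
  let* wm_protocols = X11.Atom.intern t "WM_PROTOCOLS" in
  let* wm_delete = X11.Atom.intern t "WM_DELETE_WINDOW" in
  let* supported = X11.Property.get_atoms t window wm_protocols in
  if List.mem wm_delete supported then
    (* The app opted in, so ask it politely with a ClientMessage *)
    X11.Window.send_client_message t window ~typ:wm_protocols ~data:[wm_delete]
  else
    (* No opt-in: disconnect the client instead *)
    X11.Window.kill_client t window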
Wayland defaults to using client-side decorations (where the application draws its own window decorations). X doesn't do that, so we need to turn it off (if the Wayland compositor supports the decoration manager extension):
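Roughly, with illustrative OCaml module names for the zxdg_decoration_manager_v1 protocol objects:
let disable_csd t xdg_toplevel =
  match t.decor_mgr with
  | None -> ()  (* the compositor doesn't offer the decoration manager *)
  | Some mgr ->
    let decor = Decoration_manager.get_toplevel_decoration mgr xdg_toplevel in
    (* Ask the compositor to draw the title bar, as X applications expect *)
    Toplevel_decoration.set_mode decor `Server_side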
Dialog boxes are more of a problem.
Wayland requires every dialog box to have a parent window, but X11 doesn't.
To handle that, the proxy tracks the last window the user interacted with and uses that as a fallback parent if an X11 window with type _NET_WM_WINDOW_TYPE_DIALOG is created without setting WM_TRANSIENT_FOR.
That could be a problem if the application closes that window, but it seems to work.
Performance
I noticed a strange problem: scrolling around in GVim had long pauses once a second or so, corresponding to OCaml GC runs. This was surprising, as OCaml has a fast incremental garbage collector, and is normally not a problem for interactive programs. Besides, I'd been using the proxy with the (Wayland) Firefox and xfce4-terminal applications for 6 months without any similar problem.
Using perf showed that Linux was spending a huge amount of time in release_pages.
The problem is that Xwayland was sharing lots of short-lived memory pools with the proxy.
Each time it shares a pool, we have to ask the VM host for a chunk of memory of the same size.
We map both pools into our address space and then copy each frame across
(this is needed because we can't export guest memory to the host).
Normally, an application shares a single pool and just refers to regions within it, so we just map once at startup and unmap at exit. But Xwayland was creating, sharing and discarding around 100 pools per second while scrolling in GVim! Because these pools take up a lot of RAM, OCaml was (correctly) running the GC very fast, freeing them in batches of 100 or so each second.
First, I tried adding a cache of host memory, but that only solved half the problem: freeing the client pool was still slow.
Another option is to unmap the pools as soon as we get the destroy message, to spread the work out. Annoyingly, OCaml's standard library doesn't let you free memory-mapped memory explicitly (see the Add BigArray.Genarray.free PR for the current status), but adding this myself with a bit of C code would have been easy enough. We only touch the memory in one place (for the copy), so manually checking it hadn't been freed would have been pretty safe.
Then I noticed something interesting about the repeated log entries, which mostly looked like this:
-> wl_shm@4.create_pool id:+26 fd:(fd) size:8368360
-> wl_shm_pool@26.create_buffer id:+28 offset:0 width:2090 height:1001 stride:8360 format:1
-> wl_shm_pool@26.destroy
<- wl_display@1.delete_id id:26
-> wl_buffer@28.destroy
<- wl_display@1.delete_id id:28
Xwayland creates a pool, allocates a buffer within it, destroys the pool (so it can't create more buffers), and then deletes the buffer. But it never uses the buffer for anything!
So the solution was simple: I just made the host buffer allocation and the mapping operations lazy. We force the mapping if a pool's buffer is ever attached to a surface, but if not, we just close the FD and forget about it. It would be more efficient if Xwayland only shared the pools when needed, though.
Pointer events
Wayland delivers pointer events relative to a surface, so we simply forward these on to Xwayland unmodified and everything just works.
I'm kidding - this was the hardest bit! When Xwayland gets a pointer event on a window, it doesn't send it directly to that window. Instead, it converts the location to screen coordinates and then pushes the event through the old X event handling mechanism, which looks at the X11 window stack to decide where to send it.
However, the X11 window stack (which we saw earlier with xwininfo -tree -root) doesn't correspond to the Wayland window layout at all.
In fact, Wayland doesn't provide us any way to know where our windows are, or how they are stacked.
Sway seems to handle this via a backdoor: X11 applications do get access to location information even though native Wayland clients don't. This is one of the reasons I want to get X11 support out of the compositor - I want to make sure X11 apps don't have any special access. Sommelier has a solution though: when the pointer enters a window we raise it to the top of the X11 stack. Since it's the topmost window, it will get the events.
Unfortunately, the raise request goes over the X11 connection while the pointer events go over the Wayland one. We need to make sure that they arrive in the right order. If the computer is running normally, this isn't much of a problem, but if it's swapping or otherwise struggling it could result in events going to the wrong place (I temporarily added a 2-second delay to test this). This is what I ended up with:
- Get a wayland pointer enter event from Sway.
- Pause event delivery from Sway.
- Flush any pending Wayland events we previously sent to Xwayland by doing a round-trip on the Wayland connection.
- Send a raise on the X11 connection.
- Do a round-trip on the X11 connection to ensure the raise has completed.
- Forward the enter event on the Wayland connection.
- Unpause the event stream from Sway.
At first I tried queuing up just the pointer events, but that doesn't work because e.g. keyboard events need to be synchronised with pointer events. Otherwise, if you e.g. Shift-click on something then the click gets delayed but the Shift doesn't and it can do the wrong thing. Also, Xwayland might ask Sway to destroy the window while we're entering it, and Sway might confirm the deletion. Pausing the whole event stream from Sway fixes all these problems.
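Put together, the enter handler looks roughly like this (all the helper names here are hypothetical):
let on_pointer_enter t ~surface ~event =
  pause_events_from_sway t;
  (* Flush anything we previously sent towards Xwayland *)
  let* () = wayland_roundtrip t in
  let* () =
    match x11_window_of_surface t surface with
    | Some window ->
      X11.Window.raise_window t window;
      x11_roundtrip t  (* make sure the raise has been processed *)
    | None -> Lwt.return_unit
  in
  forward_to_xwayland t event;
  unpause_events_from_sway t;
  Lwt.return_unit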
The next problem was how to do the two round-trips.
For X11 we just send an Intern request after the raise and wait to get a reply to that. Wayland provides the wl_display.sync method to clients, but we're acting as a Wayland server to Xwayland, not a client.
I remembered that Wayland's xdg-shell extension provides a ping from the server to the client
(the compositor can use this to detect when an application is not responding).
Unfortunately, Xwayland has no reason to use this extension because it doesn't deal with window roles.
Luckily, it uses it anyway (it does need it for non-rootless mode and doesn't bother to check).
wl_display.sync works by creating a fresh callback object, but xdg-shell's ping just sends a pong event to a fixed object, so we also need a queue to keep track of pings in flight, so that we don't get confused between our pings and any pings we're relaying for Sway.
Also, xdg-shell's ping requires a serial number and we don't have one.
But since Xwayland is the only app this needs to support, and it doesn't look at that, I cheat and just send zero.
And that's how to get pointer events to go to the right window with Xwayland.
Keyboard events
A very similar problem exists with the keyboard.
When Wayland says the focus has entered a window, we need to send a SetInputFocus over the X11 connection and then send the keyboard events over the Wayland one, requiring another two round-trips to synchronise the two connections.
Pointer cursor
Some applications set their own pointer shape, which works fine.
But others rely on the default and for some reason you get no cursor at all in that case.
To fix it, you need to set a cursor on the root window, which applications will then inherit by default.
Unlike Wayland, where every application provides its own cursor bitmaps, X very sensibly provides a standard set of cursors, in a font called cursor (this is why I had to implement OpenFont). As cursors have two colours and a mask, each cursor is two glyphs: even-numbered glyphs are the image and the following glyph is the mask:
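For example (a sketch with illustrative helper names; glyph 68 is the standard left-pointer arrow, so 69 is its mask):
let set_default_cursor t ~root =
  let* font = X11.Font.open_font t "cursor" in
  let* cursor =
    X11.Cursor.create_glyph_cursor t
      ~source_font:font ~mask_font:font
      ~source_char:68 ~mask_char:69  (* image glyph, then its mask *)
      ~fg:(0, 0, 0) ~bg:(0xffff, 0xffff, 0xffff)
  in
  (* Children inherit the root cursor unless they set their own *)
  X11.Window.change_attributes t root ~cursor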
Selections
The next job was to get copying text between X and Wayland working.
In X11:
- When you select something, the application takes ownership of the PRIMARY selection.
- When you click the middle button or press Shift-Insert, the application requests PRIMARY.
- When you press Ctrl-C, the application takes ownership of the CLIPBOARD selection.
- When you press Ctrl-V, it requests CLIPBOARD.
It's quite neat that adding support for a Windows-style clipboard didn't require changing the X server at all. Good forward-thinking design there.
In Wayland, things are not so simple. I have so far found no less than four separate Wayland protocols for copying text:
- gtk_primary_selection supports copying the primary selection, but not the clipboard.
- wp_primary_selection_unstable_v1 is identical to gtk_primary_selection, except that it renames everything.
- wl_data_device_manager supports clipboard transfers, but not the primary selection.
- zwlr_data_control_manager_v1 supports both, but it's for a "privileged client" to be a clipboard manager.
gtk_primary_selection and wl_data_device_manager both say they're stable, while the other two are unstable. However, Sway dropped support for gtk_primary_selection a while ago, breaking many applications (luckily, I had a handy Wayland proxy and was able to add some adaptor code to route gtk_primary_selection messages to the new "unstable" protocol). For this project, I went with wp_primary_selection_unstable_v1 and wl_data_device_manager.
On the Wayland side, everything has to be written twice for the two protocols, which are almost-but-not-quite the same.
In particular, wl_data_device_manager also has a load of drag-and-drop stuff you need to ignore.
For each selection (PRIMARY or CLIPBOARD), we can be in one of two states:
- An X11 client owns the selection (and we own the Wayland selection).
- A Wayland client owns the selection (and we own the X11 selection).
When we own a selection we proxy requests for it to the matching selection on the other protocol.
- At startup, we take ownership of the X11 selection, since there are no X11 apps running yet.
- When we lose the X11 selection it means that an X11 client now owns it and we take the Wayland selection.
- When we lose the Wayland selection it means that a Wayland client now owns it and we take the X11 selection.
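As a tiny sketch of that flip-flop (handler names hypothetical):
let on_x11_selection_cleared t selection =
  (* An X11 client now owns it, so we take the Wayland selection and proxy *)
  own_wayland_selection t selection

let on_wayland_selection_cleared t selection =
  (* A Wayland client now owns it, so we take the X11 selection and proxy *)
  own_x11_selection t selection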
One good thing about the Wayland protocols is that you send the data by writing it to a normal Unix pipe. For X11, we need to write the data to a property on the requesting application's window and then notify it about the data. And we may need to split it into multiple chunks if there's a lot of data to transfer.
A strange problem I had was that, while pasting into GVim worked fine, xterm would segfault shortly after trying to paste into it.
This turned out to be a bug in the way I was sending the notifications.
If an X11 application requests the special TEXT target, it means that the sender should choose the exact format. You write the property with the chosen type (e.g. UTF8_STRING), but you must still send the notification with the target TEXT. xterm is a C application (thankfully no longer set-uid!) and seems to have a use-after-free bug in its timeout code.
Drag-and-drop
Sadly, I wasn't able to get this working at all. X itself doesn't know anything about drag-and-drop and instead applications look at the window tree to decide where the user dropped things. This doesn't work with the proxy, because Wayland doesn't tell us where the windows really are on the screen.
Even without any VMs or proxies, drag-and-drop from X applications to Wayland ones doesn't work, because the X app can't see the Wayland window and the drop lands on the X window below (if any).
Bonus features
In the last post, I mentioned several other problems, which have also now been solved by the proxy:
HiDPI works
Wayland's support for high resolution screens is a bit strange. I would have thought that applications really only need to know two things:
- The size in pixels of the window.
- The size in pixels you want some standard thing (e.g. a normal-sized letter M).
Some systems instead provide the size of the window and the DPI (dots-per-inch), but this doesn't work well. For example, a mobile phone might be high DPI but still want small text because you hold it close to your face, while a display board will have very low DPI but want large text.
Wayland instead redefines the idea of pixel to be a group of pixels corresponding to a single pixel on a typical 1990's display. So if you set your scale factor to 2 then 1 Wayland pixel is a 2x2 grid of physical pixels. If you have a 1000x1000 pixel window, Wayland will tell the application it is 500x500 but suggest a scale factor of 2. If the application supports HiDPI mode, it will double all the numbers and render a 1000x1000 image and things work correctly. If not, it will render a 500x500 pixel image and the compositor will scale it up.
Since Xwayland doesn't support this, it just draws everything too small and Sway scales it up, creating a blurry and unusable mess. This might be made worse by subpixel rendering, which doesn't cope well with being scaled.
With the proxy, the solution is simple enough: when talking to Xwayland we just scale everything back up to the real dimensions, scaling all coordinates as we relay them:
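Conceptually, the unscaling is just a pair of integer scalings applied to every message that carries sizes or coordinates; a sketch with hypothetical forward_* helpers:
(* With --x-unscale=2: the compositor talks in logical pixels and Xwayland
   in device pixels, so multiply on the way in and divide on the way out *)
let up t v = v * t.unscale    (* compositor -> Xwayland *)
let down t v = v / t.unscale  (* Xwayland -> compositor *)

let on_configure t ~width ~height =
  forward_configure t ~width:(up t width) ~height:(up t height)

let on_pointer_motion t ~x ~y =
  forward_motion t ~x:(up t x) ~y:(up t y)

let on_damage t ~x ~y ~w ~h =
  forward_damage t ~x:(down t x) ~y:(down t y) ~w:(down t w) ~h:(down t h)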
This will tend to make things sharp but too small, but X applications already have their own ways to handle high resolution screens.
For example, you can set Xft.dpi to make all the fonts bigger. I run the proxy like this, which works for me:
wayland-proxy-virtwl --x-display=0 --xrdb Xft.dpi:150 --x-unscale=2
However, there is a problem. The Wayland specification says:
The new size of the surface is calculated based on the buffer size transformed by the inverse buffer_transform and the inverse buffer_scale. This means that at commit time the supplied buffer size must be an integer multiple of the buffer_scale. If that's not the case, an invalid_size error is sent.
Let's say we have an X11 image viewer that wants to show a 1001-pixel-high image in a 1001-pixel-high window. This isn't allowed by the spec, which can only handle even-sized windows when the scale factor is 2. Regular Wayland applications already have to deal with that somehow, but for X11 applications it becomes our problem.
I tried rounding down, but that has a bad side-effect: if GTK asks for a 1001-pixel high menu and gets a 1000 pixel allocation, it switches to squashed mode and draws two big bumper arrows at the top and bottom of the menu which you must use to scroll it. It looks very silly.
I also tried rounding up, but tooltips look bad with any rounding. Either one border is missing, or it's double thickness. Luckily, it seems that Sway doesn't actually enforce the rule about surfaces being a multiple of the scale factor. So, I just let the application attach a buffer of whatever size it likes to the surface and it seems to work!
The only problem I had was that when using unscaling, the mouse pointer in GVim would get lost. Vim hides it when you start typing, but it's supposed to come back when you move the mouse. The problem seems to be that it hides it by creating a 1x1 pixel cursor. Sway decides this isn't worth showing (maybe because it's 0x0 in Wayland-pixels?), and sends Xwayland a leave event saying the cursor is no longer on the screen. Then when Vim sets the cursor back, Xwayland doesn't bother updating it, since it's not on screen!
The solution was to stop applying unscaling to cursors. They look better doubled in size, anyway. True, this does mean that the sharpness of the cursor changes as you move between windows, but you're unlikely to notice this due to the far more jarring effect of Wayland cursors also changing size and shape at the same time.
Ring-buffer logging
Even without a proxy to complicate things, Wayland applications often have problems. To make investigating this easier, I added a ring-buffer log feature. When on, the proxy keeps the last 512K or so of log messages in memory, and will dump them out on demand.
To use it, you run the proxy with e.g. -v --log-ring-path ~/wayland.log
.
When something odd happens (e.g. an application crashes, or opens its menus in the wrong place) you can
dump out the ring buffer and see what just happened with:
echo dump-log > /run/user/1000/wayland-1-ctl
I also added some filtering options (e.g. --log-suppress motion,shm
) to suppress certain classes of noisy messages.
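The ring buffer itself is nothing exotic; a minimal OCaml sketch of the idea (assuming a plain byte budget over whole lines, which is not the proxy's actual code):

(* Keep roughly the last [limit] bytes of log lines: old lines are dropped
   from the front as new ones arrive, and [dump] prints what is retained. *)
type ring = { entries : string Queue.t; mutable bytes : int; limit : int }

let create limit = { entries = Queue.create (); bytes = 0; limit }

let log t line =
  Queue.push line t.entries;
  t.bytes <- t.bytes + String.length line;
  while t.bytes > t.limit && Queue.length t.entries > 1 do
    t.bytes <- t.bytes - String.length (Queue.pop t.entries)
  done

let dump t = Queue.iter print_endline t.entries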
Vim windows open correctly
One annoyance with Sway is that Vim's window always appears blank (even when running on the host, without any proxy). You have to resize it before you can see the text.
My proxy initially suffered from the same problem, although only intermittently.
It turned out to be because Vim sends a ConfigureRequest
with its desired size and then waits for the confirmation message.
Since Sway is a tiling window manager, it ignores the new size and no event is generated.
In this case, an X11 window manager is supposed to send a synthetic ConfigureNotify
,
so I just got the proxy to do that and the problem disappeared
(I confirmed this by adding a sleep to Vim's gui_mch_update
).
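The shape of the fix, as an OCaml sketch (current_geometry and send_configure_notify are hypothetical stand-ins for the proxy's real X11 plumbing):

type geometry = { x : int; y : int; width : int; height : int }

(* If the (tiling) window manager ignores a ConfigureRequest, no real
   ConfigureNotify will ever arrive; synthesise one that echoes the
   unchanged geometry so the client stops waiting. *)
let on_ignored_configure_request ~current_geometry ~send_configure_notify w =
  send_configure_notify w (current_geometry w : geometry)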
By the way, the GVim start-up code is quite interesting.
The code path to opening the window goes through three separate functions which each define a
static int recursive = 0
and then proceed to behave differently depending on how many times they've
been reentered - see gui_init for an example!
Copy-and-paste without ^M characters
The other major annoyance with Sway is that copy-and-paste doesn't work correctly (Sway bug #1839). Using the proxy avoids that problem completely.
Conclusions
I'm not sure how I feel about this project.
It ended up taking a lot longer than I expected, and I could probably have ported several X11 applications to Wayland in the same time.
On the other hand, I now have working X support in the VMs with no need for ssh -Y
from the host, plus support for HiDPI in Wayland, mouse cursors that are large enough to see easily, windows that open reliably, text pasting that works, and I can get logs whenever something misbehaves.
In fact, I'm now also running an instance of the proxy directly on the host to get the same benefits for host X11 applications.
Setting this up is actually a bit tricky:
you want to start Sway with DISPLAY=:0
so that every application it spawns knows it has an X11 display,
but if you set that then Sway thinks you want it to run nested inside an X window provided by the proxy,
which doesn't end well (or, indeed, at all).
Having all the legacy X11 support in a separate binary should make it much easier to write new Wayland compositors, which might be handy if I ever get some time to try that. It also avoids having many thousands of lines of legacy C code in the highly-trusted compositor code.
If Wayland had an official protocol for letting applications know the window layout then I could make drag-and-drop between X11 applications within the same VM work, but it still wouldn't work between VMs or to Wayland applications, so it's probably not worth it.
Having two separate connections to Xwayland creates a lot of unnecessary race conditions. A simple solution might be a Wayland extension that allows the Wayland server to say "please read N bytes from the X11 socket now", and likewise in the other direction. Then messages would always arrive in the order in which they were sent.
The code is all available at https://github.com/talex5/wayland-proxy-virtwl if you want to try it. It works with the applications I use when running under Sway, but will probably require some tweaking for other programs or compositors. Here's a screenshot of my desktop using it:
The windows with [dev]
in the title are from my Debian VM, while [com]
is a SpectrumOS VM I use for email, etc.
Gitk, GVim and ROX-Filer are X11 applications using Xwayland,
while Firefox and xfce4-terminal are using plain Wayland proxying.




Hiring a Developer Educator — Jane Street, Oct 21, 2021
We spend a lot of time on education at Jane Street. Like, really a lot.




SCoP Passed Phase 1 of the DAPSI Initiative! — Tarides, Oct 14, 2021
In April, we announced that the DAPSI initiative accepted the proposal for our Secure-by-Design Communication Protocols (SCoP) project. Today, we are thrilled to announce that SCoP has passed the initiative’s Phase 1, and we are now on our way to Phase 2!
SCoP is an open, secure, and resource-efficient infrastructure to engineer a modern basis for open messaging (for existing and emerging protocols) using type-safe languages and unikernels—to ensure your private information remains secure. A…
Read more...In April, we announced that the DAPSI initiative accepted the proposal for our Secure-by-Design Communication Protocols (SCoP) project. Today, we are thrilled to announce that SCoP has passed the initiative’s Phase 1, and we are now on our way to Phase 2!
SCoP is an open, secure, and resource-efficient infrastructure to engineer a modern basis for open messaging (for existing and emerging protocols) using type-safe languages and unikernels—to ensure your private information remains secure. After all, you wouldn’t like your postal carrier reading your snail mail, so why should emails be any different?
Challenges
Operating an email service requires many technical skills and reliable infrastructure. As a result, only a few large companies can handle email with the proper level of security. Unfortunately, the core business model of these companies is to mine your personal data.
The number of emails exchanged every day is expected to reach 333 billion in 2022. That’s a considerable amount of data, much of it private or sensitive, sent across Cyberspace through portals with questionable security. The ‘memory unsafe’ languages used in most communication services leave far too much room for mistakes that have serious ramifications, like security flaws that turn into security breaches, leaving your personal or business information vulnerable to malicious hackers.
Due to this global challenge, we set out to build a simple, secure, easily deployable solution to preserve users' privacy, and we’re making great strides toward accomplishing that goal. We base our systems on scientific foundations to last for decades and drive positive change for the world. Our robust understanding of both theory and practice enables us to solve these security problems, so we explore ideas where research and engineering meet at the intersection of the domains of operating systems, distributed systems, and programming languages.
Every component of SCoP is carefully designed as an independent library, using modern development techniques to avoid commonly reported threats and flaws. For instance, the protocol parsers and serializers are written in a type-safe language and tested using fuzzing. Combining these techniques will increase users' trust and encourage them to migrate their personal data to these new, more secure services.
Architecture
The architecture of the SCoP communication service is composed of an Email Service based on a secure extension of the SMTP protocol, and a decentralised real-time communication system based on Matrix.
The SMTP and Matrix protocols implemented in SCoP follow the separation-of-concerns design principle, meaning that the SMTP Sender and SMTP Receiver are designed as two distinct units. They’re implemented as isolated micro-services which run as unikernels. The SMTP Sender, Receiver, and Matrix are all configurable, and each configuration comes with a security risk analysis report to help understand possible privacy risks.
Progress
Not only are we on our way to Phase 2 in the DAPSI Initiative, but we’re also proud to report that we’re on track with our planned milestones!
Our first milestone was to generate a corpus of emails to test our parser implementation against existing projects, in order to detect differences from the descriptions specified in the RFCs. We now have 1 million emails that have been parsed/encoded without any issues! Our email corpus preserves the isomorphism between the encoder and decoder, and you can find it in this GitHub repo; we encourage implementors in other languages to use it to improve trust in their own implementations.
We set out to implement an SMTP extension mechanism and support for SPF as well as implement DMARC, a security framework, on top of DKIM and SPF for our second milestone, and we are right on target. To date, we’ve completed four components:
- SPF
- DKIM
- SMTP can send and verify emails
- MrMIME can generate the email, then SMTP sends the email (signed by a DKIM private key). We can correctly sign an email, generate a signature, and the DKIM field containing the signature. When the email is received, we check the DKIM signature and the SPF metadata.
For our third milestone, we set out to implement DNSSEC, a set of security extensions over DNS. This security layer verifies the identity of an email sender through DKIM/SPF/DMARC, but it also needs security extensions in the DNS protocol. We completed our initial investigation of a DNSSEC implementation prototype, and we discovered several issues, such as missing elliptic-curve cryptography. The necessary cryptographic primitives are now available, so we should complete this milestone by the end of the month.
Finally, our fourth milestone was to implement the Matrix protocol (client and server). We completed the protocol’s client library, which sends notifications from OCaml CI. Plus, we have a PoC: the Matrix server side, which receives those notifications, is also complete.
Although we still have much work ahead of us, we’re quite pleased with the progress thus far, and so is the DAPSI Initiative! Follow our progress by subscribing to this blog and our Twitter feed (@tarides_) for the latest updates.
Hide




The New Replaying Benchmark in Irmin — Tarides, Oct 04, 2021
As mentioned in our Tezos Storage / Irmin Summer 2021 Update on the Tezos Agora forum, the Irmin team's goal has been to improve Irmin's performance in order to speed up the Baking Account migration process in Octez, and we managed to make it 10x faster in the first quarter of 2021. Since then, we've been working on a new benchmark program for Irmin that's based on the interactions between Irmin and Octez. This won't just help make Irmin even faster, it will also help speed up the Tezos blockcha…
Read more...As mentioned in our Tezos Storage / Irmin Summer 2021 Update on the Tezos Agora forum, the Irmin team's goal has been to improve Irmin's performance in order to speed up the Baking Account migration process in Octez, and we managed to make it 10x faster in the first quarter of 2021. Since then, we've been working on a new benchmark program for Irmin that's based on the interactions between Irmin and Octez. This won't just help make Irmin even faster, it will also help speed up the Tezos blockchain process and enable us to monitor Irmin's behavior in Octez.
Octez is the Tezos node implementation that uses Irmin to store the blockchain state, so Irmin is a core component of Octez that's responsible for virtually all the filesystem operations. Whether a node is launched to produce new blocks (aka “bake”) or just to participate in peer-to-peer sharing of existing blocks, it must first update itself by rebuilding blocks individually until it reaches the head of the blockchain. This first phase is called bootstrapping, and once it reaches the blockchain head, we say it has been bootstrapped. Currently, the bootstrapped phase processes 2 blocks per minute, which is the rate at which the Tezos blockchain progresses. The next goal is to increase that rate to 12 blocks per minute.
Irmin stores the content of the Tezos blockchain on a disk using the irmin-pack
library. There is a one-to-one correspondence between Tezos blocks and Irmin commits: each time Tezos produces a block, Irmin produces a commit, and the Tezos block hash is computed using the Irmin commit hash. The Irmin developers are working on improving the irmin-pack
performance which in turn will improve the performance of Octez.
A benchmark program is considered “fair” when it's representative of how the benchmarked code is used in the real world—for example, the access-patterns to Irmin. A standard database benchmark would first insert random data and then remove it. Such a synthetic benchmark would fail to reproduce the bottlenecks that occur when the insertions and removal are interleaved. Our solution to “fairness” is radical: replaying. Within a sandboxed environment, we replay a real world situation.
Basically, our new benchmark program runs the benchmarked code and records statistics for later analysis. The program is stored in the irmin-bench
library and makes use of traces of the operations (called action traces) performed when Octez runs with Irmin. Later, the program replays the recorded operations one at a time while simultaneously recording tonnes of statistics (called stat traces). Data analysis of the stat traces may reveal many interesting facts about the behaviour of Irmin, especially if we tweak:
- the configuration of Irmin (e.g., what’s the impact of doubling the size of a certain cache?)
- the replay parameters (e.g., does Irmin's performance decay over time? Does irmin-pack perform as well after 24 hours of replay as after 1 minute of replay?)
- the hardware (e.g., does irmin-pack perform well on a Raspberry Pi?)
- the code of Irmin (e.g., does this PR have an impact on performance?)
This benchmarking process is similar to the record-replay feature available with TezEdge.
Recording the Action Trace
By adding logs to Tezos, we can record the Tezos-Irmin interactions and thus capture the Irmin “view” of Tezos. We’ve recorded action traces during the bootstrapping phase of Tezos nodes, which started from Genesis—the name of the very first Tezos block inserted into an empty Irmin store.
The interaction surface between Irmin and Octez is quite simple, so we were able to reduce it to eight (8) elementary operations:
- checkout, to pull an Irmin tree from disk;
- find, mem and mem_tree, read-only operations on an Irmin tree;
- add, remove and copy, write-only operations on an Irmin tree;
- commit, to push an Irmin tree to disk.
It’s important to remember that Irmin behaves much like Git. It has built-in snapshotting and is compatible with Git itself when using the irmin-git
library. In fact, these operations are very similar to Git, too.
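As an OCaml sketch, the shape of that surface looks roughly like this (abstract types stand in for Irmin's real ones; the signatures are approximate, not Irmin's exact API):

module type TEZOS_IRMIN_SURFACE = sig
  type repo
  type tree
  type hash
  type path = string list

  val checkout : repo -> hash -> tree option Lwt.t    (* pull a tree from disk *)
  val find     : tree -> path -> bytes option Lwt.t   (* read-only *)
  val mem      : tree -> path -> bool Lwt.t
  val mem_tree : tree -> path -> bool Lwt.t
  val add      : tree -> path -> bytes -> tree Lwt.t  (* write-only *)
  val remove   : tree -> path -> tree Lwt.t
  val copy     : tree -> src:path -> dst:path -> tree option Lwt.t
  val commit   : repo -> tree -> hash Lwt.t           (* push a tree to disk *)
end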
Sequence of Operations
To illustrate further, here's a concrete example of an operation sequence inside an action trace:
This shows Octez’s first interaction with Irmin at the very beginning of the blockchain! The first block, Genesis, is quite small (it ends at operation #5), but the second one is massive (it ends at operation #309273). It contains no transactions because it only sets up the entire structure of the tree. It precedes the beginning of Tezos's initial protocol called “Alpha I”.
Benchmark Benefits
Our benchmark results convey the sheer magnitude of the Tezos blockchain and the role that Irmin plays within it. We’ve recorded a trace that covers the blocks from the beginning of the blockchain in June 2018 all the way up to May 2021. It weighs 96GB.
Although it took 34 months for Tezos to reach that state, bootstrapping takes only 170 hours, and replaying takes a mere 37 hours, on a section of the blockchain that contains 1,343,486 blocks. On average, this corresponds to 1 block per minute when the blocks were created, 132 per minute when bootstrapping, and 611 per minute during replay.
On this particular section of the blockchain, Octez had 1,089,853,521 interactions with Irmin. On average, this corresponds to 12 per second when the blocks were created, 1782 per second during bootstrapping, and 8258 per second during replay.
The chart below demonstrates how many of each Irmin operation occur per block (on average):
This next chart displays where the time is spent during replay:
With irmin-pack, an OCaml thread managed by the index library (the merge thread) runs concurrently with the main thread, so a fraction of the durations shown above is actually spent in that thread. Refer to this blog post for more details on index's merges.
The following chart illustrates how memory usage evolves during replay:
On a logarithmic scale, this last chart shows the evolution of the write amplification, which indicates the amount of rewriting (e.g., at the end of the replay, 20TB of data have been written to disk in order to create a store that weighs 73GB).
The merge operations of the index
library are the source of this poor write amplification. The Irmin team is working hard on improving this metric:
- on the one hand, the new structured keys feature of the upcoming Irmin 3.0 release will help to reduce the pressure on the index library;
- on the other hand, we are working on algorithmic improvements of index itself.
Another nice way to use the trace is for testing. When replaying a trace, we can recompute the commit hashes and check that they correspond to the trace hashes, so the benchmark acts as additional tests to ensure we don't compromise the hashes computed in Tezos.
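A sketch of that check in OCaml (the entry type and replay function are illustrative, not irmin-bench's actual interface):

(* Replay the recorded operations leading up to each commit and compare the
   recomputed hash with the hash recorded in the trace. *)
type entry = { operations : string list; recorded_hash : string }

let verify ~replay (trace : entry list) =
  List.iter
    (fun e ->
      let recomputed = replay e.operations in
      if not (String.equal recomputed e.recorded_hash) then
        failwith ("commit hash mismatch: " ^ recomputed
                  ^ " <> " ^ e.recorded_hash))
    trace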
Complex changes to Tezos can be simulated first in Irmin. For example, the path flattening in Tezos feature (merged in August 2021) can now be tested earlier in the process with our benchmark. Prior to the trace benchmarks, we first had to make the changes in Tezos to understand their repercussions on Irmin directly from the Tezos benchmarks.
Lastly, we continue to test alternative libraries and compare them with the ones integrated in Tezos; however, using these alternative libraries to build Tezos nodes has proven to be more complicated than merely adding them in Irmin and running our benchmarks. While testing continues on most new libraries, we can definitely use replays to compare our new cactus
library as a replacement for our index
library.
Future Directions
While the action trace recording was only made possible on a development branch of Octez, we would next like to upstream the feature to the main branch of Octez, which would give all users the option to record Tezos-Irmin interactions. This would simplify bug reporting overall.
Although the first version only deals with the bootstrapping phase of Tezos, an upcoming goal is to make it possible to benchmark the bootstrapped phase of Tezos as well. Additionally, we plan to replay the multiprocess aspects of a Tezos node in the near future.
The first stable version of this benchmark has existed in Irmin’s development branch since Q2 2021, and we will release it as part of irmin-bench
for Irmin 3.0 in Q4 2021. This release will allow integration into the Sandmark OCaml benchmarking suite.
Follow the Tarides blog for future Irmin updates.
Hide



Announcing Tezos’ 8th protocol upgrade proposal: Hangzhou — Tarides, Sep 21, 2021
The last upgrade of the Tezos protocol, Granada, activated on August 6th, 2021. We are now glad to announce a new protocol proposal, Hangzhou, the result of a collaborative work from various teams.
This is a joint post with Nomadic Labs, Marigold, Oxhead Alpha and DaiLambda.




Measuring OCaml compilation speed after a refactoring — GaGallium (Florian Angeletti), Sep 17, 2021
The OCaml typechecker is an important piece of the OCaml compiler pipeline which accounts for a significant portion of time spent on compiling an OCaml program (see the appendices).
The code of the typechecker is also quite optimised, sometime…
Read more...The OCaml typechecker is an important piece of the OCaml compiler pipeline which accounts for a significant portion of time spent on compiling an OCaml program (see the appendices).
The code of the typechecker is also quite optimised, sometimes to the detriment of the readability of the code. Recently, Jacques Garrigue and Takafumi Saikawa have worked on a series of pull requests to improve the readability of the typechecker (#10337, #10474, #10541). Unfortunately, those improvements are also expected to increase the typechecking time of OCaml programs because they add abstraction barriers, and remove some optimisations that were breaking the abstraction barriers.
The effect is particularly pronounced on #10337. Because it improves the readability of the typechecker, this pull request was merged after only some quick tests to check that the compilation time increase was not too dire.
However, the discussion on this pull request highlighted the fact that it was difficult to measure OCaml compilation time on a scale large enough to enable good statistical analysis, and that being able to do so would be useful.
Consequently, I decided to try my hand at a statistical analysis of OCaml compilation time, using this pull request #10337 as a case study. Beyond this specific PR, I think that it is interesting to write down a process and a handful of tools for measuring OCaml compilation time on the opam ecosystem.
Before doing any kind of analysis, the first step is to find an easy way to collect the data of interest. Fortunately, the OCaml compiler can emit timing information with flag -dtimings
. However, this information is emitted on stdout, whereas my ideal sampling process would be to just pick an opam package, launch a build process and recover the timing information for each file. This doesn’t work if the data is sent to stdout and never seen again. The first step is thus to create a version of the OCaml compiler that can output the timing information of the compilation to a specific directory. With this change (#10575), installing an opam package with
OCAMLPARAM=",_,timings=1,dump-dir=/tmp/pkgname" opam install pkgname
outputs all profiling information to /tmp/pkgname
.
This makes it possible to collect a large number of data points on compilation times by using opam and the canonical installation process of each package, without the need for much glue code.
For this case study, I am using 5 core packages: containers
, dune
, tyxml
, coq
and base
. Once their dependencies are added, I end up with
- ocamlfind
- num
- zarith
- seq
- containers
- coq
- dune
- re
- ocamlbuild
- uchar
- topkg
- uutf
- tyxml
- sexplib0
- base
Then it is a matter of repeatedly installing those packages, and measuring the compilation times before and after #10337.
In order to get more reliable statistics on each file, each package was compiled 250 times, leading to 1.6 million data points (available here) after slightly more than a weekend of computation.
In order to try to reduce the noise induced by the operating system scheduler, the compilation process is run with OPAMJOBS=1
. Similarly, the compilation process was isolated as much as possible from the other process using the cset
Linux utility to reserve one full physical core to the opam processes.
The code for collecting samples, analyzing them, and plotting the graphs below is available at https://github.com/Octachron/ocaml-perfomance-monitoring.
Comparing averages, file by file
With the data at hand, we can compute the average compilation time per file, and per stage of the OCaml compiler pipeline. In our case, we are mostly interested in the typechecking stage and the global compilation time, since #10337 should only alter the time spent on typechecking. It is therefore useful to split the compilation time into typechecking + other=total
. Then for each file in the 15 packages above, we can compute the average time for each of those stages and the relative change of average compilation time: time after/time before
.
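For illustration, here is a small OCaml sketch of this per-file statistic, with a rough 90% interval obtained by first-order propagation of the two standard errors (an illustration, not the code from the repository linked above):

let mean xs = List.fold_left ( +. ) 0. xs /. float_of_int (List.length xs)

let variance xs =
  let m = mean xs and n = float_of_int (List.length xs) in
  List.fold_left (fun acc x -> acc +. ((x -. m) ** 2.)) 0. xs /. (n -. 1.)

(* Ratio of the empirical means of the two samples, with an approximate
   90% confidence interval (1.645 is the two-sided 90% normal quantile). *)
let ratio_with_ci ~before ~after =
  let mb = mean before and ma = mean after in
  let r = ma /. mb in
  let rel_se2 xs m = variance xs /. (float_of_int (List.length xs) *. m *. m) in
  let sd = r *. sqrt (rel_se2 after ma +. rel_se2 before mb) in
  (r, r -. 1.645 *. sd, r +. 1.645 *. sd)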
Rendering those relative changes for the typechecking time, file by file (with the corresponding 90% confidence interval) yields
To avoid noise, I have removed files for which the average typechecking time was less than one microsecond on the reference version of the compiler.
In the graph above, there are a few remarkable points:
- As expected, the average typechecking time increased for almost all files.
- A significant portion of points are stuck to the line “after/before=1”. This means that for those files there was no change at all in typechecking time.
- The standard deviation varies wildly across packages. The typechecking times of some dune files have very high variances. However, outside of those files, the standard deviation seems moderate, and the mean estimator seems to have converged.
- For a handful of files, the typechecking time more than doubled. However, the relative typechecking time does seem to be confined to the
[1,1.2]
range for a majority of files.
Since the data is quite noisy, it is useful, before trying to interpret it, to check that we are not looking only at noise. Fortunately, we have the data on the time spent outside of the typechecking stage available, and those times should be mostly noise. We thus have a baseline, which looks like
This cloud of points does indeed look much noisier. More importantly, it seems centred around the line after/before=1
. This means that our hypothesis that the compilation time outside of the typechecking stage has not been altered is not visibly invalidated by our data points. Another interesting point is that the high-variance points seem to be shared between the typechecking and other graphs.
We can even check on the graphs for the average total compilation (file by file)
that those points still have a high variance here. However, outside of this cluster of points, we have a much more compact distribution of points for the total compilation time: it seems that we have a fairly consistent increase of the total compilation time of around 3%.
And this is reflected in the averages:
Typechecking average | Other average | Total average |
---|---|---|
1.06641 | 1.01756 | 1.03307 |
There is thus an increase of around 6.6% in typechecking time, which translates to an increase of 3.3% in total time. However, the non-typechecking time also increased by 1.7% on average. The average is thus either tainted by some structural bias, or the relative variance is still large enough for the distribution of the ratio to be ill-behaved (the literature seems to indicate that a relative variance < 10% is required for the distribution of a ratio to be Gaussian-like). Anyway, we probably cannot count on a precision of better than 1.7%. Even with this caveat, we still have a visible effect on the total compilation time.
We might be better served by comparing geometric averages. Indeed, we are comparing ratios of times, with possibly heavy-tailed noise. By using the geometric average (which computes the exponential of the arithmetic mean of the logarithms of our ratios), we can check that rare events don’t have an undue influence on the average. In our case the geometric means look like
Typechecking geometric average | Other geometric average | Total geometric average |
---|---|---|
1.05963 | 1.01513 | 1.03215 |
All geometric averages have decreased compared to the arithmetic means, which is a sign that the compilation time distribution is skewed towards high compilation times. However, the changes are small and do not alter our previous interpretation.
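For reference, the geometric average in question is the one-liner:

(* exp of the arithmetic mean of the logs: rare large ratios weigh less
   than in the arithmetic mean. *)
let geometric_mean xs =
  exp (List.fold_left (fun acc x -> acc +. log x) 0. xs
       /. float_of_int (List.length xs))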
We can somewhat refine those observations by looking at the medians (which are even less affected by the heavy-tailedness of distributions)
Typechecking median | Other median | Total median |
---|---|---|
1.03834 | 1.00852 | 1.02507 |
Here, the non-typechecking time seems far less affected by the structural bias (with an increase of 0.9%), whereas the increases in typechecking time and total compilation time are reduced but still present, at 3.8% and 2.5% respectively.
Comparing averages, quantiles
We can refine our analysis by looking at the quantiles of those relative changes of compilation time
% | mean quantiles |
---|---|
1% | 0.875097 |
10% | 1 |
25% | 1.001 |
50% | 1.03834 |
75% | 1.08826 |
90% | 1.162 |
99% | 1.51411 |
99.9% | 2.76834 |
Here we see that the typechecking time of around 25% of files is simply not affected at all by the changes. And for half of the files, the increase in typechecking time is less than 9%. Conversely, there is 1% of files for which the typechecking time increases by more than 50% (with outliers around a 200%-400% increase).
However, looking at the total compilation time does seem to reduce the overall impact of the change
% | total quantiles |
---|---|
1% | 0.945555 |
10% | 1 |
25% | 1.00707 |
50% | 1.02507 |
75% | 1.05 |
90% | 1.07895 |
99% | 1.17846 |
99.9% | 1.4379 |
Indeed, we still have 25% of files not impacted, but for 65% of files the relative increase of compilation time is less than 8% (and the outliers stall at around a 50% increase).
We can also have a quick look at the quantiles for the non-typechecking time
% | other quantiles |
---|---|
1% | 0.855129 |
10% | 0.956174 |
25% | 0.995239 |
50% | 1.00852 |
75% | 1.03743 |
90% | 1.08618 |
99% | 1.25541 |
99.9% | 1.67784 |
but here the only curiosity is that the curve is more symmetric: we have 25% of files for which the non-typechecking compilation time randomly decreases.
Noise models and minima
One issue with our previous analysis is the high variance which is observable in the non-typechecking average times across files. A possibility to mitigate this issue is to change our noise model. Using an average, we implicitly assumed that the compilation time was mostly:
observable_compilation_time = theoretical_computation_time + noise
where noise is a random variable with at least a finite variance and a mean of 0
. Indeed, with this symmetry hypothesis the expectation of the observable computation time aligns with the theoretical compilation time:
E[observable_computation_time] = E[theoretical_computation_time] + E[noise] = theoretical_computation_time
and the variance Var[observable_computation_time]
is exactly the variance of the noise Var[noise]
. Then our finite variance hypothesis ensures that the empirical average converges relatively well towards the theoretical expectation.
However, we can imagine another noise model with a multiplicative noise (due to CPU scheduling for instance),
observable_compilation_time = scheduling_noise * theoretical_computation_time + noise
with both scheduling_noise >= 1
and noise >= 0
. With this model, the expectation of the observable compilation time does not match up with the theoretical computation time:
E[observable_computation_time] - theoretical_computation_time =
(E[scheduling_noise]-1) * theoretical_computation_time + E[noise]
Thus, in this model, the average observable computation time is a structurally biased estimator for the theoretical computation time. This bias might be compensated by the fact that we are only looking to ratio. Nevertheless, this model also induces a second source of variance
Var[observable_computation_time] = theoretical_computation_time^2 Var[scheduling_noise] + Var[noise]
(assuming the two noises are not correlated), and this variance increases with the theoretical computation time. This relative standard deviation might be problematic when computing ratios.
If this second noise model is closer to reality, using the empirical average estimators might not be ideal. However, the positivity of the noise opens another avenue for estimators: we can consider the minima of a series of independent realisations. Then, we have
min(observable_compilation_time) = min(scheduling_noise * theoretical_computation_time) + min(noise) = theoretical_computation_time
if the min(scheduling_noise)=1
and min(noise)=0
.
This model has another advantage: by assuming that the (essential) support of the noise distribution has finite lower bound, we know that the empirical minima will converge towards a three-parameter Weibull distribution with a strictly positive support. (To be completely explicit, we also need to assume some regularity of the distribution around this lower bound too).
This means that the distribution of the ratio of the empirical minima will not exhibit the infinite moments of the ratio of two Gaussians. Without this issue, our estimator should have less variance.
However, we cannot use Gaussian confidence intervals for the empirical minima. Moreover, estimating the confidence interval for the Weibull distribution is more complex. Since we are mostly interested in corroborating our previous result, we are bypassing the computation of those confidence intervals.
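To see why the minimum can beat the mean under this second noise model, here is a toy OCaml simulation (purely illustrative; the noise magnitudes are made up):

(* observable = scheduling_noise * true_time + noise, with
   min(scheduling_noise) = 1 and min(noise) = 0: the empirical minimum
   approaches the true time, while the mean carries a structural bias. *)
let () =
  Random.self_init ();
  let true_time = 0.1 in
  let sample () =
    ((1. +. Random.float 0.05) *. true_time) +. Random.float 0.002 in
  let xs = List.init 250 (fun _ -> sample ()) in
  let mean = List.fold_left ( +. ) 0. xs /. 250. in
  let minimum = List.fold_left min infinity xs in
  Printf.printf "true = %.4f  mean = %.4f  min = %.4f\n" true_time mean minimum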
Comparing minima
We can then restart our analysis using the minimal compilation time, file by file. Starting with the minimal typechecking time, we get
There are notable differences with the average version:
- a very significant part of our points takes the same time to typecheck before and after #10337
- there is a discretization effect going on: data points tend to fall on exactly the same values of the ratio
Beyond those changes, there is still a visible general increase of typechecking time.
The same differences are visible for the non-typechecking compilation time
and the total compilation time
but overall the minimal total compilation and non-typechecking times mirror what we had seen with the averages. The distribution of the non-typechecking times is maybe more evenly centred around a ratio of 1.
We can have a look at the averages and medians (across files) to get a more global point of view
Typechecking | Other | Total | |
---|---|---|---|
Average | 1.06907 | 1.01031 | 1.02901 |
Geometric average | 1.05998 | 1.00672 | 1.0276 |
Median | 1 | 1 | 1 |
A striking change is that the median for the typechecking and total compilation time is equal to one: more than half of the files are not affected by the changes when looking at the minimal compilation time. This might be an issue with the granularity of time measurement, or it could be a genuine fact.
More usefully, we still have an increase of average typechecking time between 6% and 6.9% depending on the averaging method, which translates to a total compilation time increase between 2.7% and 3.3%. And this time, the increase of unrelated compilation time is between 0.7% and 0.9%. This seems to confirm that we do have a real increase of average compilation time, and that a 3% increase is a reasonable estimate.
Comparing minima, quantiles
With the discretization, the quantiles of the compilation time are quite interesting and uncharacteristic.
For instance the typechecking quantiles,
% | min quantiles |
---|---|
1% | 1 |
10% | 1 |
25% | 1 |
50% | 1 |
75% | 1.07692 |
90% | 1.2 |
99% | 2 |
99.9% | 2 |
are stuck to 1 between the first and 50th centile. In other words, the minimal typechecking time of more than 50% of the files in our experiment is unchanged. For 40% of the files, the increase is less than 20%. And even the most extreme files see only an increase of 100% in typechecking time. On the higher quantiles, the presence of multiple jumps is a consequence of the discretization of the ratios that was already visible on the raw data.
When looking at the time spent outside of typechecking,
% | min_other quantiles |
---|---|
1% | 0.8 |
10% | 0.947368 |
25% | 1 |
50% | 1 |
75% | 1 |
90% | 1.11111 |
99% | 1.33333 |
99.9% | 1.6 |
we observe that the relative non-typechecking compilation time is unaffected by the change for more than 80% of files (and even somewhat accelerated for 10% of files).
The quantiles for the total compilation time,
% | min_total quantiles |
---|---|
1% | 0.92 |
10% | 1 |
25% | 1 |
50% | 1 |
75% | 1.04545 |
90% | 1.1 |
99% | 1.22727 |
99.9% | 1.4 |
mostly reflect the trends set by the typechecking time: 55% of files are unaffected. For 90% of files the increase is less than 10%, and the maximal impact on the compilation time peaks at a 40% relative increase.
Conclusion
To sum up, with the available data at hand, it seems sensible to conclude that #10337 resulted in an average increase of compilation time of the order of 3%, while the average relative increase of typechecking time is around 6%. Moreover, for the most impacted files (at the ninth decile), the relative increase in compilation time ranges from 10% to 40%.
Appendices
Compilation profile
Since we have data for both typechecking and non-typechecking times for a few thousand files, it is interesting to check how much time is spent on typechecking. We can start by looking at the data points file by file:
We have here a relatively uniform cloud of points, with between 20% and 60% of time spent in typechecking compared to total compilation time. This is reflected in the average and median
Arithmetic average | Median |
---|---|
38.8827% | 39.7336% |
Both values are quite comparable; the distribution doesn’t seem significantly skewed.
However, we have a clear cluster of files for which typechecking accounts for 90% of the total compilation time. Interestingly, this cluster of points corresponds to the dune cluster of files with a very high variance that we had identified earlier. This explains why those files have essentially the same profile when looking at the total and typechecking compilation time: in their case, typechecking accounts for most of the work done during compilation.
This relatively uniform distribution is visible both on the quantiles (with an affine part of the quantiles)
% | profile quantiles |
---|---|
1% | 0.111556 |
10% | 0.16431 |
25% | 0.283249 |
50% | 0.397336 |
75% | 0.487892 |
90% | 0.573355 |
99% | 0.749689 |
99.9% | 0.913336 |
and on the histogram of the relative time spent in typechecking
Histograms
Histogram versions for the quantile diagrams are also available. Due to the mixture of continuous and discrete distributions they are not that easy to read. Note that those histograms have equiprobable bins (in other words, constant area) rather than constant width bins.
Average compilation time histograms
An interesting take-away from those histograms is that the typechecking and total compilation time distributions are clearly skewed to the right: with very few exceptions, compilation time increases. Conversely, the non-typechecking time distribution is much more symmetric. Since the change here is due to noise, there is no more reason for the compilation time to increase than to decrease.
Minimal compilation time histograms
There is not much change when looking at the histogram for the minimal compilation time for a file
The most notable difference is that the non-typechecking histogram is completely dominated by the Dirac distribution centred at x=1
.




Writing Lifters Using Primus Lisp — The BAP Blog, Sep 15, 2021
Defining instructions semantics using Primus Lisp (Tutorial)
Introduction
So you found a machine instruction that is not handled by BAP and you wonder how to add it to BAP. This is the tutorial that will gently guide you through the whole process of discovering the instruction, studying its semantics, encoding it, testing, and finally submitting to BAP. The latter is optional but highly appreciated.
In modern BAP, the easiest option is to use Primus Lisp to define new instructions. The idea i…
Read more...Defining instructions semantics using Primus Lisp (Tutorial)
Introduction
So you found a machine instruction that is not handled by BAP and you wonder how to add it to BAP. This is the tutorial that will gently guide you through the whole process of discovering the instruction, studying its semantics, encoding it, testing, and finally submitting to BAP. The latter is optional but highly appreciated.
In modern BAP, the easiest option is to use Primus Lisp to define new instructions. The idea is that for each instruction, you describe its effects using Primus Lisp. No recompilation or OCaml coding is required. For example here is the semantics of the tst
instruction in the thumb mode taken from BAP,
(defun tTST (rn rm _ _)
"tst rn, rm"
(let ((rd (logand rn rm)))
(set ZF (is-zero rd))
(set NF (msb rd))))
You now probably have the question: what is tTST
and why does it have four parameters? We use LLVM (and now Ghidra) as the disassembler backend, and when we write a lifter (i.e., when we define the semantics of instructions) we rely on the representation of instructions provided by the backend. In LLVM this is MC instructions, which more or less correspond to LLVM MIR.
Picking an instruction
So let’s find some ARM instruction that is not yet handled by BAP and use it as a working example. The bap mc command is an ideal companion in writing lifters (mc stands for machine code), as it lets you work with individual instructions. Besides, I wasn’t able to find any ARM instruction that we do not handle (except co-processor instructions, which are not interesting from the didactic perspective), so we will be working with the thumb2 instruction set. So first of all, we need a binary encoding for the instruction that we would like to lift. If you don’t have one then use llvm-mc (or any other assembler). The encoding (which I found in some wild-caught ARM binary) is 2d e9 f8 43
and we can disassemble it using bap mc
$ bap mc --arch=thumb --show-insn=asm -- 2d e9 f8 43
push.w {r3, r4, r5, r6, r7, r8, r9, lr}
Or, if we want to go the other way around, from assembly to binary encoding then here is the llvm-mc command-line,
$ echo "push.w {r3, r4, r5, r6, r7, r8, r9, lr}" | llvm-mc -triple=thumb -mattr=thumb2 --show-inst -show-encoding
.text
push.w {r3, r4, r5, r6, r7, r8, r9, lr} @ encoding: [0x2d,0xe9,0xf8,0x43]
@ <MCInst #4118 t2STMDB_UPD
@ <MCOperand Reg:15>
@ <MCOperand Reg:15>
@ <MCOperand Imm:14>
@ <MCOperand Reg:0>
@ <MCOperand Reg:75>
@ <MCOperand Reg:76>
@ <MCOperand Reg:77>
@ <MCOperand Reg:78>
@ <MCOperand Reg:79>
@ <MCOperand Reg:80>
@ <MCOperand Reg:81>
@ <MCOperand Reg:13>>
Now we need to get full information about the instruction name. In BAP, all instruction names are packaged in namespaces to enable multiple backends. To get full information about the instruction encoding and decoding we will use the --show-knowledge option of bap mc. This command-line option will instruct BAP to dump the whole knowledge base, so it will have everything that bap knows so far about the instruction. The properties that we’re looking for are bap:lisp-name
and bap:insn
. The first will give us the true name and the second will show us how the instruction and its operands are represented, e.g.,
$ bap mc --arch=thumb --show-knowledge -- 2d e9 f8 43 | grep lisp-name
(bap:lisp-name (llvm-thumb:t2STMDB_UPD))
$ bap mc --arch=thumb --show-knowledge -- 2d e9 f8 43 | grep bap:insn
(bap:insn ((t2STMDB_UPD SP SP 0xe Nil R3 R4 R5 R6 R7 R8 R9 LR)))
Writing the stub semantics
Okay, now we have nearly all the knowledge, so we can start writing the semantics. Let’s start with some stub semantics: we will later look into the instruction set manual and learn the instruction’s real semantics, but first we want to make sure that BAP loads our file and calls our semantics function.
BAP searches for the semantics file in a number of predefined locations (see bap --help
and search for --primus-lisp-semantics
option for more details). The default locations include your $HOME/.local/share/bap/primus/semantics
and a system-wide location that depends on your installation (usually an opam switch, in <opam-switch>/share/bap/primus/semantics)
. So you can either place your file in the home location or just pick an arbitrary location and tell bap where to search for it using --primus-lisp-semantics=<dir>
. In our example, we use the current folder (denoted with .
in Unix systems) where we create a file named extra-thumb2.lisp
(the name of the file doesn’t really matter for the purposes of this example, as long as it has the .lisp
extension).
Now, we can create the stub definition of the instruction,
$ cat extra-thumb2.lisp
(defun llvm-thumb:t2STMDB_UPD (_ _ _ _ _ _ _ _ _ _ _ _)
(msg "yay, it works"))
and let’s check that bap properly dispatches the semantics,
$ bap mc --arch=thumb --primus-lisp-semantics=. --show-bil -- 2d e9 f8 43
yay, it works
{
}
Learning the instruction semantics
So what does this series of _ _ _ _ ...
mean? We see that bap applies t2STMDB_UPD
as
bap mc --arch=thumb --show-knowledge -- 2d e9 f8 43 | grep bap:insn
(bap:insn ((t2STMDB_UPD SP SP 0xe Nil R3 R4 R5 R6 R7 R8 R9 LR)))
So it has 12 operands. We haven’t yet learned their semantics, so we just ignored them. In Primus Lisp, like in OCaml, you can use the wildcard character as an argument name if you’re not using it. Now it is time to figure out what they mean. The encoding of the llvm machine instruction is defined in the LLVM target description (*.td) files, which we can find in the LLVM GitHub repository; namely, we can learn,
// PUSH/POP aliases for STM/LDM
def : t2InstAlias<"push${p}.w $regs", (t2STMDB_UPD SP, pred:$p, reglist:$regs)>;
So now we know that push.w regs
is an alias (syntactic sugar) to stmdb sp!, {r3, r4, r5, r6, r7, r8, r9, lr}
(or even stmdb sp!, {r3-r9, lr
}). Let’s check that our guess is correct using llvm-mc
,
$ echo 'stmdb sp!, {r3-r9, lr}' | llvm-mc -triple=thumb -mattr=thumb2 -show-encoding
.text
push.w {r3, r4, r5, r6, r7, r8, r9, lr} @ encoding: [0x2d,0xe9,0xf8,0x43]
Indeed, the encoding is the same. So now it is time to download an instruction reference and look for the stm
instruction. It is on page 357 of the pdf version, in section c2.141, and it says that the instruction stores multiple registers. The db
suffix is the addressing mode that instructs us to Decrement the address Before each store operation. And the !
suffix (encoded as _UPD
in llvm) prescribes us to store the final address back to the destination register. This is a high-level reference intended for asm
programmers, so if we need more details with the precisely described semantics we can also look into the ARM Architecture Reference Manual (the pdf file). Here is the semantics obtained from this reference, which is described in the ARM pseudocode,
(I had to screen-shot it, as the indentation matters in their pseudo-code but it could not be copied properly from the pdf file).
Learning Primus Lisp
So we’re now ready to write some Lisp code. You may have already skimmed through the Primus Lisp documentation that describes the syntax and semantics of the language. If you didn’t, don’t worry (but it is still advised to do this eventually). Primus Lisp is a Lisp dialect that looks much like Common Lisp. Since it is Lisp, it has a very simple syntax: everything is an s-expression, i.e., either an atom, e.g., 1, ‘hello, r0, or an application of a name to a list of arguments, e.g., (malloc 100) or (malloc (sizeof ptr_t)). The semantics of Primus Lisp is very simple as well. Basically, Primus Lisp is a universal syntax for assemblers. There is no data type other than bitvectors. From the perspective of the type system we distinguish bitvectors by their sizes, e.g., 0x1:1 is a 1-bitvector with its only bit set to 1, while 0x1:16 is a 16-bitvector, i.e., a 16-bit word with the lowest bit set to 1; they have different sizes and are therefore different values. All operations accept words and evaluate to words. Now, what operations can you use? We can use BAP to get the up-to-date documentation for Primus Lisp. For this we will use the primus-lisp-documentation command (we need to pass the --semantics option, as we have two interpreters for Primus Lisp, one for dynamic execution and another for static execution, which have different libraries and slightly different sets of primitives),
bap primus-lisp-documentation --semantics > semantics.org
This command will generate the API documentation in the emacs org-mode. If you don’t want to use Emacs to read this file then you can use pandoc
to transform it to any format you like, e.g.,
pandoc semantics.org -o semantics.pdf
That will generate the following document.
Encoding semantics (the first attempt)
Now, after we have skimmed through the documentation, let’s make our first ugly (and maybe incorrect) attempt to describe the semantics of the instruction,
(defun llvm-thumb:t2STMDB_UPD (dst base _pred _? r1 r2 r3 r4 r5 r6 r7 r8)
(let ((len 8)
(stride (sizeof word-width))
(ptr (- base (* len stride))))
(store-word (+ ptr (* stride 0)) r1)
(store-word (+ ptr (* stride 1)) r2)
(store-word (+ ptr (* stride 2)) r3)
(store-word (+ ptr (* stride 3)) r4)
(store-word (+ ptr (* stride 4)) r5)
(store-word (+ ptr (* stride 5)) r6)
(store-word (+ ptr (* stride 6)) r7)
(store-word (+ ptr (* stride 7)) r8)
(set$ dst ptr)))
and before getting into the details, let’s see how bap translates this semantics to BIL,
$ bap mc --arch=thumb --primus-lisp-semantics=. --show-bil -- 2d e9 f8 43
{
#3 := SP - 0x20
mem := mem with [#3, el]:u32 <- R3
mem := mem with [#3 + 4, el]:u32 <- R4
mem := mem with [#3 + 8, el]:u32 <- R5
mem := mem with [#3 + 0xC, el]:u32 <- R6
mem := mem with [#3 + 0x10, el]:u32 <- R7
mem := mem with [#3 + 0x14, el]:u32 <- R8
mem := mem with [#3 + 0x18, el]:u32 <- R9
mem := mem with [#3 + 0x1C, el]:u32 <- LR
SP := #3
}
The result looks good, and maybe even correct, but let’s explore the Lisp code.
The first thing to notice is that instead of using _ _ ...
placeholders, we gave each parameter a name. The first parameter is the destination register (it is an llvm rule that all functions that update a register have this register as the first parameter); then we have the base register (in our working example the destination and the base register are the same). Next, we have _pred, which we ignore for now but will return to later. We use the _ prefix to indicate that it is unused. Then there is an operand of unknown purpose, which we denoted as _?
(I usually just blank them with _
, but this is to show that lisp allows you to use any non-whitespace characters in the identifier). Finally, we have r1
to r8
, which bind the passed registers. Here, r1
denotes not the register r1
of ARM but the first register passed to the function, i.e., r3
in our example. (This is to show that you can pick any names for your parameters and that they can even shadow the globally defined target-specific register names, which is probably a bad idea from the readability perspective; choosing something like xN
would probably be a better idea).
Now it is time to look into the body of the function. First of all, we used (let (<bindings>) <body>)
construct to bind several variables. Each binding has the form (<varN> <exprN>)
and it evaluates <exprN>
and binds its value to <varN>
and makes it available for expressions <exprN+1>
and in the <body>
, e.g., in the full form,
(let ((<var1> <expr1>)
(<var2> <expr2>)
....
(<varN> <exprN>))
<body>)
we can use <var1>
in <expr2>, ..., <exprN>
, and in the <body>
, i.e., this is the lexical scope of <var1>
. Once the let expression is evaluated <var1>
becomes unbound (or bound to whatever it was bound to before). In other words, this is normal lexical scoping, which is totally equivalent to the OCaml,
let var1 = expr1 in
let var2 = expr2 in
...
let varN = exprN in
body
The let-bound variables are reified into temporary variables in BIL. You may also notice that the Primus Lisp Semantics Compiler is clever enough to remove the constants. Let’s go through the variables.
The first variable, len
, is a constant equal to 8
, which is the number of registers passed to the function. Unfortunately, llvm encodes the register list not with a bitset (as it is actually represented in the instruction) but as a list of operands. So instead of writing one function, we will need to write 15 overloads, one for each possible number of registers in the list.
The next variable is also a constant, but we use a complex expression to describe it, (sizeof word-width)
. The Primus Lisp Semantics Compiler is a meta-interpreter and computes all values that are known statically. As a result, we don’t see this constant in the reified code.
Unleashing the macro power
Now let’s deal with the body. It goes without saying that it is ugly. We have to manually unfold the loop, since we don’t have a program variable that denotes the set of registers to store; instead, llvm represents this instruction as a set of 15 overloads. BAP Primus Lisp supports overloads as well, but we definitely wouldn’t like to write 15 overloads with repetitive code: it is tedious, error-prone, and insults us as programmers.
Here comes the macro system. Primus Lisp employs a very simple macro system based on term rewriting. For each application (foo x y z)
the parser looks for a definition of macro foo that has the matching number of parameters. If an exact match is found then (foo x y z)
is rewritten with the body of the match. Otherwise, a match with the largest number of parameters is selected and the excess arguments are bound to the last parameter, e.g., if we have
(defmacro foo (x xs) ...)
then it will be called with x
bound to a
and xs
bound to (b c)
if applied as (foo a b c)
, where a
, b
, c
could be arbitrary s-expressions.
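That binding rule is easy to model; here is a small OCaml sketch (the real engine rewrites s-expressions, strings are used here only for brevity):

(* With parameters [x; xs] and arguments [a; b; c], [x] binds a and the
   last parameter [xs] binds the excess list (b c). *)
let rec bind_macro_args params args =
  match params, args with
  | [ last ], rest -> [ (last, rest) ]
  | p :: ps, a :: rest -> (p, [ a ]) :: bind_macro_args ps rest
  | _ -> invalid_arg "not enough arguments"

let () =
  assert (bind_macro_args [ "x"; "xs" ] [ "a"; "b"; "c" ]
          = [ ("x", [ "a" ]); ("xs", [ "b"; "c" ]) ])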
The body of the macro may reference itself, i.e., it could be recursive. To illustrate it let’s write the simplest recursive macro that will compute the length of the list,
(defmacro length (r) 1)
(defmacro length (_ rs) (+ 1 (length rs)))
and now we can check that it works by adding
(msg "we have $0 registers" (length r1 r2 r3 r4 r5 r6 r7 r8))
to the body of our definition. It will print,
we have 0x8 registers
Note that macros do not perform any computations on bitvectors, unlike the Primus meta compiler. The macro engine operates on s-expressions, and (length r1 r2 r3 r4 r5 r6 r7 r8)
is rewritten to (+ 1 (+ 1 (+ 1 (+ 1 (+ 1 (+ 1 (+ 1 1)))))))
at the syntactic level and is later reduced by the Primus Meta Compiler into 8
.
To solidify the knowledge and to move forward let’s write a macro store-registers
that will take an arbitrary number of registers, e.g., (store-registers base stride off r1 r2 r3 .... rm)
which will unfold to a sequence of stores, where each register rN
is stored as (store-word (+ base (* stride N)) rN)
.
First, let’s define the base case of our recursion,
(defmacro store-registers (base stride off reg)
(store-word (+ base (* stride off)) reg))
and now the recursive case,
(defmacro store-registers (base stride off reg regs)
(prog
(store-registers base stride off reg)
(store-registers base stride (+ off 1) regs)))
Notice that the body of the macro must be a single s-expression. There is no so-called implicit body, and if we need to chain two expressions we have to explicitly use the prog
construct. This construct has a very simple semantics, (prog s1 s2 ... sN)
means evaluate s1
, then s2
, and so on until sN
and make the result of the whole form equal to the result of the evaluation of the last expression.
And a better representation of the body will be,
(defun llvm-thumb:t2STMDB_UPD (dst base _pred _? r1 r2 r3 r4 r5 r6 r7 r8)
(let ((len 8)
(stride (sizeof word-width))
(ptr (- base (* len stride))))
(store-registers ptr stride 0 r1 r2 r3 r4 r5 r6 r7 r8)
(set$ dst ptr)))
And let’s double-check that we still have the same reification to BIL with
$ bap mc --arch=thumb --primus-lisp-semantics=. --show-bil -- 2d e9 f8 43
{
#3 := SP - 0x20
mem := mem with [#3, el]:u32 <- R3
mem := mem with [#3 + 4, el]:u32 <- R4
mem := mem with [#3 + 8, el]:u32 <- R5
mem := mem with [#3 + 0xC, el]:u32 <- R6
mem := mem with [#3 + 0x10, el]:u32 <- R7
mem := mem with [#3 + 0x14, el]:u32 <- R8
mem := mem with [#3 + 0x18, el]:u32 <- R9
mem := mem with [#3 + 0x1C, el]:u32 <- LR
SP := #3
}
In fact, we can also put the body of our function into a macro, and our length macro will be of use here, e.g.,
(defun llvm-thumb:t2STMDB_UPD (dst base _pred _? r1 r2 r3 r4 r5 r6 r7 r8)
(stmdb_upd dst base r1 r2 r3 r4 r5 r6 r7 r8))
(defmacro stmdb_upd (dst base regs)
(let ((len (length regs))
(stride (sizeof word-width))
(ptr (- base (* len stride))))
(store-registers ptr stride 0 regs)
(set$ dst ptr)))
A side-note on the macro resolution mechanism
A small note on the macro resolution process. Another way of describing it is that the rewriting engine picks the most specific definition. In fact, this is true for function definitions also. The resolution mechanism collects all definitions for the given name that match the given context and selects the most specific one. The context is defined using the Primus Lisp type class mechanism. The only difference between macros and functions (beyond the fact that macros operate on the syntax tree level) is that macros add the number of arguments (arity) to the context, so that the definition with the highest arity is ordered after (has precedence over) all other definitions. This is described more thoroughly in the reference documentation. Another important takeaway for us is that when we write a lifter we end up referencing target-specific functions and registers, so we would like to limit the context of applicability of our functions to the specified target. (Otherwise, our functions would be loaded and type-checked for all architectures, e.g., when an x86 binary is parsed, and we don’t want this.) Therefore, we should start our lifter with a context declaration that will be automatically attached to each definition in the file,
(declare (context (target arm)))
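Returning to the arity rule for a moment, here is a hedged illustration using the length macro from before. As I understand the matching rules, both definitions match a call with three arguments, but the two-parameter one has the higher arity and wins, so the rewriting proceeds as
(length r1 r2 r3)
;; the two-parameter definition is the most specific and is picked
(+ 1 (length r2 r3))
(+ 1 (+ 1 (length r3)))
;; with a single argument only the one-parameter definition matches
(+ 1 (+ 1 1))
;; which the Meta Compiler then reduces to 3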
Taming names with namespaces and packages
Now, let’s look into packages. A package is the Common Lisp (and Lisp in general) name for a namespace. A namespace is just a set of names, and each name in Primus Lisp belongs to a package. In fact, when we write (defmacro stmdb_upd () ...), i.e., without specifying the namespace, the name is automatically prefixed with the current package. By default, that is the user package, so our definition was read by the Lisp reader as (defmacro user:stmdb_upd ...). And it is not only macro and function names that are packaged: any identifier is properly namespaced, so that (store-registers ptr stride 0 regs) is read as (user:store-registers user:ptr user:stride 0 user:regs).
The namespaces are also thoroughly described in the documentation, but the rendered documentation is outdated because our bot is broken (I really need to fix it), so right now I can only refer you to the sources of the documentation in the mli file. And if you have bap installed in opam, then you can also install odig and odoc and use odig doc bap to generate and read the documentation on your machine. (It will spill an error, but just repeat the command and it will show the correctly rendered documentation; it is an upstream bug that we have reported, but… well, I digress.)
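In shell terms, the steps are roughly (a sketch, assuming a working opam switch with bap installed):
$ opam install odig odoc
$ odig doc bap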
What we will do now is define the thumb and llvm-thumb packages. The first will be our working package, where we will put all our definitions, and the second will be specific to llvm,
(defpackage thumb (:use core target arm))
(defpackage llvm-thumb (:use thumb))
(in-package thumb)
The (:use core target arm) stanza means that all definitions in those packages are imported (read “copied”) into the thumb package. And the same for llvm-thumb: basically, (defpackage llvm-thumb (:use thumb)) means copy (export) all definitions from thumb to llvm-thumb.
Both the thumb and llvm-thumb packages are already defined in BAP, so we can just say
(in-package thumb)
Unlike Common Lisp, in Primus Lisp the same package can be defined multiple times and even used before it is defined. Packages may even mutually import each other; the package system resolves this by finding the least fixed point, but that is probably too much detail for such a simple tutorial. For us, the main takeaway is that, thanks to packaging and contexts, we no longer need to write the llvm-thumb prefix and we are no longer polluting the namespace with our definitions.
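For instance, as far as I can tell, all of the following is accepted (the packages a and b are made up purely for illustration):
(defpackage a (:use b)) ; b is used before it is defined
(defpackage b (:use a)) ; the two packages import each other
(defpackage a (:use core)) ; re-defining (extending) a package is also fine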
The final step, writing overloads
Unfortunately, there is no macro mechanism that operates on the definition level and generates definitions from a pattern. We will probably develop something for that, but right now, for each overload, we still need to write a corresponding function, e.g.,
(defun t2STMDB_UPD (dst base _pred _?
r1 r2 r3 r4 r5 r6 r7 r8 r9 r10 r11 r12 r13 r14 r15)
(stmdb_upd dst base r1 r2 r3 r4 r5 r6 r7 r8 r9 r10 r11 r12 r13 r14 r15))
....
(defun t2STMDB_UPD (dst base _pred _?
r1)
(stmdb_upd dst base r1))
So the final version of our code will look like this (see also the gist version),
(declare (context (target arm)))
(in-package thumb)
(defun t2STMDB_UPD (dst base _pred _?
r1 r2 r3 r4 r5 r6 r7 r8 r9 r10 r11 r12 r13 r14 r15)
(stmdb_upd dst base r1 r2 r3 r4 r5 r6 r7 r8 r9 r10 r11 r12 r13 r14 r15))
(defun t2STMDB_UPD (dst base _pred _?
r1 r2 r3 r4 r5 r6 r7 r8 r9 r10 r11 r12 r13 r14)
(stmdb_upd dst base r1 r2 r3 r4 r5 r6 r7 r8 r9 r10 r11 r12 r13 r14))
(defun t2STMDB_UPD (dst base _pred _?
r1 r2 r3 r4 r5 r6 r7 r8 r9 r10 r11 r12 r13)
(stmdb_upd dst base r1 r2 r3 r4 r5 r6 r7 r8 r9 r10 r11 r12 r13))
(defun t2STMDB_UPD (dst base _pred _?
r1 r2 r3 r4 r5 r6 r7 r8 r9 r10 r11 r12)
(stmdb_upd dst base r1 r2 r3 r4 r5 r6 r7 r8 r9 r10 r11 r12))
(defun t2STMDB_UPD (dst base _pred _?
r1 r2 r3 r4 r5 r6 r7 r8 r9 r10 r11)
(stmdb_upd dst base r1 r2 r3 r4 r5 r6 r7 r8 r9 r10 r11))
(defun t2STMDB_UPD (dst base _pred _?
r1 r2 r3 r4 r5 r6 r7 r8 r9 r10)
(stmdb_upd dst base r1 r2 r3 r4 r5 r6 r7 r8 r9 r10))
(defun t2STMDB_UPD (dst base _pred _?
r1 r2 r3 r4 r5 r6 r7 r8 r9)
(stmdb_upd dst base r1 r2 r3 r4 r5 r6 r7 r8 r9))
(defun t2STMDB_UPD (dst base _pred _?
r1 r2 r3 r4 r5 r6 r7 r8)
(stmdb_upd dst base r1 r2 r3 r4 r5 r6 r7 r8))
(defun t2STMDB_UPD (dst base _pred _?
r1 r2 r3 r4 r5 r6 r7)
(stmdb_upd dst base r1 r2 r3 r4 r5 r6 r7))
(defun t2STMDB_UPD (dst base _pred _?
r1 r2 r3 r4 r5 r6)
(stmdb_upd dst base r1 r2 r3 r4 r5 r6))
(defun t2STMDB_UPD (dst base _pred _?
r1 r2 r3 r4 r5)
(stmdb_upd dst base r1 r2 r3 r4 r5))
(defun t2STMDB_UPD (dst base _pred _?
r1 r2 r3 r4)
(stmdb_upd dst base r1 r2 r3 r4))
(defun t2STMDB_UPD (dst base _pred _?
r1 r2 r3)
(stmdb_upd dst base r1 r2 r3))
(defun t2STMDB_UPD (dst base _pred _?
r1 r2)
(stmdb_upd dst base r1 r2))
(defun t2STMDB_UPD (dst base _pred _?
r1)
(stmdb_upd dst base r1))
(defmacro stmdb_upd (dst base regs)
(let ((len (length regs))
(stride (sizeof word-width))
(ptr (- base (* len stride))))
(store-registers ptr stride 0 regs)
(set$ dst ptr)))
(defmacro store-registers (base stride off reg)
(store-word (+ base (* stride off)) reg))
(defmacro store-registers (base stride off reg regs)
(prog
(store-registers base stride off reg)
(store-registers base stride (+ off 1) regs)))
(defmacro length (r) 1)
(defmacro length (_ rs) (+ 1 (length rs)))
I think we can conclude our tutorial here. I will polish it later, but it already covers most of the features of Primus Lisp that you need for writing lifters.
Except for the last step, that is, making a PR to BAP with the file. Please, once you have written the semantics of a machine instruction, do not hesitate to PR it to the main repository. It should go to plugins/
Happy lifting!
The Bonus Track - Recitation
And as the final chord and recitation, let’s check how we can lift push {r1-r12,r14}.
1) We will use llvm-mc to obtain the binary encoding,
$ echo 'push {r1-r12,r14}' | llvm-mc -triple=thumb -mattr=thumb2 -show-encoding
.text
push.w {r1, r2, r3, r4, r5, r6, r7, r8, r9, r10, r11, r12, lr} @ encoding: [0x2d,0xe9,0xfe,0x5f]
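(If you are curious about the encoding itself: the last two bytes, 0xfe,0x5f, form the little-endian register-list bitmask 0x5ffe, i.e., bits 1-12 and bit 14 set, which is exactly r1-r12 and lr.)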
2) Next, we check that our overload is correctly picked up,
$ bap mc --arch=thumb --primus-lisp-semantics=. --show-bil -- 0x2d,0xe9,0xfe,0x5f
{
#3 := SP - 0x34
mem := mem with [#3, el]:u32 <- R1
mem := mem with [#3 + 4, el]:u32 <- R2
mem := mem with [#3 + 8, el]:u32 <- R3
mem := mem with [#3 + 0xC, el]:u32 <- R4
mem := mem with [#3 + 0x10, el]:u32 <- R5
mem := mem with [#3 + 0x14, el]:u32 <- R6
mem := mem with [#3 + 0x18, el]:u32 <- R7
mem := mem with [#3 + 0x1C, el]:u32 <- R8
mem := mem with [#3 + 0x20, el]:u32 <- R9
mem := mem with [#3 + 0x24, el]:u32 <- R10
mem := mem with [#3 + 0x28, el]:u32 <- R11
mem := mem with [#3 + 0x2C, el]:u32 <- R12
mem := mem with [#3 + 0x30, el]:u32 <- LR
SP := #3
}
So, yay, it works! :)
Syndications


- Ahrefs
- Andrej Bauer
- Andy Ray
- Ashish Agarwal
- CUFP
- Cameleon news
- Caml INRIA
- Caml Spotting
- Coherent Graphics
- Coq
- Cranial Burnout
- Cryptosense
- Daniel Bünzli
- Daniel Bünzli (log)
- Daniil Baturin
- Dario Teixeira
- David Baelde
- David Mentré
- David Teller
- Eray Özkural
- Erik de Castro Lopo
- Etienne Millon
- Frama-C
- GaGallium
- Gabriel Radanne
- Gaius Hammond
- Gemma Gordon (OCaml Labs)
- Gerd Stolpmann
- GitHub Jobs
- Grant Rettke
- Hannes Mehnert
- Hong bo Zhang
- Jake Donham
- Jane Street
- KC Sivaramakrishnan
- Leo White
- Magnus Skjegstad
- Marc Simpson
- Matthias Puech
- Matías Giovannini
- Mike Lin
- Mike McClurg
- Mindy Preston
- MirageOS
- OCaml Book
- OCaml Labs compiler hacking
- OCaml Platform
- OCaml-Java
- OCamlCore.com
- OCamlPro
- ODNS project
- Ocaml XMPP project
- Ocsigen project
- Opa
- Orbitz
- Paolo Donadeo
- Perpetually Curious
- Peter Zotov
- Psellos
- Reason Documentation Blog
- Richard Jones
- Rudenoise
- Rudi Grinberg
- Sebastien Mondet
- Shayne Fletcher
- Stefano Zacchiroli
- Tarides
- The BAP Blog
- Thomas Leonard
- Till Varoquaux
- Xinuo Chen
- Yan Shvartzshnaider