August 12, 2019
Over the past 2 years Flathub has evolved from a wild idea at a hackfest to a community of app developers and publishers making over 600 apps available to end-users on dozens of Linux-based OSes. We couldn’t have gotten anything off the ground without the support of the 20 or so generous souls who backed our initial fundraising, and to make the service a reality since then we’ve relied on on the contributions of dozens of individuals and organisations such as Codethink, Endless, GNOME, KDE and Red Hat. But for our day to day operations, we depend on the continuous support and generosity of a few companies who provide the services and resources that Flathub uses 24/7 to build and deliver all of these apps. This post is about saying thank you to those companies!
Running the infrastructure
Mythic Beasts is a UK-based “no-nonsense” hosting provider who provide managed and un-managed co-location, dedicated servers, VPS and shared hosting. They are also conveniently based in Cambridge where I live, and very nice people to have a coffee or beer with, particularly if you enjoy talking about IPv6 and how many web services you can run on a rack full of Raspberry Pis. The “heart” of Flathub is a physical machine donated by them which originally ran everything in separate VMs – buildbot, frontend, repo master – and they have subsequently increased their donation with several VMs hosted elsewhere within their network. We also benefit from huge amounts of free bandwidth, backup/storage, monitoring, management and their expertise and advice at scaling up the service.
Starting with everything running on one box in 2017 we quickly ran into scaling bottlenecks as traffic started to pick up. With Mythic’s advice and a healthy donation of 100s of GB / month more of bandwidth, we set up two caching frontend servers running in virtual machines in two different London data centres to cache the commonly-accessed objects, shift the load away from the master server, and take advantage of the physical redundancy offered by the Mythic network.
As load increased and we brought a CDN online to bring the content closer to the user, we also moved the Buildbot (and it’s associated Postgres database) to a VM hosted at Mythic in order to offload as much IO bandwidth from the repo server, to keep up sustained HTTP throughput during update operations. This helped significantly but we are in discussions with them about a yet larger box with a mixture of disks and SSDs to handle the concurrent read and write load that we need.
Even after all of these changes, we keep the repo master on one, big, physical machine with directly attached storage because repo update and delta computations are hugely IO intensive operations, and our OSTree repos contain over 9 million inodes which get accessed randomly during this process. We also have a physical HSM (a YubiKey) which stores the GPG repo signing key for Flathub, and it’s really hard to plug a USB key into a cloud instance, and know where it is and that it’s physically secure.
Building the apps
Our first build workers were under Alex’s desk, in Christian’s garage, and a VM donated by Scaleway for our first year. We still have several ARM workers donated by Codethink, but at the start of 2018 it became pretty clear within a few months that we were not going to keep up with the growing pace of builds without some more serious iron behind the Buildbot. We also wanted to be able to offer PR and test builds, beta builds, etc — all of which multiplies the workload significantly.
Thanks to an introduction by the most excellent Jorge Castro and the approval and support of the Linux Foundation’s CNCF Infrastructure Lab, we were able to get access to an “all expenses paid” account at Packet. Packet is a “bare metal” cloud provider — like AWS except you get entire boxes and dedicated switch ports etc to yourself – at a handful of main datacenters around the world with a full range of server, storage and networking equipment, and a larger number of edge facilities for distribution/processing closer to the users. They have an API and a magical provisioning system which means that at the click of a button or one method call you can bring up all manner of machines, configure networking and storage, etc. Packet is clearly a service built by engineers for engineers – they are smart, easy to get hold of on e-mail and chat, share their roadmap publicly and set priorities based on user feedback.
We currently have 4 Huge Boxes (2 Intel, 2 ARM) from Packet which do the majority of the heavy lifting when it comes to building everything that is uploaded, and also use a few other machines there for auxiliary tasks such as caching source downloads and receiving our streamed logs from the CDN. We also used their flexibility to temporarily set up a whole separate test infrastructure (a repo, buildbot, worker and frontend on one box) while we were prototyping recent changes to the Buildbot.
A special thanks to Ed Vielmetti at Packet who has patiently supported our requests for lots of 32-bit compatible ARM machines, and for his support of other Linux desktop projects such as GNOME and the Freedesktop SDK who also benefit hugely from Packet’s resources for build and CI.
Delivering the data
Even with two redundant / load-balancing front end servers and huge amounts of bandwidth, OSTree repos have so many files that if those servers are too far away from the end users, the latency and round trips cause a serious problem with throughput. In the end you can’t distribute something like Flathub from a single physical location – you need to get closer to the users. Fortunately the OSTree repo format is very efficient to distribute via a CDN, as almost all files in the repository are immutable.
After a very speedy response to a plea for help on Twitter, Fastly – one of the world’s leading CDNs – generously agreed to donate free use of their CDN service to support Flathub. All traffic to the dl.flathub.org domain is served through the CDN, and automatically gets cached at dozens of points of presence around the world. Their service is frankly really really cool – the configuration and stats are reallly powerful, unlike any other CDN service I’ve used. Our configuration allows us to collect custom logs which we use to generate our Flathub stats, and to define edge logic in Varnish’s VCL which we use to allow larger files to stream to the end user while they are still being downloaded by the edge node, improving throughput. We also use their API to purge the summary file from their caches worldwide each time the repository updates, so that it can stay cached for longer between updates.
To get some feelings for how well this works, here are some statistics: The Flathub main repo is 929 GB, of which 73 GB are static deltas and 1.9 GB of screenshots. It contains 7280 refs for 640 apps (plus runtimes and extensions) over 4 architectures. Fastly is serving the dl.flathub.org domain fully cached, with a cache hit rate of ~98.7%. Averaging 9.8 million hits and 464 Gb downloaded per hour, Flathub uses between 1-2 Gbps sustained bandwidth depending on the time of day. Here are some nice graphs produced by the Fastly management UI (the numbers are per-hour over the last month):
To buy the scale of services and support that Flathub receives from our commercial sponsors would cost tens if not hundreds of thousands of dollars a month. Flathub could not exist without Mythic Beasts, Packet and Fastly‘s support of the free and open source Linux desktop. Thank you!
October 15, 2018
Last week the Flatpak community woke to the “news” that we are making the world a less secure place and we need to rethink what we’re doing. Personally, I’m not sure this is a fair assessment of the situation. The “tl;dr” summary is: Flatpak confers many benefits besides the sandboxing, and even looking just at the sandboxing, improving app security is a huge problem space and so is a work in progress across multiple upstream projects. Much of what has been achieved so far already delivers incremental improvements in security, and we’re making solid progress on the wider app distribution and portability problem space.
Sandboxing, like security in general, isn’t a binary thing – you can’t just say because you have a sandbox, you have 100% security. Like having two locks on your front door, two front doors, or locks on your windows too, sensible security is about defense in depth. Each barrier that you implement precludes some invalid or possibly malicious behaviour. You hope that in total, all of these barriers would prevent anything bad, but you can never really guarantee this – it’s about multiplying together probabilities to get a smaller number. A computer which is switched off, in a locked faraday cage, with no connectivity, is perfectly secure – but it’s also perfectly useless because you cannot actually use it. Sandboxing is very much the same – whilst you could easily take systemd-nspawn, Docker or any other container technology of choice and 100% lock down a desktop app, you wouldn’t be able to interact with it at all.
Network services have incubated and driven most of the container usage on Linux up until now but they are fundamentally different to desktop applications. For services you can write a simple list of permissions like, “listen on this network port” and “save files over here” whereas desktop applications have a much larger number of touchpoints to the outside world which the user expects and requires for normal functionality. Just thinking off the top of my head you need to consider access to the filesystem, display server, input devices, notifications, IPC, accessibility, fonts, themes, configuration, audio playback and capture, video playback, screen sharing, GPU hardware, printing, app launching, removable media, and joysticks. Without making holes in the sandbox to allow access to these in to your app, it either wouldn’t work at all, or it wouldn’t work in the way that people have come to expect.
What Flatpak brings to this is understanding of the specific desktop app problem space – most of what I listed above is to a greater or lesser extent understood by Flatpak, or support is planned. The Flatpak sandbox is very configurable, allowing the application author to specify which of these resources they need access to. The Flatpak CLI asks the user about these during installation, and we provide the flatpak override command to allow the user to add or remove these sandbox escapes. Flatpak has introduced portals into the Linux desktop ecosystem, which we’re really pleased to be sharing with snap since earlier this year, to provide runtime access to resources outside the sandbox based on policy and user consent. For instance, document access, app launching, input methods and recursive sandboxing (“sandbox me harder”) have portals.
The starting security position on the desktop was quite terrible – anything in your session had basically complete access to everything belonging to your user, and many places to hide.
- Access to the X socket allows arbitrary input and output to any other app on your desktop, but without it, no app on an X desktop would work. Wayland fixes this, so Flatpak has a fallback setting to allow Wayland to be used if present, and the X socket to be shared if not.
- Unrestricted access to the PulseAudio socket allows you to reconfigure audio routing, capture microphone input, etc. To ensure user consent we need a portal to control this, where by default you can play audio back but device access needs consent and work is under way to create this portal.
- Access to the webcam device node means an app can capture video whenever it wants – solving this required a whole new project.
- Sandboxing access to configuration in dconf is a priority for the project right now, after the 1.0 release.
Even with these caveats, Flatpak brings a bunch of default sandboxing – IPC filtering, a new filesystem, process and UID namespace, seccomp filtering, an immutable /usr and /app – and each of these is already a barrier to certain attacks.
Looking at the specific concerns raised:
- Hopefully from the above it’s clear that sandboxing desktop apps isn’t just a switch we can flick overnight, but what we already have is far better than having nothing at all. It’s not the intention of Flatpak to somehow mislead people that sandboxed means somehow impervious to all known security issues and can access nothing whatsoever, but we do want to encourage the use of the new technology so that we can work together on driving adoption and making improvements together. The idea is that over time, as the portals are filled out to cover the majority of the interfaces described, and supported in the major widget sets / frameworks, the criteria for earning a nice “sandboxed” badge or submitting your app to Flathub will become stricter. Many of the apps that access --filesystem=home are because they use old widget sets like Gtk2+ and frameworks like Electron that don’t support portals (yet!). Contributions to improve portal integration into other frameworks and desktops are very welcome and as mentioned above will also improve integration and security in other systems that use portals, such as snap.
- As Alex has already blogged, the freedesktop.org 1.6 runtime was something we threw together because we needed something distro agnostic to actually be able to bootstrap the entire concept of Flatpak and runtimes. A confusing mishmash of Yocto with flatpak-builder, it’s thankfully nearing some form of retirement after a recent round of security fixes. The replacement freedesktop-sdk project has just released its first stable 18.08 release, and rather than “one or two people in their spare time because something like this needs to exist”, is backed by a team from Codethink and with support from the Flatpak, GNOME and KDE communities.
- I’m not sure how fixing and disclosing a security problem in a relatively immature pre-1.0 program (in June 2017, Flathub had less than 50 apps) is considered an ongoing problem from a security perspective. The wording in the release notes?
Zooming out a little bit, I think it’s worth also highlighting some of the other reasons why Flatpak exists at all – these are far bigger problems with the Linux desktop ecosystem than app security alone, and Flatpak brings a huge array of benefits to the table:
- Allowing apps to become agnostic of their underlying distribution. The reason that runtimes exist at all is so that apps can specify the ABI and dependencies that they need, and you can run it on whatever distro you want. Flatpak has had this from day one, and it’s been hugely reliable because the sandboxed /usr means the app can rely on getting whatever they need. This is the foundation on which everything else is built.
- Separating the release/update cadence of distributions from the apps. The flip side of this, which I think is huge for more conservative platforms like Debian or enterprise distributions which don’t want to break their ABIs, hardware support or other guarantees, is that you can still get new apps into users hands. Wider than this, I think it allows us huge new freedoms to move in a direction of reinventing the distro – once you start to pull the gnarly complexity of apps and their dependencies into sandboxes, your constraints are hugely reduced and you can slim down or radically rethink the host system underneath. At Endless OS, Flatpak literally changed the structure of our engineering team, and for the first time allowed us to develop and deliver our OS, SDK and apps in independent teams each with their own cadence.
- Disintermediating app developers from their users. Flathub now offers over 400 apps, and (at a rough count by Nick Richards over the summer) over half of them are directly maintained by or maintained in conjunction with the upstream developers. This is fantastic – we get the releases when they come out, the developers can choose the dependencies and configuration they need – and they get to deliver this same experience to everyone.
- Decentralised. Anyone can set up a Flatpak repo! We started our own at Flathub because there needs to be a center of gravity and a complete story to build out a user and developer base, but the idea is that anyone can use the same tools that we do, and publish whatever/wherever they want. GNOME uses GitLab CI to publish nightly Flatpak builds, KDE is setting up the same in their infrastructure, and Fedora is working on completely different infrastructure to build and deliver their packaged applications as Flatpaks.
- Easy to build. I’ve worked on Debian packages, RPMs, Yocto, etc and I can honestly say that flatpak-builder has done a very good job of making it really easy to put your app manifest together. Because the builds are sandboxed and each runtimes brings with it a consistent SDK environment, they are very reliably reproducible. It’s worth just calling this out because when you’re trying to attract developers to your platform or contributors to your app, hurdles like complex or fragile tools and build processes to learn and debug all add resistance and drag, and discourage contributions. GNOME Builder can take any flatpak’d app and build it for you automatically, ready to hack within minutes.
- Different ways to distribute apps. Using OSTree under the hood, Flatpak supports single-file app .bundles, pulling from OSTree repos and OCI registries, and at Endless we’ve been working on peer-to-peer distribution like USB sticks and LAN sharing.
Nobody is trying to claim that Flatpak solves all of the problems at once, or that what we have is anywhere near perfect or completely secure, but I think what we have is pretty damn cool (I just wish we’d had it 10 years ago!). Even just in the security space, the overall effort we need is huge, but this is a journey that we are happy to be embarking together with the whole Linux desktop community. Thanks for reading, trying it out, and lending us a hand.
- May 2022
- February 2022
- June 2021
- January 2021
- August 2019
- October 2018
- July 2017
- May 2010
- October 2009
- August 2009
- July 2009
- March 2009
- January 2009
- July 2008
- June 2008
- April 2008
- May 2007
- January 2007
- December 2006
- June 2006
- April 2006
- March 2006
- November 2005
- October 2005
- September 2005
- August 2005
- July 2005
- May 2005
- April 2005
- March 2005