To avoid false expectations, let’s clarify one thing from the start.
This is not something you can run in production, at least not yet.
OK, let’s go on. If you have ever had to convert office documents to PDF, you probably know the pain. There aren’t many tools for the job, open-source or even commercial.
Inspired by a recent story about running Google Chrome in Lambda, I was pumped to do the same for something I desperately need: LibreOffice. I had been running it inside a Docker container for a year, and it requires a lot of care. LibreOffice doesn’t work well when you run several instances in one container. Processes often turn into zombies and eat all the memory. There is a way to run it as a “server” listening on a socket, but that’s even more unstable.
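The post doesn’t show its container setup, so here is a minimal Python sketch of one common way to keep parallel LibreOffice runs from stepping on each other (not necessarily what the author did): give every headless conversion its own throwaway user profile via `-env:UserInstallation`, and kill the whole process group if a conversion hangs. It assumes the `soffice` binary is on `PATH`; the function names and the 60-second timeout are hypothetical choices.

```python
import os
import signal
import subprocess
import tempfile
from pathlib import Path


def build_convert_command(input_path: str, outdir: str, profile_dir: str) -> list[str]:
    """Build a headless soffice command that uses its own user profile."""
    # A separate UserInstallation directory keeps concurrent instances from
    # contending for the same profile and its lock file.
    profile_url = Path(profile_dir).resolve().as_uri()
    return [
        "soffice",
        "--headless",
        f"-env:UserInstallation={profile_url}",
        "--convert-to", "pdf",
        "--outdir", outdir,
        input_path,
    ]


def convert_to_pdf(input_path: str, outdir: str, timeout: float = 60.0) -> None:
    """Run one isolated conversion; kill the whole process group on timeout."""
    with tempfile.TemporaryDirectory(prefix="lo-profile-") as profile_dir:
        cmd = build_convert_command(input_path, outdir, profile_dir)
        # start_new_session puts soffice and any processes it spawns into a new
        # process group, so a hung conversion can be killed as a whole.
        proc = subprocess.Popen(cmd, start_new_session=True)
        try:
            proc.wait(timeout=timeout)
        except subprocess.TimeoutExpired:
            os.killpg(proc.pid, signal.SIGKILL)
            proc.wait()  # reap the child so it does not linger as a zombie
            raise
        if proc.returncode != 0:
            raise RuntimeError(f"soffice exited with code {proc.returncode}")
```

For example, `convert_to_pdf("report.docx", "/tmp/out")` should leave `report.pdf` in `/tmp/out`, and the temporary profile is deleted afterwards.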
I ranted about this for some time, until eventually a salesman from Accusoft reached out to me, trying to sell me their document conversion solution.
The only problem is that the price is too high for me: $7,400 per server, or $10,000 per million documents.
This is so ridiculous I decided to build it myself. Here is how.
After some quick calculations, I ended up with a price of $150 per million documents. Quite an improvement over ten grand, isn’t it? It later proved to be correct.
As it turns out, the main cost driver is S3, not Lambda, which I find funny.
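The back-of-the-envelope math can be written down as a tiny cost model. This is a hedged Python sketch, not the author’s spreadsheet: every unit price is an input you fill in from the current AWS pricing pages for your region, the memory size and per-document duration come from your own measurements, and it ignores S3 storage, data transfer and free tiers. `Prices` and `cost_per_million` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Prices:
    """Unit prices in USD; placeholders to be filled in from current AWS pricing."""
    lambda_per_gb_second: float
    lambda_per_request: float
    s3_per_put: float
    s3_per_get: float


def cost_per_million(prices: Prices, memory_gb: float, seconds_per_doc: float,
                     puts_per_doc: int = 1, gets_per_doc: int = 1) -> dict[str, float]:
    """Estimate the cost of converting one million documents, split by service."""
    docs = 1_000_000
    # Lambda bills compute as memory (GB) x duration (s), plus a per-request fee.
    lambda_cost = docs * (memory_gb * seconds_per_doc * prices.lambda_per_gb_second
                          + prices.lambda_per_request)
    # S3 bills each request: one GET for the source, one PUT for the PDF by default.
    s3_cost = docs * (puts_per_doc * prices.s3_per_put
                      + gets_per_doc * prices.s3_per_get)
    return {"lambda": lambda_cost, "s3": s3_cost, "total": lambda_cost + s3_cost}
```

Comparing the `lambda` and `s3` entries for your own inputs shows which service dominates your bill.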
There are two problems with bringing LibreOffice to Lambda:
Size. Installed on a system, it takes gigabytes of disk space.
Portability. It depends on a lot of exotic libraries that are not available in Lambda.
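To find out which of those libraries are actually missing, one approach is to run `ldd` on the LibreOffice binary inside an environment that mirrors the Lambda runtime and collect every dependency reported as `not found`. A Python sketch, assuming a Linux system with `ldd` installed; the function names are hypothetical and the library names in the test sample are illustrative only.

```python
import subprocess


def parse_missing_libs(ldd_output: str) -> list[str]:
    """Return the library names that ldd reports as 'not found'."""
    missing = []
    for line in ldd_output.splitlines():
        # A missing dependency looks like: "\tlibfoo.so.1 => not found"
        if "=> not found" in line:
            missing.append(line.split("=>", 1)[0].strip())
    return missing


def missing_libs(binary: str) -> list[str]:
    """Run ldd on a binary (Linux only) and list unresolved shared libraries."""
    # check=False: ldd exits non-zero for non-dynamic files; we just return [].
    result = subprocess.run(["ldd", binary], capture_output=True, text=True, check=False)
    return parse_missing_libs(result.stdout)
```

Each name it returns is a library you either bundle next to the binary or get rid of at build time.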
Given Lambda’s 512 MB of disk space, this is quite a challenge. Fortunately, you can disable a ton of crap during compilation. God, why would I need a JDBC driver or KDE extensions? That’s how I reduced LibreOffice’s size from 2 GB to 0.45 GB.
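Before deciding what to disable, it helps to see which parts of an installed tree take the most space, roughly what `du -s * | sort -n` gives you. A Python sketch; the function names are hypothetical, and it sums apparent file sizes, so the numbers can differ slightly from what `du` reports in disk blocks.

```python
import os
from pathlib import Path


def tree_size(path: Path) -> int:
    """Total size in bytes of regular files under path (symlinks not followed)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            file_path = os.path.join(root, name)
            if not os.path.islink(file_path):
                total += os.path.getsize(file_path)
    return total


def largest_children(path: Path, top: int = 10) -> list[tuple[str, int]]:
    """Rank the immediate children of path by size, largest first."""
    sizes = []
    for child in Path(path).iterdir():
        if child.is_symlink():
            continue  # avoid counting the same data twice
        size = tree_size(child) if child.is_dir() else child.stat().st_size
        sizes.append((child.name, size))
    sizes.sort(key=lambda item: item[1], reverse=True)
    return sizes[:top]
```

Running it on the install directory before and after a rebuild shows where the savings came from.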
450 MB already fits in Lambda. But we can do better. strip **/* is a magical command that removes symbols from shared objects (.so files).
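One caveat with that one-liner: in bash, `**` only recurses after `shopt -s globstar` (zsh supports it out of the box), and `strip` reports an error for every file that isn’t an object file. Here is a Python sketch of the same idea that only touches ELF files, detected by their magic bytes, so text and data files are skipped. `dry_run` defaults to `True`, so it only reports what it would strip; the function names are hypothetical.

```python
import subprocess
from pathlib import Path

ELF_MAGIC = b"\x7fELF"


def find_elf_files(root: Path) -> list[Path]:
    """Recursively collect regular files that start with the ELF magic bytes."""
    elf_files = []
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and not path.is_symlink():
            with path.open("rb") as handle:
                if handle.read(4) == ELF_MAGIC:
                    elf_files.append(path)
    return elf_files


def strip_all(root: Path, dry_run: bool = True) -> list[Path]:
    """Strip symbols from every ELF file under root; return the files selected."""
    targets = find_elf_files(root)
    if not dry_run:
        for path in targets:
            # Plain strip drops the static symbol table and debug info; the
            # dynamic symbols needed at load time (.dynsym) are kept.
            subprocess.run(["strip", str(path)], check=True)
    return targets
```

Run it once with the default to review the list, then with `dry_run=False` on a copy of the install tree.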
So now the size is 340 MB, which is 110 MB gzipped 🎉
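To check the uncompressed versus gzipped numbers for your own build, a Python sketch can pack the tree into an in-memory `.tar.gz` with the standard `tarfile` module and compare sizes. The function name is hypothetical, the result depends on the compression level (`tarfile` uses level 9 by default for `w:gz`), and for a tree of several hundred megabytes you would write to a file instead of memory.

```python
import io
import tarfile
from pathlib import Path


def packed_sizes(root: Path) -> tuple[int, int]:
    """Return (bytes of regular files under root, size of a .tar.gz of root)."""
    buffer = io.BytesIO()
    raw_total = 0
    with tarfile.open(fileobj=buffer, mode="w:gz") as archive:
        for path in sorted(Path(root).rglob("*")):
            if path.is_file() and not path.is_symlink():
                raw_total += path.stat().st_size
            # arcname keeps paths relative to root; recursive=False because
            # rglob already visits every entry, so nothing is added twice.
            archive.add(path, arcname=str(path.relative_to(root)), recursive=False)
    # Closing the archive flushes the gzip trailer but leaves the BytesIO open.
    return raw_total, len(buffer.getvalue())
```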