Case Study: Excel Reading / Writing Library Performance Comparison
We're doing a lot of writing to Excel files in our latest ERP project. In the beginning, we use RubyXL to read and write Excel files. However, we found out that often the server crashed and restarted when it is writing Excel file. Turns out, the problem is with the memory usage of RubyXL.
Here's a snapshot of the memory usage when writing Excel file using RubyXL:
Holy cow! RubyXL is using 1.8GB of memory when reading / writing Excel files with 65K rows.
Switching to Python openpyxl Library
In Ruby, the only mature library for writing Excel file is RubyXL which is why we look into library written in other languages. Luckily, one of our developer has an extensive experience with Python so we decided to test Python library openpyxl and see if it has better memory footprint.
Here's the memory footprint of Python openpyxl Excel library:
Holy molly, that's way better!!! We've improved memory usage from 1.8GB to 600MB.
Conclusion
Writing Excel file in Ruby takes a lot of memory. We don't think that the Ruby language itself is the problem but rather it's a problem with the Ruby library. After switching to Python, we have a much better memory usage. There are other alternatives for writing Excel files that might be worth considering like POI (Apache) which is a Java library or libXL which is written in C++ and they have a Node port, too.