At Uber, we rely on making data-driven decisions at every level. For this, we have built a large-scale big data platform that runs over 250,000 Spark analytics jobs per day, where each job could consist of hundreds of thousands of executors, processing over a hundred petabytes of analytical data. In addition, the big data platform generates a large amount of log data, and the rapid growth of Uber's business has led to furious growth of these logs. On a busy day, our Spark cluster alone can generate up to 200TB of logs (at the default INFO verbosity level).

Long, long ago, the amount of data our systems output to logs was small enough that we were able to retain all of the log files. This allowed our engineers to freely analyze the logs, say for troubleshooting our systems or improving applications. But as Uber's business grew rapidly, the amount of data being logged increased dramatically. And so we were forced to discard log files after just a short period of time, given the prohibitive cost of retaining them–that is, until we integrated CLP into the logging library (Log4j) of our big data platform.

In aggregate, CLP achieves a 169x compression ratio on our log data, saving storage, memory, and disk/network bandwidth at every level. As a result, we can now retain all logs at a fraction of the cost, without throwing away any insights, and the compressed logs can be efficiently searched without decompression.
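To give a flavor of why log data compresses so well and why it can be searched without full decompression, here is a minimal, illustrative sketch of the core idea behind template-based log compression. This is not Uber's or CLP's actual implementation; the class name, the variable placeholder byte, and the numbers-only variable heuristic are all simplifying assumptions for illustration.

```python
# Illustrative sketch (NOT the real CLP implementation): split each log
# message into a static "log type" template and its variable values, then
# deduplicate the templates in a dictionary. Repetitive logs compress well
# because each message stores only a template ID plus its variables, and a
# keyword search can scan the small template dictionary instead of every
# message.
import re

# Hypothetical pattern: treat numbers (ints and decimals) as variables.
VAR_PATTERN = re.compile(r"\d+\.\d+|\d+")
PLACEHOLDER = "\x11"  # assumed non-printable placeholder byte

class LogCompressor:
    def __init__(self):
        self.logtype_ids = {}   # template string -> template ID
        self.logtypes = []      # template ID -> template string
        self.messages = []      # encoded stream: (template ID, variables)

    def encode(self, message):
        variables = VAR_PATTERN.findall(message)
        template = VAR_PATTERN.sub(PLACEHOLDER, message)
        if template not in self.logtype_ids:
            self.logtype_ids[template] = len(self.logtypes)
            self.logtypes.append(template)
        self.messages.append((self.logtype_ids[template], variables))

    def search_templates(self, keyword):
        # Matching runs over the deduplicated templates, so it touches far
        # less data than scanning every raw message.
        return [i for i, t in enumerate(self.logtypes) if keyword in t]

c = LogCompressor()
c.encode("Task 12 finished in 3.4 s")
c.encode("Task 13 finished in 2.9 s")
c.encode("Task 14 failed after 1 retries")
print(len(c.logtypes))            # 2 distinct templates for 3 messages
print(c.search_templates("failed"))  # [1]
```

The first two messages share one template, so only their variable values differ in the encoded stream; in production logs with millions of near-identical lines, this deduplication is a large part of where ratios like 169x come from.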