References

Bezanson, J., Edelman, A., Karpinski, S., & Shah, V. B. (2017). Julia: A fresh approach to numerical computing. SIAM Review, 59(1), 65–98.
Chen, M., Mao, S., & Liu, Y. (2014). Big data: A survey. Mobile Networks and Applications, 19(2), 171–209.
Domo. (2018). Data never sleeps 6.0. https://www.domo.com/assets/downloads/18_domo_data-never-sleeps-6+verticals.pdf
Fitzgerald, S., Jimenez, D. Z., S., F., Yorifuji, Y., Kumar, M., Wu, L., Carosella, G., Ng, S., Parker, P., Carter, R., & Whalen, M. (2020). IDC FutureScape: Worldwide digital transformation 2021 predictions. IDC FutureScape.
Gantz, J., & Reinsel, D. (2012). The digital universe in 2020: Big data, bigger digital shadows, and biggest growth in the far east. IDC iView: IDC Analyze the Future, 2007(2012), 1–16.
JuMP style guide. (2021). https://jump.dev/JuMP.jl/v0.21/developers/style/#using-vs.-import
Khan, N., Yaqoob, I., Hashem, I. A. T., Inayat, Z., Mahmoud Ali, W. K., Alam, M., Shiraz, M., & Gani, A. (2014). Big data: Survey, technologies, opportunities, and challenges. The Scientific World Journal, 2014.
Meng, X.-L. (2019). Data science: An artificial ecosystem. Harvard Data Science Review, 1(1). https://doi.org/10.1162/99608f92.ba20f892
Perkel, J. M. (2019). Julia: Come for the syntax, stay for the speed. Nature, 572(7767), 141–142. https://doi.org/10.1038/d41586-019-02310-3
Storopoli, J. (2021). Bayesian statistics with Julia and Turing. https://storopoli.io/Bayesian-Julia
Tanmay Bakshi. (2021). Baking Knowledge into Machine Learning Models on TechLifeSkills w/ Tanmay Ep.55. https://youtu.be/moyPIhvw4Nk
TEDx Talks. (2020). A programming language to heal the planet together: Julia | Alan Edelman | TEDxMIT. https://youtu.be/qGW0GT1rCvs
van Rossum, G., Warsaw, B., & Coghlan, N. (2001). Style guide for Python code (PEP No. 8). https://www.python.org/dev/peps/pep-0008/
Wickham, H. (2011). The split-apply-combine strategy for data analysis. Journal of Statistical Software, 40(1), 1–29.

  1. 1 exabyte (EB) = 1,000,000 terabytes (TB).↩︎

  2. no C++ or FORTRAN API calls.↩︎

  3. and sometimes milliseconds.↩︎

  4. Numba, or even Rcpp or Cython?↩︎

  5. have a look at some deep learning libraries on GitHub and you’ll be surprised that Python accounts for only 25%–33% of the codebase.↩︎

  6. this is mostly a Python ecosystem problem, and while R doesn’t suffer heavily from this, it’s not all blue skies either.↩︎

  7. or with little effort necessary.↩︎

  8. sometimes even faster than C.↩︎

  9. a petaflop is one thousand trillion, or one quadrillion, operations per second.↩︎

  10. LLVM stands for Low Level Virtual Machine; you can find more at the LLVM website (http://llvm.org).↩︎

  11. if you would like to learn more about how Julia is designed, you should definitely check out Bezanson et al. (2017).↩︎

  12. please note that the Julia results depicted above do not include compile time.↩︎

  13. or, that the memory address pointers to the elements in the column are stored next to each other.↩︎

  14. it is easier because first and last also work on many other collections, so you need to remember less.↩︎
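      To illustrate footnote 14, here is a minimal sketch (variable names are our own) showing that first and last behave consistently across different kinds of collections:

      ```julia
      # `first` and `last` work on vectors, strings, and ranges alike,
      # so the same two functions cover many collection types.
      v = [10, 20, 30]
      first(v)      # 10
      last(v)       # 30

      first("abc")  # 'a'
      last(1:100)   # 100

      # Both also accept a count of leading/trailing elements:
      first(v, 2)   # [10, 20]
      ```

      The same calls work on a DataFrame’s columns, which is why there is less to remember than with type-specific accessors.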

  15. According to Bogumił Kamiński (lead developer and maintainer of DataFrames.jl) on Discourse (https://discourse.julialang.org/t/pull-dataframes-columns-to-the-front/60327/5).↩︎

  16. thanks to Sudete on Discourse (https://discourse.julialang.org/t/pull-dataframes-columns-to-the-front/60327/4) for this suggestion.↩︎

  17. also notice that regular data (up to 10 000 rows) is not big data (more than 100 000 rows). So, if you are dealing primarily with big data, please exercise caution in capping your categorical values.↩︎

  18. we are using the LinearAlgebra module from Julia’s standard library.↩︎
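      As footnote 18 notes, LinearAlgebra ships with Julia’s standard library, so no extra installation is needed. A minimal sketch (example values are our own):

      ```julia
      using LinearAlgebra  # standard library; no Pkg.add required

      # Euclidean norm of a vector:
      norm([3, 4])          # 5.0

      # Dot product of two vectors:
      dot([1, 2], [3, 4])   # 11
      ```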



CC BY-NC-SA 4.0 Jose Storopoli, Rik Huijzer, Lazaro Alonso