Data Science at the Command Line
February 8, 2018
This is the website for Data Science at the Command Line, first edition, published by O’Reilly in October 2014. This hands-on guide demonstrates how the flexibility of the command line can help you become a more efficient and productive data scientist. You’ll learn how to combine small yet powerful command-line tools to quickly obtain, scrub, explore, and model your data.
Discover why the command line is an agile, scalable, and extensible technology. Even if you’re already comfortable processing data with, say, Python or R, you’ll greatly improve your data science workflow by also leveraging the power of the command line.
- Obtain data from websites, APIs, databases, and spreadsheets
- Perform scrub operations on text, CSV, HTML/XML, and JSON
- Explore data, compute descriptive statistics, and create visualizations
- Manage your data science workflow
- Create reusable command-line tools from one-liners and existing Python or R code
- Parallelize and distribute data-intensive pipelines
- Model data with dimensionality reduction, clustering, regression, and classification algorithms
The Unix philosophy of simple tools, each doing one job well, then cleverly piped together, is embodied by the command line. Jeroen expertly discusses how to bring that philosophy into your work in data science, illustrating how the command line is not only the world of file input/output, but also the world of data manipulation, exploration, and even modeling.
—Chris H. Wiggins
Associate Professor in the Department of Applied Physics and Applied Mathematics at Columbia University and Chief Data Scientist at The New York Times
This book explains how to integrate common data science tasks into a coherent workflow. It’s not just about tactics for breaking down problems, it’s also about strategies for assembling the pieces of the solution.
—John D. Cook
Consultant in applied mathematics, statistics, and technical computing
If you find this content useful, please consider supporting the work in one of these ways:
- Buying the book on Amazon or bol.com
- Writing a review on Amazon or Goodreads
- Starring the GitHub repository or Docker image
This work is licensed under the Creative Commons Attribution-NoDerivatives 4.0 International License.
Did you know that the author gives in-company training on this topic and on others, such as R and Python? If you and your colleagues would like to learn from Jeroen in person, please contact Data Science Workshops B.V. for more information.