Data Science at the Command Line, 2e
Chapter 3
Obtaining Data