This book is unusual. Most books about data—be they popular
books about big data, open data, or data science, or technical statistical books about how to analyze data—are about the data you have. They are about the data sitting in folders on your computer, in files on your desk, or as records in your notebook. In contrast, this book is about data you don’t have—perhaps data you wish you had, or hoped to have, or thought you had, but
nonetheless data you don’t have. I argue, and illustrate with many
examples, that the missing data are at least as important as the
data you do have. The data you cannot see have the potential to
mislead you, sometimes even with catastrophic consequences,
as we shall see. I show how and why this can happen. But I also
show how it can be avoided—what you should look for to sidestep such disasters. And then, perhaps surprisingly, once we have seen how dark data arise and can cause such problems, I
show how you can use the dark data perspective to flip the conventional way of looking at data analysis on its head: how hiding data can, if you are clever enough, lead to deeper understanding,
better decisions, and better choice of actions.