Probability and Statistics for Data Science: Math + R + Data covers "math stat" (distributions, expected value, estimation, etc.) but takes the phrase "Data Science" in the title quite seriously:
Real datasets are used extensively.
All data analysis is supported by R coding.
Includes many Data Science applications, such as PCA, mixture distributions, random graph models, hidden Markov models, linear and logistic regression, and neural networks.
Leads the student to think critically about the "how" and "why" of statistics, and to "see the big picture."
Not "theorem/proof"-oriented, but concepts and models are stated in a mathematically precise manner.
Prerequisites are calculus, some matrix algebra, and some experience in programming.
Norman Matloff is a professor of computer science at the University of California, Davis, and was formerly a statistics professor there. He is on the editorial boards of the Journal of Statistical Software and The R Journal. His book Statistical Regression and Classification: From Linear Models to Machine Learning was the recipient of the Ziegel Award for the best book reviewed in Technometrics in 2017. He is a recipient of his university's Distinguished Teaching Award.