Cover
Copyright Information
Credits
Preface
Part 1. Module 1
Chapter 1. Getting Started with Predictive Modelling
Introducing predictive modelling
Applications and examples of predictive modelling
Python and its packages – download and installation
Python and its packages for predictive modelling
IDEs for Python
Summary
Chapter 2. Data Cleaning
Reading the data – variations and examples
Various methods of importing data in Python
The read_csv method
Use cases of the read_csv method
Case 2 – reading a dataset using the open method of Python
Case 3 – reading data from a URL
Case 4 – miscellaneous cases
Basics – summary, dimensions, and structure
Handling missing values
Creating dummy variables
Visualizing a dataset by basic plotting
Summary
Chapter 3. Data Wrangling
Subsetting a dataset
Generating random numbers and their usage
Grouping the data – aggregation, filtering, and transformation
Random sampling – splitting a dataset into training and testing datasets
Concatenating and appending data
Merging/joining datasets
Summary
Chapter 4. Statistical Concepts for Predictive Modelling
Random sampling and the central limit theorem
Hypothesis testing
Chi-square tests
Correlation
Summary
Chapter 5. Linear Regression with Python
Understanding the maths behind linear regression
Making sense of result parameters
Implementing linear regression with Python
Model validation
Handling other issues in linear regression
Summary
Chapter 6. Logistic Regression with Python
Linear regression versus logistic regression
Understanding the math behind logistic regression
Implementing logistic regression with Python
Model validation and evaluation
Model validation
Summary
Chapter 7. Clustering with Python
Introduction to clustering – what, why, and how?
Mathematics behind clustering
Implementing clustering using Python
Fine-tuning the clustering
Summary
Chapter 8. Trees and Random Forests with Python
Introducing decision trees
Understanding the mathematics behind decision trees
Implementing a decision tree with scikit-learn
Understanding and implementing regression trees
Understanding and implementing random forests
Summary
Chapter 9. Best Practices for Predictive Modelling
Best practices for coding
Best practices for data handling
Best practices for algorithms
Best practices for statistics
Best practices for business contexts
Summary
Appendix A. A List of Links
Part 2. Module 2
Chapter 1. From Data to Decisions – Getting Started with Analytic Applications
Designing an advanced analytic solution
Case study: sentiment analysis of social media feeds
Case study: targeted e-mail campaigns
Summary
Chapter 2. Exploratory Data Analysis and Visualization in Python
Exploring categorical and numerical data in IPython
Time series analysis
Working with geospatial data
Introduction to PySpark
Summary
Chapter 3. Finding Patterns in the Noise – Clustering and Unsupervised Learning
Similarity and distance metrics
Affinity propagation – automatically choosing cluster numbers
k-medoids
Agglomerative clustering
Streaming clustering in Spark
Summary
Chapter 4. Connecting the Dots with Models – Regression Methods
Linear regression
Tree methods
Scaling out with PySpark – predicting year of song release
Summary
Chapter 5. Putting Data in its Place – Classification Methods and Analysis
Logistic regression
Fitting the model
Evaluating classification models
Separating nonlinear boundaries with support vector machines
Comparing classification methods
Case study: fitting classifier models in PySpark
Summary
Chapter 6. Words and Pixels – Working with Unstructured Data
Working with textual data
Principal component analysis
Images
Case study: training a recommender system in PySpark
Summary
Chapter 7. Learning from the Bottom Up – Deep Networks and Unsupervised Features
Learning patterns with neural networks
The TensorFlow library and digit recognition
Summary
Chapter 8. Sharing Models with Prediction Services
The architecture of a prediction service
Clients and making requests
Server – the web traffic controller
Persisting information with database systems
Case study – logistic regression service
Summary
Chapter 9. Reporting and Testing – Iterating on Analytic Systems
Checking the health of models with diagnostics
Iterating on models through A/B testing
Guidelines for communication
Summary
Bibliography
Index
Last updated: 2021-07-02 20:09:52