Scikit-learn Machine Learning in Detail, Volume 1, by Pan Fengwen and Pan Qiru | e-book download (epub+azw3+mobi+pdf)

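Book title: Scikit-learn Machine Learning in Detail, Volume 1 (《Scikit-learn机器学习详解 上》)

[Authors] Pan Fengwen, Pan Qiru
[Series] Artificial Intelligence Development Series
[Pages] 276
[Publisher] Beijing: Chemical Industry Press, 2021.01
[ISBN] 978-7-122-37849-1
[Category] Machine learning - Guide - English
[Reference] Pan Fengwen, Pan Qiru. Scikit-learn Machine Learning in Detail, Volume 1. Beijing: Chemical Industry Press, 2021.01.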


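Summary of Scikit-learn Machine Learning in Detail, Volume 1:

This book covers an introduction to machine learning; the four foundational modules NumPy, Pandas, SciPy, and Matplotlib (visualization); Scikit-learn algorithms, models, fitting, overfitting, underfitting, model performance metrics, data standardization, nonlinear transformation, and discretization; and various methods of feature extraction and dimensionality reduction, including feature hashing, text feature extraction, and feature agglomeration.

Excerpt from Scikit-learn Machine Learning in Detail, Volume 1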

CHAPTER 1

The Machine Learning Landscape

When most people hear "Machine Learning," they picture a robot: a dependable butler or a deadly Terminator, depending on who you ask. But Machine Learning is not just a futuristic fantasy; it's already here. In fact, it has been around for decades in some specialized applications, such as Optical Character Recognition (OCR). But the first ML application that really became mainstream, improving the lives of hundreds of millions of people, took over the world back in the 1990s: the spam filter. It's not exactly a self-aware Skynet, but it does technically qualify as Machine Learning (it has actually learned so well that you seldom need to flag an email as spam anymore). It was followed by hundreds of ML applications that now quietly power hundreds of products and features that you use regularly, from better recommendations to voice search.

Where does Machine Learning start and where does it end? What exactly does it mean for a machine to learn something? If I download a copy of Wikipedia, has my computer really learned something? Is it suddenly smarter? In this chapter we will start by clarifying what Machine Learning is and why you may want to use it.

Then, before we set out to explore the Machine Learning continent, we will take a look at the map and learn about the main regions and the most notable landmarks: supervised versus unsupervised learning, online versus batch learning, instance-based versus model-based learning. Then we will look at the workflow of a typical ML project, discuss the main challenges you may face, and cover how to evaluate and fine-tune a Machine Learning system.

This chapter introduces a lot of fundamental concepts (and jargon) that every data scientist should know by heart. It will be a high-level overview (it's the only chapter without much code), all rather simple, but you should make sure everything is crystal clear to you before continuing on to the rest of the book. So grab a coffee and let's get started!

If you already know all the Machine Learning basics, you may want to skip directly to Chapter 2. If you are not sure, try to answer all the questions listed at the end of the chapter before moving on.

What Is Machine Learning?

Machine Learning is the science (and art) of programming computers so they can learn from data.

Here is a slightly more general definition:

[Machine Learning is the] field of study that gives computers the ability to learn without being explicitly programmed.

- Arthur Samuel, 1959

And a more engineering-oriented one:

A computer program is said to learn from experience E with respect to some task T and some performance measure P, if its performance on T, as measured by P, improves with experience E.

- Tom Mitchell, 1997

Your spam filter is a Machine Learning program that, given examples of spam emails (e.g., flagged by users) and examples of regular (nonspam, also called "ham") emails, can learn to flag spam. The examples that the system uses to learn are called the training set. Each training example is called a training instance (or sample). In this case, the task T is to flag spam for new emails, the experience E is the training data, and the performance measure P needs to be defined; for example, you can use the ratio of correctly classified emails. This particular performance measure is called accuracy, and it is often used in classification tasks.
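To make the accuracy measure concrete, here is a minimal sketch in plain Python. The labels are made up for illustration, and `accuracy` is a helper defined here, not a library function.

```python
def accuracy(y_true, y_pred):
    """Fraction of emails whose predicted label matches the true label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Hypothetical labels for five emails (made up for illustration).
true_labels = ["spam", "ham", "ham", "spam", "ham"]
predicted = ["spam", "ham", "spam", "spam", "ham"]

print(accuracy(true_labels, predicted))  # 4 of 5 correct -> 0.8
```

In Mitchell's terms, the predictions come from performing task T, and this number is one possible choice of P.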

If you just download a copy of Wikipedia, your computer has a lot more data, but it is not suddenly better at any task. Thus, downloading a copy of Wikipedia is not Machine Learning.

Why Use Machine Learning?

Consider how you would write a spam filter using traditional programming techniques (Figure 1-1):

1. First you would consider what spam typically looks like. You might notice that some words or phrases (such as "4U," "credit card," "free," and "amazing") tend to come up a lot in the subject line. Perhaps you would also notice a few other patterns in the sender's name, the email's body, and other parts of the email.


2. You would write a detection algorithm for each of the patterns that you noticed, and your program would flag emails as spam if a number of these patterns were detected.

3. You would test your program and repeat steps 1 and 2 until it was good enough to launch.

Figure 1-1. The traditional approach (diagram labels: Study the problem, Write rules, Evaluate, Analyze errors, Launch!)

Since the problem is difficult, your program will likely become a long list of complex rules, which is pretty hard to maintain.
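The traditional approach described above can be sketched as a handful of hardcoded rules. This is only an illustration: the patterns come from the examples in the text, while the function name `is_spam_by_rules` and the `min_hits` threshold are hypothetical choices made for this sketch.

```python
import re

# Hand-written patterns, taken from the example phrases in the text.
SUBJECT_PATTERNS = [
    re.compile(r"\b4u\b", re.IGNORECASE),
    re.compile(r"\bcredit card\b", re.IGNORECASE),
    re.compile(r"\bfree\b", re.IGNORECASE),
    re.compile(r"\bamazing\b", re.IGNORECASE),
]


def is_spam_by_rules(subject, min_hits=2):
    """Flag an email when at least `min_hits` patterns match its subject.

    `min_hits` is a hypothetical threshold chosen for this sketch.
    """
    hits = sum(1 for pattern in SUBJECT_PATTERNS if pattern.search(subject))
    return hits >= min_hits


print(is_spam_by_rules("Amazing FREE credit card 4U"))    # True
print(is_spam_by_rules("Minutes from Monday's meeting"))  # False
```

Every new spam pattern means another entry in `SUBJECT_PATTERNS` and another round of testing, which is how the rule list grows.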

In contrast, a spam filter based on Machine Learning techniques automatically learns which words and phrases are good predictors of spam by detecting unusually frequent patterns of words in the spam examples compared to the ham examples (Figure 1-2). The program is much shorter, easier to maintain, and most likely more accurate.
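One simple way to "detect unusually frequent words in spam compared to ham" is to score each word by a smoothed log-ratio of its frequency in the two classes. The sketch below assumes that approach; it is not necessarily what a production spam filter does, and the training emails and function names are made up for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately crude tokenizer)."""
    return text.lower().split()


def train(spam_examples, ham_examples):
    """Return a score per word: positive means more frequent in spam."""
    spam_counts = Counter(w for text in spam_examples for w in tokenize(text))
    ham_counts = Counter(w for text in ham_examples for w in tokenize(text))
    vocabulary = set(spam_counts) | set(ham_counts)
    spam_total = sum(spam_counts.values())
    ham_total = sum(ham_counts.values())
    scores = {}
    for word in vocabulary:
        # Add-one smoothing keeps the ratio finite for words seen in one class only.
        p_spam = (spam_counts[word] + 1) / (spam_total + len(vocabulary))
        p_ham = (ham_counts[word] + 1) / (ham_total + len(vocabulary))
        scores[word] = math.log(p_spam / p_ham)
    return scores


def is_spam(text, scores):
    """Flag the email when its known words lean toward spam overall."""
    return sum(scores.get(w, 0.0) for w in tokenize(text)) > 0


# Hypothetical training data, made up for illustration.
spam = ["free credit card 4u", "amazing free offer 4u", "free amazing prize"]
ham = ["meeting notes attached", "lunch on friday", "project notes and offer letter"]

scores = train(spam, ham)
print(max(scores, key=scores.get))              # strongest spam predictor: free
print(is_spam("free prize 4u", scores))         # True
print(is_spam("friday meeting notes", scores))  # False
```

Nothing in the code names a specific spam word; the scores come entirely from the examples.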

What if spammers notice that all their emails containing "4U" are blocked? They might start writing "For U" instead. A spam filter using traditional programming techniques would need to be updated to flag "For U" emails. If spammers keep working around your spam filter, you will need to keep writing new rules forever.

In contrast, a spam filter based on Machine Learning techniques automatically notices that "For U" has become unusually frequent in spam flagged by users, and it starts flagging them without your intervention (Figure 1-3).
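The adaptation described above can be illustrated with a toy stand-in for retraining: recount two-word phrases whenever users flag new spam. This is a simplified sketch rather than a real learning algorithm; the emails, the function names, and the `min_count` threshold are all assumptions made for illustration.

```python
from collections import Counter


def bigrams(text):
    """Return adjacent lowercase word pairs, e.g. 'for u'."""
    words = text.lower().split()
    return [" ".join(pair) for pair in zip(words, words[1:])]


def spam_leaning_phrases(spam_examples, ham_examples, min_count=2):
    """Return phrases seen at least `min_count` times in spam and never in ham.

    `min_count` is a hypothetical threshold chosen for this sketch.
    """
    spam_counts = Counter(p for text in spam_examples for p in bigrams(text))
    ham_counts = Counter(p for text in ham_examples for p in bigrams(text))
    return {
        phrase
        for phrase, count in spam_counts.items()
        if count >= min_count and ham_counts[phrase] == 0
    }


# Hypothetical data, made up for illustration.
ham = ["notes for the meeting", "lunch for two on friday"]
spam_before = ["deal 4u today", "prize 4u now"]
# Later, users flag new spam that avoids "4U".
spam_after = spam_before + ["great deal for u", "prize for u now", "loans for u"]

print("for u" in spam_leaning_phrases(spam_before, ham))  # False
print("for u" in spam_leaning_phrases(spam_after, ham))   # True
```

The same function is rerun on the updated data; no new rule is written by hand.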


Figure 1-2. The Machine Learning approach (diagram labels: Study the problem, Data, Train ML algorithm, Evaluate solution, Analyze errors, Launch!)

Figure 1-3. Automatically adapting to change (diagram labels: Data, Update data, Train ML algorithm, Evaluate solution, Launch!, Can be automated)

Another area where Machine Learning shines is for problems that either are too complex for traditional approaches or have no known algorithm. For example, consider speech recognition. Say you want to start simple and write a program capable of distinguishing the words "one" and "two." You might notice that the word "two" starts with a high-pitch sound ("T"), so you could hardcode an algorithm that measures high-pitch sound intensity and use that to distinguish ones and twos, but obviously this technique will not scale to thousands of words spoken by millions of very different people in noisy environments and in dozens of languages. The best solution (at least today) is to write an algorithm that learns by itself, given many example recordings for each word.

Finally, Machine Learning can help humans learn (Figure 1-4). ML algorithms can be inspected to see what they have learned (although for some algorithms this can be tricky). For instance, once a spam filter has been trained on enough spam, it can easily be inspected to reveal the list of words and combinations of words that it believes are the best predictors of spam. Sometimes this will reveal unsuspected correlations or new trends, and thereby lead to a better understanding of the problem. Applying ML techniques to dig into large amounts of data can help discover patterns that were not immediately apparent. This is called data mining.

Figure 1-4. Machine Learning can help humans learn (diagram labels: Study the problem, "Lots" of data, Train ML algorithm, Solution, Inspect the solution, Understand the problem better, Iterate if needed)

To summarize, Machine Learning is great for:

Problems for which existing solutions require a lot of fine-tuning or long lists of rules: one Machine Learning algorithm can often simplify code and perform better than the traditional approach.

Complex problems for which using a traditional approach yields no good solution: the best Machine Learning techniques can perhaps find a solution.

Fluctuating environments: a Machine Learning system can adapt to new data.

Getting insights about complex problems and large amounts of data.

Examples of Applications

Let's look at some concrete examples of Machine Learning tasks, along with the techniques that can tackle them:

Analyzing images of products on a production line to automatically classify them

This is image classification, typically performed using convolutional neural networks (CNNs; see Chapter 14).

Detecting tumors in brain scans

This is semantic segmentation, where each pixel in the image is classified (as we want to determine the exact location and shape of tumors), typically using CNNs as well.


Automatically classifying news articles

This is natural language processing (NLP), and more specifically text classification, which can be tackled using recurrent neural networks (RNNs), CNNs, or Transformers (see Chapter 16).

Automatically flagging offensive comments on discussion forums

This is also text classification, using the same NLP tools.

Summarizing long documents automatically

This is a branch of NLP called text summarization, again using the same tools.

Creating a chatbot or a personal assistant

This involves many NLP components, including natural language understanding (NLU) and question-answering modules.

Forecasting your company's revenue next year, based on many performance metrics

This is a regression task (i.e., predicting values) that may be tackled using any regression model, such as a Linear Regression or Polynomial Regression model (see Chapter 4), a regression SVM (see Chapter 5), a regression Random Forest (see Chapter 7), or an artificial neural network (see Chapter 10). If you want to take into account sequences of past performance metrics, you may want to use RNNs, CNNs, or Transformers (see Chapters 15 and 16).

Making your app react to voice commands

This is speech recognition, which requires processing audio samples: since they are long and complex sequences, they are typically processed using RNNs, CNNs, or Transformers (see Chapters 15 and 16).

Detecting credit card fraud

This is anomaly detection (see Chapter 9).

Segmenting clients based on their purchases so that you can design a different marketing strategy for each segment

This is clustering (see Chapter 9).

Representing a complex, high-dimensional dataset in a clear and insightful diagram

This is data visualization, often involving dimensionality reduction techniques (see Chapter 8).

Recommending a product that a client may be interested in, based on past purchases

This is a recommender system. One approach is to feed past purchases (and other information about the client) to an artificial neural network (see Chapter 10), and get it to output the most likely next purchase. This neural net would typically be trained on past sequences of purchases across all clients.
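As one concrete instance of the tasks listed above, the revenue-forecasting example is a regression problem. The sketch below fits a straight line by ordinary least squares in plain Python, using a single input feature. The data is made up for illustration (marketing spend is an assumed feature, not one named in the text), and `fit_line` is a helper defined here rather than a library call.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept (one feature)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data in arbitrary units, made up for illustration.
spend = [1.0, 2.0, 3.0, 4.0]
revenue = [3.0, 5.0, 7.0, 9.0]  # constructed so that revenue = 2 * spend + 1

slope, intercept = fit_line(spend, revenue)
print(slope, intercept)         # 2.0 1.0
print(slope * 5.0 + intercept)  # forecast for spend = 5.0 -> 11.0
```

Real revenue data would involve many features and noise, which is where the regression models named in the list come in.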


··· End of excerpt ···
