Thursday, 30 March 2017

Introduction to Python libraries for implementing machine learning algorithms

We will use the Python programming language to implement machine learning algorithms. First, here is a brief introduction to Python.

Introduction to Python programming language:
          Python is a simple, dynamic, general-purpose, high-level programming language intended to be quick to learn, use and understand, with a very straightforward syntax.
          Python does not need a separate compilation step to a binary. You just run the program directly from the source code. Internally, Python compiles the source code into an intermediate form called bytecode, and the interpreter's virtual machine then executes that bytecode. You never have to manage this step yourself, which makes Python much easier to use.
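The bytecode step can be observed from inside Python itself. The sketch below uses the built-in compile() and exec() functions; the source string and the variable names are made up for illustration.

```python
# Compile a small piece of source text into a code object.
source = "total = 2 + 3"
code = compile(source, "<example>", "exec")

# co_code holds the raw bytecode as a bytes object.
bytecode = code.co_code
print(type(bytecode).__name__, len(bytecode) > 0)

# Executing the code object runs the bytecode in the interpreter.
namespace = {}
exec(code, namespace)
print(namespace["total"])
```

Normally you never call compile() yourself; Python does this automatically when you run a script.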
          Python supports procedure-oriented programming as well as object-oriented programming. In procedure-oriented languages, the program is built around procedures or functions, which are nothing but reusable pieces of programs. In object-oriented languages, the program is built around objects, which combine data and functionality.
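As a small illustration of the two styles, here is the same calculation written both ways. The function and class names (rectangle_area, Rectangle) are made up for this example.

```python
# Procedure-oriented: a standalone function that operates on data passed in.
def rectangle_area(width, height):
    return width * height


# Object-oriented: a class bundles the data with the functions that use it.
class Rectangle:
    def __init__(self, width, height):
        self.width = width
        self.height = height

    def area(self):
        return self.width * self.height


print(rectangle_area(3, 4))
print(Rectangle(3, 4).area())
```

Both calls compute the same area; the difference is only in where the data lives.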

Introduction to Python libraries:
          Python is a great general-purpose programming language on its own, but with the help of a few popular libraries (numpy, pandas, matplotlib, etc.) it becomes a powerful environment for scientific computing.

Here is the list of Python libraries that we will use to implement machine learning algorithms.

1) NumPy: NumPy is the fundamental package for scientific computing with Python. NumPy adds support for large, multi-dimensional arrays and matrices, along with a large library of high-level mathematical functions to operate on these arrays.

2) SciPy: SciPy is an open source Python library used for scientific computing and technical computing. SciPy contains modules for optimization, linear algebra, integration, interpolation, special functions, FFT, signal and image processing, ODE solvers and other tasks common in science and engineering.

3) Scikit-image: scikit-image is an open source image processing library for the Python programming language. It includes algorithms for segmentation, geometric transformations, color space manipulation, analysis, filtering, morphology, feature detection, and more. It is designed to interoperate with the Python numerical and scientific libraries NumPy and SciPy.

4) Scikit-learn: scikit-learn is a free machine learning library for the Python programming language. It features various classification, regression and clustering algorithms including support vector machines, random forests, gradient boosting, k-means and DBSCAN, and is designed to interoperate with the Python numerical and scientific libraries NumPy and SciPy.

5) Pandas: pandas is an open source library providing high-performance, easy-to-use data structures and data analysis tools for the Python programming language.

6) Matplotlib: Matplotlib is a Python 2D plotting library which produces publication quality figures in a variety of hardcopy formats and interactive environments across platforms. You can generate plots, histograms, power spectra, bar charts, errorcharts, scatterplots, etc., with just a few lines of code.
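To give a first taste of what the NumPy arrays described in the list above look like in practice, here is a minimal sketch. It assumes NumPy is already installed, and the numbers are made up.

```python
import numpy as np

# A 2 x 3 array (two rows, three columns) of made-up numbers.
data = np.array([[1.0, 2.0, 3.0],
                 [4.0, 5.0, 6.0]])

print(data.shape)               # dimensions of the array
col_means = data.mean(axis=0)   # mean of each column
doubled = data * 2              # element-wise arithmetic, no explicit loop

print(col_means)
print(doubled)
```

The other libraries in the list build on this array type, which is why NumPy comes first.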

How to install above libraries?

I am using Python 3. You can download the latest version of Python from the official site: https://www.python.org/

For Windows Users:
In the command prompt, type the commands below to install the libraries.

1) NumPy: pip install numpy
2) SciPy: pip install scipy
3) Scikit-image: pip install scikit-image
4) Scikit-learn: pip install scikit-learn
5) Pandas: pip install pandas
6) Matplotlib: pip install matplotlib

Windows users can also download the above packages from this site: http://www.lfd.uci.edu/~gohlke/pythonlibs/

For Ubuntu and other Linux users:
In a terminal, type the commands below to install the libraries.

1) NumPy: sudo pip install numpy
2) SciPy: sudo pip install scipy
3) Scikit-image: sudo pip install scikit-image
4) Scikit-learn: sudo pip install scikit-learn
5) Pandas: sudo pip install pandas
6) Matplotlib: sudo pip install matplotlib
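Once the commands have finished, one way to check what actually got installed is to ask Python for each package's version. This is only a sketch: it relies on the standard library module importlib.metadata (available in Python 3.8 and later), and the helper name installed_versions is made up for this example.

```python
from importlib import metadata

# Package names as they are passed to pip in the commands above.
PACKAGES = ["numpy", "scipy", "scikit-image", "scikit-learn", "pandas", "matplotlib"]


def installed_versions(packages):
    """Return a dict mapping each package name to its version, or None if missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


for name, version in installed_versions(PACKAGES).items():
    print(f"{name}: {version if version else 'not installed'}")
```

Any package reported as "not installed" can be installed again with the matching pip command.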

In the next post we will implement the linear regression algorithm. You will need the above libraries installed to implement the machine learning algorithms.

Friday, 17 March 2017

Steps in developing a machine learning application

1) Collect data: You could collect the samples by scraping a website and extracting data, or you could get information from an RSS feed or an API. To save some time and effort, you could use publicly available data.

2) Prepare the input data: Once you have this data, you need to make sure it’s in a usable format. You may need to do some algorithm-specific formatting here.

3) Analyze the input data: This means looking at the data from the previous step. This could be as simple as looking at the data you’ve parsed in a text editor to make sure steps 1 and 2 are actually working and you don’t have a bunch of empty values.

4) Train the algorithm: This is where the machine learning takes place. This step and the next step are where the “core” algorithms lie, depending on the algorithm. You feed the algorithm good clean data from the first two steps and extract knowledge or information. This knowledge you often store in a format that’s readily usable by a machine for the next two steps.

5) Test the algorithm: This is where the information learned in the previous step is put to use. When you’re evaluating an algorithm, you’ll test it to see how well it does. In the case of supervised learning, you have some known values you can use to evaluate the algorithm.
 
6) Use it: Here you make a real program to do some task, and once again you see if all the previous steps worked as you expected. You might encounter some new data and have to revisit steps 1–5.
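The six steps above can be sketched end to end in a few lines. This is only an illustration: the data is made up, the model is a plain straight-line fit by least squares, and it uses pure Python so that it runs without any installs. The names fit_line and predict are hypothetical helpers for this example.

```python
# 1) Collect: made-up raw records, as they might arrive from a file or API.
raw_rows = [("1", "3.1"), ("2", "4.9"), ("3", ""), ("4", "9.2"),
            ("5", "10.8"), ("6", "13.1"), ("7", "15.0"), ("8", "16.9")]

# 2) Prepare: drop rows with empty values and convert text to numbers.
rows = [(float(x), float(y)) for x, y in raw_rows if x and y]

# 3) Analyze: a quick look to confirm the data parsed as expected.
print(len(rows), "usable rows; y ranges from",
      min(y for _, y in rows), "to", max(y for _, y in rows))

# Hold back the last two rows for testing.
train, test = rows[:-2], rows[-2:]


# 4) Train: fit y = slope * x + intercept by ordinary least squares.
def fit_line(points):
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


slope, intercept = fit_line(train)


# 5) Test: mean absolute error on the held-out rows.
def predict(x):
    return slope * x + intercept


mae = sum(abs(predict(x) - y) for x, y in test) / len(test)
print("mean absolute error:", round(mae, 2))

# 6) Use: predict for a new input.
print("prediction for x = 9:", round(predict(9.0), 2))
```

In a real application each step is usually much larger, but the order stays the same.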

Steps required for selecting the right machine learning algorithm

How to choose the right algorithm

1) First you need to consider your goal. What are you trying to get out of this? What data do you have, or what data can you collect?

2) If you’re trying to predict or forecast a target value, then you need to look into supervised learning.

3) If not, then unsupervised learning is the place you want to be.

4) If you’ve chosen supervised learning, what’s your target value? Is it a discrete value like Yes/No, 1/2/3, A/B/C, or Red/Yellow/Black? If so, then you want to look into classification. If the target value can take on a number of values, say any value from 0.00 to 100.00, or -999 to 999, or -infinity to +infinity, then you need to look into regression.

5) If you’re not trying to predict a target value, then you need to look into unsupervised learning. Are you trying to fit your data into some discrete groups? If so and that’s all you need, you should look into clustering.

6) Do you need to have some numerical estimate of how strong the fit is into each group? If you answer yes, then you probably should look into a density estimation algorithm.

The rules I’ve given here should point you in the right direction but are not unbreakable laws. You should spend some time getting to know your data, and the more you know about it, the better you’ll be able to build a successful application.
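The questions above can be written down as a tiny decision function. This is just a sketch of the rules of thumb, not a real selection tool; the function name suggest_approach and its parameters are made up for this example.

```python
def suggest_approach(predicting_target, target_is_discrete=False,
                     need_fit_strength=False):
    """Map the questions above to a family of algorithms (a rough guide only)."""
    if predicting_target:
        # Supervised learning: the type of target decides the task.
        return "classification" if target_is_discrete else "regression"
    # Unsupervised learning: grouping only, or grouping with a strength estimate.
    return "density estimation" if need_fit_strength else "clustering"


print(suggest_approach(True, target_is_discrete=True))   # Yes/No target
print(suggest_approach(True, target_is_discrete=False))  # 0.00 to 100.00 target
print(suggest_approach(False))                           # discrete groups only
print(suggest_approach(False, need_fit_strength=True))   # groups plus fit strength
```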

Friday, 3 March 2017

Applications of Machine learning

1) Adaptive website
2) Affective computing
3) Bioinformatics
4) Brain machine interface
5) Classifying DNA sequence
6) Computational anatomy
7) Detecting credit card fraud
8) Economics
9) Game Playing
10) Information retrieval
11) Internet fraud detection
12) Marketing
13) Medical diagnosis
14) Natural language processing (NLP)
15) Online Advertising
16) Robot locomotion
17) Search engine
18) Sentiment Analysis
19) Sequence mining
20) Stock market analysis
21) Speech and handwriting recognition
22) Software engineering
23) User behavior analytics

Thursday, 2 March 2017

Introduction to Machine learning

In machine learning, computers apply statistical learning techniques to automatically identify patterns in data. These techniques can be used to make highly accurate predictions. 

Definition: A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P if its performance at tasks in T, as measured by P, improves with experience E.
Example: Learning to play checkers
Task T: Playing checkers
Performance measure P: The percentage of games won
Experience E: Playing practice games against itself

Types of machine learning algorithms:
1) Supervised learning: This type of algorithm has a target (dependent) variable that is to be predicted from a given set of predictors (independent variables). Using this set of variables, we generate a function that maps inputs to desired outputs. The training process continues until the model achieves a desired level of accuracy on the training data. Examples of supervised learning: Regression, Decision Tree, Random Forest, KNN, Logistic Regression, etc.

2) Unsupervised learning: In this type of algorithm, we do not have any target or outcome variable to predict or estimate. It is used for clustering a population into different groups, which is widely used for segmenting customers into groups for specific interventions. Example of unsupervised learning: K-means.

3) Reinforcement learning: Using this type of algorithm, the machine is trained to make specific decisions. It works this way: the machine is exposed to an environment where it trains itself continually using trial and error. The machine learns from past experience and tries to capture the best possible knowledge to make accurate business decisions. Reinforcement learning problems are commonly modeled as a Markov Decision Process.
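To make the supervised case more concrete, here is a minimal sketch of KNN, one of the supervised examples named above, written in pure Python. The training data and the function name knn_predict are made up for illustration, and it needs Python 3.8 or later for math.dist.

```python
from collections import Counter
import math

# Made-up training data: (feature vector, label) pairs.
training = [((1.0, 1.1), "A"), ((1.2, 0.9), "A"), ((0.9, 1.0), "A"),
            ((3.0, 3.2), "B"), ((3.1, 2.9), "B"), ((2.9, 3.0), "B")]


def knn_predict(point, examples, k=3):
    """Predict a label by majority vote among the k nearest training examples."""
    by_distance = sorted(examples, key=lambda ex: math.dist(point, ex[0]))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


print(knn_predict((1.1, 1.0), training))
print(knn_predict((3.0, 3.0), training))
```

The labels in the training data play the role of the target variable: the prediction for a new point comes from the known labels of its nearest neighbours.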