2. Existing Work
A review of the literature shows many studies of the diabetes problem in India, carried out with a variety of
methods and materials. Many researchers have developed prediction models that use data mining to predict diabetes.
One approach combines classification, regression, genetic algorithms, and neural networks to handle the missing and
outlier values in the diabetic data set; missing values are replaced with values from the domain of the corresponding
attribute [1]. A classical neural network model is then used for prediction on the pre-processed data set.
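As a rough illustration of this kind of pre-processing, the sketch below fills missing entries and limits outliers in a single numeric attribute. It is not the method of [1]: the median fill, the 1.5 x IQR fences, the function name impute_and_clip, and the sample readings are all assumptions chosen only to keep the example small.

```python
from statistics import median, quantiles


def impute_and_clip(values):
    """Fill None entries with the median, then clip values to the 1.5*IQR fences.

    Hypothetical helper for illustration; the median fill and the IQR fences
    are assumptions, not the technique used in the cited work.
    """
    observed = sorted(v for v in values if v is not None)
    if len(observed) < 2:
        raise ValueError("need at least two observed values")

    # Replace every missing entry with the median of the observed values.
    fill = median(observed)
    filled = [fill if v is None else v for v in values]

    # Quartiles are computed from the observed values only.
    q1, _, q3 = quantiles(observed, n=4, method="inclusive")
    spread = 1.5 * (q3 - q1)
    low, high = q1 - spread, q3 + spread

    # Pull any value outside the fences back to the nearest fence.
    return [min(max(v, low), high) for v in filled]


# Made-up glucose readings with one missing value and one extreme value.
cleaned = impute_and_clip([85, 90, None, 95, 100, 105, 400])
print(cleaned)
```

Clipping keeps every record in the data set, which matters when the data set is small; dropping outlier rows would be an equally valid choice under different assumptions.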
In a predictive analysis of diabetic treatment that applied regression-based data mining techniques to diabetes data,
the authors used the SVM algorithm to discover patterns that identify the best mode of treatment for diabetes across
different age groups [2]. They concluded that drug treatment can be delayed for patients in the young age group,
whereas patients in the old age group should be prescribed drug treatment immediately. Prediction and classification
of various types of diabetes using the C4.5 classification algorithm was carried out on the Pima Indians Diabetes
Database [3]. A detailed analysis of the Pima diabetic data set was carried out using Hive and R. That analysis
yields some interesting facts that can be used to develop prediction models [4].
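C4.5, the algorithm used in [3], chooses splits by gain ratio, that is, information gain divided by the split information. The sketch below computes the gain ratio of one threshold split on a numeric attribute. The function names and the toy glucose values are hypothetical and are not taken from the Pima data set or from [3].

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())


def gain_ratio(values, labels, threshold):
    """Gain ratio of the binary split `value <= threshold` versus `value > threshold`."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    if not left or not right:
        return 0.0  # a split that leaves one side empty carries no information

    n = len(labels)
    w_left, w_right = len(left) / n, len(right) / n
    info_gain = entropy(labels) - (w_left * entropy(left) + w_right * entropy(right))
    split_info = -(w_left * log2(w_left) + w_right * log2(w_right))
    return info_gain / split_info


# Toy data (assumed values): glucose level and a 0/1 diabetes label.
glucose = [90, 100, 150, 160]
diabetic = [0, 0, 1, 1]
for t in (95, 120):
    print(t, round(gain_ratio(glucose, diabetic, t), 3))
```

A full C4.5 implementation would evaluate every candidate threshold on every attribute, recurse on the resulting subsets, and prune the tree; only the split criterion is shown here.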
A soft-computing-based prediction model was developed to assess the risks accumulated by diabetic patients. The
authors experimented with real-time clinical data using a Genetic Algorithm [5]. The results obtained pertain to the
level of risk of either heart attack or stroke. A novel pre-processing phase provides missing-value imputation for
both numerical and categorical data: a hybrid combination of Classification and Regression Trees (CART) and Genetic
Algorithms to impute missing continuous values, and Self Organizing Feature Maps (SOFM) to impute categorical
values, was improved in [6].
Deploying a health information exchange (HIE) repository promotes and integrates data within a single point of
robust data sharing. This sharing of information, together with electronic communication systems, enables access to
health services and also promotes additional care for dual-eligible patients. It identifies which patients require
more care and attention than others, and it provides the data needed to determine which strategies should be put in
place to maximize positive behavior modification [9].
Predictive analytics works in three areas: operations management, medical management and biomedicine, and system
design and planning. A healthcare predictive analytics system can help address one pressing issue, namely the cost
of patients being repeatedly admitted and readmitted to a hospital for similar or multiple chronic diseases. A survey
in the New England Journal of Medicine reports that one in five patients suffers from preventable readmissions.
In addition, 1% of the population accounts for almost 20% of all US healthcare expenditures, and 25% accounts for
over 80% of all expenditures [10].
Various big data technology stacks and healthcare research, together with the associated efficiency, cost savings,
etc., are explained in the context of better healthcare in [11]. Hadoop usage in healthcare has become more
important for processing data and for adopting large-scale data management activities. Analytics on combined compute
and storage can promote the cost-effectiveness gained by using Hadoop [12].
All of the above researchers have succeeded in analysing the diabetic data set and developing good prediction
models. In this paper, we use predictive analysis in a Hadoop/MapReduce environment to predict and classify the type
of diabetes. This system provides an efficient way to care for and cure patients at low cost, with better outcomes
such as affordability and availability.
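To make the MapReduce style of processing concrete, the following is a minimal pure-Python simulation of the map, shuffle, and reduce phases. It does not use the Hadoop API and is not the job described in this paper. The record layout (age, outcome), the age cut-off of 40, and all function names are assumptions made for illustration.

```python
from collections import defaultdict


def mapper(record):
    """Map phase: emit (age_band, (1, outcome)) for one patient record."""
    age, outcome = record                     # outcome: 1 = diabetic, 0 = not
    band = "young" if age < 40 else "old"     # the cut-off of 40 is an assumption
    yield band, (1, outcome)


def shuffle(pairs):
    """Shuffle phase: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reducer(key, values):
    """Reduce phase: fraction of diabetic patients within one age band."""
    total = sum(count for count, _ in values)
    positives = sum(outcome for _, outcome in values)
    return key, positives / total


def run_job(records):
    """Run map, shuffle, and reduce in sequence on an in-memory list."""
    pairs = (pair for record in records for pair in mapper(record))
    return dict(reducer(key, values) for key, values in shuffle(pairs).items())


# Made-up records: (age, outcome).
rates = run_job([(25, 0), (30, 1), (55, 1), (60, 1)])
print(rates)
```

On a real cluster the mapper and reducer would run in parallel over distributed blocks of the input, with the framework performing the shuffle; the single-process version above only shows how the work is divided between the phases.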