The commercial value of an advertisement on the Web depends on whether users click on it.
Ad clicks have a significant impact on the Internet industry. Click data allow Internet companies to identify the most relevant ads for each user and improve the user experience. Behavioural targeting (BT) leverages users' online activities to select the ads most relevant to each user for display, and is a promising technique for improving the efficiency of online advertising.
There has been a lot of research in behavioural targeting. A well-grounded statistical model of BT predicts the click-through rate (CTR) of an ad from user behaviour, such as ad clicks and views, page views, and search queries. The CTR is used in search advertising to rank ads and price clicks. In this paper, we use the area under the Receiver Operating Characteristic (ROC) curve (AUC) as the evaluation criterion, as proposed by Track 2 of KDD Cup 2012. Because we are concerned only with the CTR ordering of the test data, the rank of the CTR is used instead of its actual value. A useful predictor should achieve an AUC above 0.5, because 1) the AUC lies between 0.0 and 1.0 and 2) random guessing yields an AUC of 0.5.
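The claim that only the ordering of the predicted CTR matters can be illustrated with a small sketch. It computes the AUC through the rank-sum (Mann-Whitney) identity, so any strictly increasing transformation of the scores leaves the result unchanged, and a constant prediction gives 0.5. The function name `auc_from_scores` and the toy data are hypothetical and chosen for illustration only; this is not the KDD Cup scoring script.

```python
import math


def auc_from_scores(labels, scores):
    """AUC via the rank-sum identity; tied scores receive their average rank."""
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])  # ascending by score
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over the group of scores tied with position i.
        while j + 1 < n and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2.0 + 1.0  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(labels)
    n_neg = n - n_pos
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2.0) / (n_pos * n_neg)


# Toy data (assumed): 1 = clicked, 0 = not clicked.
labels = [1, 0, 1, 0, 0]
predicted_ctr = [0.30, 0.10, 0.05, 0.02, 0.20]

auc_raw = auc_from_scores(labels, predicted_ctr)
# A strictly increasing transform keeps the ordering, hence the AUC.
auc_log = auc_from_scores(labels, [math.log(s) for s in predicted_ctr])
# A constant prediction carries no ranking information.
auc_constant = auc_from_scores(labels, [0.1] * len(labels))
```

Here `auc_raw` and `auc_log` coincide, which is why submitting ranks instead of calibrated CTR values does not change the evaluation.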
A Receiver Operating Characteristic (ROC) graph is a useful technique for organizing classifiers and visualizing their performance. ROC graphs are commonly used in medical decision making, and in recent years have been increasingly adopted in the machine learning and data mining research communities. In addition to being a generally useful method for graphing performance, they have properties that make them especially useful for domains with skewed class distributions and unequal classification error costs. These characteristics have become increasingly important as research continues into cost-sensitive learning and learning in the presence of unbalanced classes.
In this paper, AUC is the area under the ROC curve, which is equivalent to the probability that a randomly chosen pair consisting of a positive sample (a clicked ad) and a negative sample (an unclicked ad) is ranked correctly by the predicted click-through rate. An equivalent way of maximizing the AUC is to divide each instance into (#click) positive samples and (#impression - #click) negative samples, and then minimize the pairwise ranking loss of those samples using the predicted click-through rate [3].
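The splitting of each instance into clicked and unclicked samples can be sketched as follows. The first function expands the instances explicitly and counts correctly ordered (positive, negative) pairs; the second obtains the same value from the aggregated counts without expansion. All function names and the toy data are hypothetical, and counting ties as half a correct pair is an assumption of this sketch.

```python
def expand(clicks, impressions, predicted_ctr):
    """Split each instance into #click positives and #impression - #click negatives."""
    labels, scores = [], []
    for c, imp, s in zip(clicks, impressions, predicted_ctr):
        labels += [1] * c + [0] * (imp - c)
        scores += [s] * imp  # every sample inherits the instance's predicted CTR
    return labels, scores


def pairwise_auc(labels, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    correct = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                correct += 1.0
            elif sp == sn:
                correct += 0.5
    return correct / (len(pos) * len(neg))


def aggregated_auc(clicks, impressions, predicted_ctr):
    """Same quantity computed from the counts, weighting each instance pair."""
    negatives = [imp - c for imp, c in zip(impressions, clicks)]
    correct = 0.0
    for p_i, s_i in zip(clicks, predicted_ctr):
        for n_j, s_j in zip(negatives, predicted_ctr):
            if s_i > s_j:
                correct += p_i * n_j
            elif s_i == s_j:
                correct += 0.5 * p_i * n_j
    return correct / (sum(clicks) * sum(negatives))


# Toy data (assumed): three instances with their click and impression counts.
clicks = [2, 0, 1]
impressions = [4, 5, 1]
predicted_ctr = [0.4, 0.1, 0.9]

labels, scores = expand(clicks, impressions, predicted_ctr)
auc_expanded = pairwise_auc(labels, scores)
auc_aggregated = aggregated_auc(clicks, impressions, predicted_ctr)
```

Both routes give the same AUC; the aggregated form avoids materializing one sample per impression, which matters when impression counts are large.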
In this paper we use the Multiple Criteria Linear Programming (MCLP) [3] regression model to predict the click-through rate and compare it with two other well-known regression methods. The datasets [4] used for testing come from Track 2 of KDD Cup 2012. A major challenge is to create effective features, since feature creation and selection are the most important steps in solving a supervised learning problem. We compared different methods and then chose two of them to create the features.
The paper is structured as follows. Section 2 reviews related work. Section 3 describes our behaviour data. Section 4 introduces the MCLP regression data mining model and its algorithm. Section 5 presents the experiments. Section 6 concludes the paper and outlines future work.