Among the 11 application categories listed in Table 3, social network, e-commerce, advertising and search together contribute more than 80% of total HTTP traffic. Over the week, the average flow duration of email is 508 seconds, shorter than that of any other application category. 10% of email flows last more than 5.4 seconds, and 80% last less than 2.4 seconds. E-commerce has the longest average duration per user (2453 seconds), suggesting that users tend to spend more time on e-commerce.
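The statistics in this paragraph (traffic share per category, mean flow duration, and the fraction of flows above or below a duration threshold) are simple aggregates over flow records. A minimal sketch of how they might be computed is shown below; the record format `(category, duration in seconds, bytes)` and all values in `FLOWS` are hypothetical and invented for illustration, not the measured data.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical flow records: (category, duration in seconds, bytes).
# The numbers are invented for illustration; they are not the measured data.
FLOWS = [
    ("email", 1.2, 4_000), ("email", 2.0, 6_000), ("email", 6.1, 9_000),
    ("e-commerce", 30.0, 200_000), ("e-commerce", 45.0, 350_000),
    ("search", 3.0, 20_000), ("social network", 12.0, 150_000),
]


def traffic_share(flows):
    """Fraction of total bytes contributed by each category."""
    total = sum(nbytes for _, _, nbytes in flows)
    share = defaultdict(float)
    for category, _, nbytes in flows:
        share[category] += nbytes / total
    return dict(share)


def durations(flows, category):
    """All flow durations (seconds) observed for one category."""
    return [d for c, d, _ in flows if c == category]


def fraction_above(values, threshold):
    """Empirical CCDF: share of values strictly greater than threshold."""
    return sum(v > threshold for v in values) / len(values)


def fraction_below(values, threshold):
    """Empirical CDF: share of values strictly less than threshold."""
    return sum(v < threshold for v in values) / len(values)


if __name__ == "__main__":
    email = durations(FLOWS, "email")
    print("traffic share:", traffic_share(FLOWS))
    print("mean email flow duration:", mean(email))
    print("email flows > 5.4 s:", fraction_above(email, 5.4))
    print("email flows < 2.4 s:", fraction_below(email, 2.4))
```

Statements such as "10% of flows last more than 5.4 seconds" are read directly off the empirical CCDF, so two threshold functions are enough here; a full percentile routine is not needed.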
To understand the browsing interests of users in the Mobile Internet, we study the similarities and differences among web users. To this end, we identify “interest clusters” (groups of users who share similar web browsing interests) by using a clustering method to group users with similar behavior.
1) Interest Cluster
We see that half of the users visit more than 5 different application categories per day, which suggests that many users have diverse interests. Yet users spend different amounts of time in different applications and generate different numbers of flows. In other words, users usually have different preferences among the applications they use. Identifying the applications that users are most interested in could help us predict user behavior. In this section, we classify users into groups according to their application preferences (i.e., interest clusters) and study user behavior within these interest clusters.
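As a rough illustration of "application preference", each user's day can be summarized as the fraction of time spent in each category, from which one can read off the number of distinct categories visited and the single most-used category. The sketch below assumes a hypothetical `(user_id, category, seconds)` log with invented values, and its `dominant_category` rule is a deliberately simple stand-in, not the paper's clustering method.

```python
from collections import defaultdict
from statistics import median

# Hypothetical one-day usage log: (user_id, application category, seconds).
# Values are invented for illustration.
RECORDS = [
    ("u1", "social network", 600), ("u1", "search", 60), ("u1", "email", 30),
    ("u2", "e-commerce", 900), ("u2", "advertising", 20),
    ("u3", "search", 120), ("u3", "news", 100), ("u3", "video", 400),
]


def preference_vectors(records):
    """Per-user fraction of time spent in each category (fractions sum to 1)."""
    seconds = defaultdict(lambda: defaultdict(float))
    for user, category, secs in records:
        seconds[user][category] += secs
    prefs = {}
    for user, by_cat in seconds.items():
        total = sum(by_cat.values())
        prefs[user] = {cat: s / total for cat, s in by_cat.items()}
    return prefs


def dominant_category(pref):
    """Category with the largest time share; ties go to the alphabetically first."""
    return max(sorted(pref), key=pref.get)


def median_categories_per_user(prefs):
    """Median number of distinct categories visited per user."""
    return median(len(pref) for pref in prefs.values())


if __name__ == "__main__":
    prefs = preference_vectors(RECORDS)
    for user, pref in prefs.items():
        print(user, dominant_category(pref), len(pref))
    print("median categories per user:", median_categories_per_user(prefs))
```

The normalized preference vectors are also a natural input to a clustering step, since they make heavy and light users comparable.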
Here we co-cluster users and application categories using divisive hierarchical clustering [25] to investigate whether distinct application usage patterns exist among mobile users. Divisive hierarchical clustering is an improvement on Spectral Graph-k-Part [28].
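The co-clustering step can be sketched as follows. The exact variant used in [25] is not specified here, so this sketch assumes the bipartite spectral approach: scale the user-by-category matrix by the inverse square roots of its row and column sums, split users and categories by the sign of the rescaled second singular vectors, and proceed divisively by always splitting the largest remaining co-cluster. The matrix entries (e.g., time or flows per user and category), the sign-based split, and the names `spectral_bipartition` and `divisive_cocluster` are hypothetical choices for illustration.

```python
import numpy as np


def spectral_bipartition(A, eps=1e-12):
    """Split the rows (users) and columns (categories) of a nonnegative
    matrix A into two co-clusters via the second singular vectors of the
    degree-normalized matrix D1^{-1/2} A D2^{-1/2}."""
    d1 = A.sum(axis=1) + eps  # user degrees; eps guards against zero rows
    d2 = A.sum(axis=0) + eps  # category degrees; eps guards against zero columns
    An = A / np.sqrt(d1)[:, None] / np.sqrt(d2)[None, :]
    U, _, Vt = np.linalg.svd(An, full_matrices=False)
    z_rows = U[:, 1] / np.sqrt(d1)   # rescaled second left singular vector
    z_cols = Vt[1, :] / np.sqrt(d2)  # rescaled second right singular vector
    return z_rows >= 0, z_cols >= 0  # the sign decides the side


def divisive_cocluster(A, k):
    """Repeatedly bipartition the largest splittable co-cluster until there
    are k co-clusters. Returns a list of (user_indices, category_indices)."""
    clusters = [(np.arange(A.shape[0]), np.arange(A.shape[1]))]
    while len(clusters) < k:
        # A second singular vector needs at least 2 users and 2 categories.
        splittable = [i for i, (r, c) in enumerate(clusters)
                      if len(r) >= 2 and len(c) >= 2]
        if not splittable:
            break
        i = max(splittable, key=lambda j: len(clusters[j][0]) + len(clusters[j][1]))
        rows, cols = clusters[i]
        r_side, c_side = spectral_bipartition(A[np.ix_(rows, cols)])
        if r_side.all() or not r_side.any():
            break  # degenerate split (all users on one side): stop early
        clusters[i] = (rows[r_side], cols[c_side])
        clusters.append((rows[~r_side], cols[~c_side]))
    return clusters


if __name__ == "__main__":
    # Toy users-by-categories matrix (e.g. minutes per category); invented values.
    A = np.array([[9, 8, 0, 0, 1],
                  [8, 9, 1, 0, 0],
                  [7, 9, 0, 1, 0],
                  [0, 1, 9, 8, 7],
                  [1, 0, 8, 9, 9],
                  [0, 0, 9, 7, 8]], dtype=float)
    for users, cats in divisive_cocluster(A, 2):
        print("users", users.tolist(), "categories", cats.tolist())
```

Because users and categories are split together, each resulting co-cluster pairs a group of users with the categories that dominate their usage, which is what makes the clusters interpretable as interests. Splitting by sign is a common simplification; running 2-means on the rescaled singular vector is an alternative.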
We cluster users with divisive hierarchical clustering every hour to investigate application usage patterns at a fine time granularity. We want to address the following questions: (a) Do users tend to stick to one application category all the time? (b) Does browsing behavior change over time?
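Questions (a) and (b) can be made concrete with two simple per-hour measures. The sketch below is hypothetical. It assumes each user has already been assigned a cluster label in every hour. It also assumes labels are comparable across hours (e.g., each cluster named after its dominant category), because labels from independent clustering runs are not otherwise aligned. It introduces two illustrative metrics, `stickiness` for question (a) and `switch_rate` for question (b), which are not defined in the paper.

```python
from collections import Counter

# Hypothetical input: per user, the cluster label assigned in each hour
# (None = user inactive that hour). Labels and values are illustrative only.
HOURLY_LABELS = {
    "u1": ["social", "social", None, "social"],
    "u2": ["shopping", "search", "shopping", None],
    "u3": [None, "video", "news", "search"],
}


def stickiness(labels):
    """Question (a): share of a user's active hours spent in their most
    frequent cluster (1.0 means the user never leaves one cluster)."""
    active = [label for label in labels if label is not None]
    if not active:
        return None
    _, top = Counter(active).most_common(1)[0]
    return top / len(active)


def switch_rate(hourly_labels, hour):
    """Question (b): among users active in both `hour` and `hour + 1`,
    the share whose cluster label changed."""
    pairs = [(labs[hour], labs[hour + 1]) for labs in hourly_labels.values()
             if labs[hour] is not None and labs[hour + 1] is not None]
    if not pairs:
        return None
    return sum(a != b for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    for user, labels in HOURLY_LABELS.items():
        print(user, "stickiness:", stickiness(labels))
    for h in range(3):
        print("hour", h, "->", h + 1, "switch rate:", switch_rate(HOURLY_LABELS, h))
```

Only users active in both consecutive hours are compared, so inactivity is not mistaken for a change of interest.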
We use “entropy” to quantify the diversity of a user's browsing behavior; it is defined as follows: