In an era of anywhere, anytime connectivity for the general
public, safety and security on the web present enormous
challenges. There is a great need for better
content detection systems that can more accurately identify
excessively offensive and harmful websites. Web classification
models in the early days were limited by the methods and data
available. Today, advances in computing
methodologies and technology have brought many new and
better means of text content analysis, such as new methods
for topic extraction, topic modeling, and sentiment analysis. Our
recent studies suggested the promising potential of combining topic
analysis and sentiment analysis in web content classification. This
paper further explores new classification models that improve
classification performance, especially by enhancing precision and
reducing false positives. It does so by incorporating semantics
into model development and by examining and handling issues
of dataset reliability, class imbalance, and covariate shift.