The learning phase yields the parameters η, ϕ, ψ, χ, and ε. For a test image, our goal is to infer the topic T assigned to each region and the category C, given the observed region-level visual words R, the patch-level visual words W, and the location L and saliency information S of each region.
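Stated formally, and only as a sketch of the goal described above, the quantity of interest is the joint posterior over T and C given the observations, with the learned parameters held fixed. The maximum a posteriori decision rule and the marginalization over T shown below are assumptions made for illustration; the text here does not specify which estimator is used.

$$
p(T, C \mid R, W, L, S;\ \eta, \phi, \psi, \chi, \varepsilon)
$$

$$
\hat{C} = \arg\max_{C} \sum_{T} p(T, C \mid R, W, L, S;\ \eta, \phi, \psi, \chi, \varepsilon)
$$

Under the same assumption, the per-region topic assignment could be read off as the most probable T given the observations and the chosen category.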