This paper addresses a combined problem of human resource planning (HRP) and
production-inventory control in a high-tech industry, in which the human resource plays
a critical role. The main characteristics of this resource are its “knowledge” levels and
the learning process. Learning occurs during production, through which a worker
can be promoted to a higher knowledge level, and workers at higher levels are more
productive. The objective is to maximize the expected profit by deciding on the optimal
numbers of workers at the various knowledge levels needed to fulfill both production and
training requirements. Because each action affects the decisions of subsequent periods, the
main problem is to find the optimal hiring policy for non-skilled workers over a long time
horizon. Thus, we develop a
reinforcement learning (RL) model to obtain the optimal hiring decisions under demand
uncertainty. The proposed interval-based policy of our RL model, which allows multiple
choices for each state, makes the model more flexible. We also embed some managerial
issues such as layoffs and overtime hours into the model. To evaluate the proposed
methodology, we benchmark it against stochastic dynamic programming (SDP) and a
conservative method implemented in a real case study. We compare the three methods in terms of four
criteria: average profit, average cost, number of newly hired workers, and standard
deviation of the hiring policies. The numerical results confirm that our method yields
satisfactory results compared with the other two approaches.
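To make the interaction between learning, productivity, and hiring under uncertain demand more concrete, the Python sketch below simulates a toy version of the workforce dynamics and learns a hiring rule with tabular Q-learning. It is an illustration under stated assumptions, not the authors' model. It uses plain Q-learning with a single greedy hiring choice per state rather than the interval-based policy described above, it ignores layoffs and overtime, and every name and parameter (two knowledge levels, `PROMOTION_RATE`, `PRODUCTIVITY`, `WAGES`, `DEMANDS`, `step`, `q_learning`, `greedy_policy`) is hypothetical.

```python
import random
from collections import defaultdict

# All parameters below are hypothetical toy values, not taken from the paper.
MAX_WORKERS = 6            # cap per knowledge level so the state space stays finite
HIRE_OPTIONS = (0, 1, 2)   # candidate numbers of non-skilled hires per period
PROMOTION_RATE = 0.5       # fraction of level-0 workers promoted after each period
PRODUCTIVITY = (1.0, 2.0)  # output per worker at level 0 and level 1
WAGES = (1.0, 1.5)         # wage per worker at level 0 and level 1
HIRE_COST = 0.5            # one-off cost per new hire
PRICE = 2.0                # revenue per unit of satisfied demand
DEMANDS = (4, 6, 8)        # equally likely demand scenarios (the source of uncertainty)


def step(state, hires, rng):
    """Simulate one period: hire, produce against random demand, then promote learners."""
    novices, experts = state
    novices = min(novices + hires, MAX_WORKERS)
    demand = rng.choice(DEMANDS)
    capacity = PRODUCTIVITY[0] * novices + PRODUCTIVITY[1] * experts
    profit = (PRICE * min(capacity, demand)
              - WAGES[0] * novices - WAGES[1] * experts
              - HIRE_COST * hires)
    # Learning by doing: a fixed fraction of novices moves up one knowledge level.
    # Promoted workers beyond the level-1 cap simply leave (a toy simplification).
    promoted = int(PROMOTION_RATE * novices)
    next_state = (novices - promoted, min(experts + promoted, MAX_WORKERS))
    return next_state, profit


def greedy_policy(q, state):
    """Return the hiring decision with the highest learned value in `state`."""
    return max(HIRE_OPTIONS, key=lambda a: q[(state, a)])


def q_learning(episodes=2000, horizon=12, alpha=0.1, gamma=0.95, epsilon=0.1, seed=0):
    """Tabular Q-learning over (novices, experts) states and hiring actions."""
    rng = random.Random(seed)
    q = defaultdict(float)  # (state, action) -> estimated discounted profit
    for _ in range(episodes):
        state = (0, 1)  # hypothetical initial workforce
        for _ in range(horizon):
            # Epsilon-greedy exploration over the hiring options.
            if rng.random() < epsilon:
                action = rng.choice(HIRE_OPTIONS)
            else:
                action = greedy_policy(q, state)
            next_state, reward = step(state, action, rng)
            # Bootstrap from the best next action (episodes are truncated, not terminal).
            best_next = max(q[(next_state, a)] for a in HIRE_OPTIONS)
            q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
            state = next_state
    return q


if __name__ == "__main__":
    q = q_learning()
    for s in [(0, 1), (2, 1), (0, 4)]:
        print(s, "-> hire", greedy_policy(q, s))
```

Running the script prints the greedy hiring choice for a few example states; the chosen values depend on the random seed and the hypothetical parameters. An interval-based variant would instead keep, for each state, the set of actions whose learned values lie within some tolerance of the best one, which is one way to read the "multiple choices for each state" idea, though the paper's exact construction is not reproduced here.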