We collected data on a random sample of 240 small and medium-sized Italian hotels, drawn from a population of 2,862 such hotels. The sample size was chosen for a confidence level of 95% and a confidence interval (margin of error) of 6%. All hotels had fewer than 250 employees, in line with the European Union’s definition of small and medium enterprises and with prior research (Neirotti & Raguseo, 2016). The hotels were listed on AIDA, a public database distributed by Bureau Van Dijk and the main compendium of financial information on firms in Italy.
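As a rough check on the sample size, the sketch below computes the margin of error implied by 240 hotels out of 2,862. The text does not say which formula was used, so this assumes the standard formula for a proportion with a finite population correction, the most conservative proportion p = 0.5, and z = 1.96 for a 95% confidence level. The function name `margin_of_error` is illustrative only.

```python
import math

def margin_of_error(n, population, z=1.96, p=0.5):
    """Margin of error for an estimated proportion.

    Assumptions (not stated in the text): p = 0.5 as the worst case,
    z = 1.96 for 95% confidence, and a finite population correction.
    """
    fpc = (population - n) / (population - 1)  # finite population correction
    return z * math.sqrt(p * (1 - p) / n * fpc)

moe = margin_of_error(n=240, population=2862)
print(f"Margin of error: {moe:.1%}")
```

Under these assumptions the result is close to 6%, which is consistent with the figures reported above.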
Starting from the population of small and medium-sized hotels listed on AIDA, we randomly extracted 240 hotels. We then verified whether each of these hotels was present on TripAdvisor. Hotels present on TripAdvisor were included in the final sample. Each hotel that was not present was excluded and replaced by another hotel drawn at random from the population, whose presence on TripAdvisor we verified in turn. The random extraction ended when we had 240 hotels with both financial data on AIDA and user-generated reviews on TripAdvisor.
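The extraction procedure can be sketched as a simple rejection-sampling loop. This is a minimal illustration, not the code used in the study: `draw_sample`, `population_ids`, and the `is_on_tripadvisor` predicate are hypothetical names, and the TripAdvisor check is stood in for by an arbitrary function. Walking through a random permutation and keeping the first 240 eligible hotels is equivalent to drawing 240 hotels and replacing each ineligible one with a fresh random draw.

```python
import random

def draw_sample(population_ids, is_on_tripadvisor, target=240, seed=0):
    """Randomly draw hotels, keeping only those found on TripAdvisor.

    `is_on_tripadvisor` is a hypothetical predicate standing in for the
    manual check described in the text.
    """
    rng = random.Random(seed)
    candidates = list(population_ids)
    rng.shuffle(candidates)  # random order, so no hotel is drawn twice

    sample = []
    for hotel_id in candidates:
        if is_on_tripadvisor(hotel_id):
            sample.append(hotel_id)  # eligible: keep it
        # ineligible hotels are skipped, i.e. replaced by the next draw
        if len(sample) == target:
            break
    return sample  # shorter than `target` if too few hotels are eligible

# Toy usage: 2,862 hotel ids and a made-up eligibility rule.
toy_sample = draw_sample(range(2862), lambda hotel_id: hotel_id % 3 != 0)
```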
Using the financial information available for the 240 selected hotels, we built a panel dataset spanning the period from 2004 to 2012. The dataset also included data on user-generated reviews from TripAdvisor (the number of reviews and the ratings assigned by users on a 5-point scale), covering 50,115 reviews in total. By combining the TripAdvisor data with the AIDA data, we obtained a final dataset of 2,160 observations.
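The reported number of observations is consistent with a balanced hotel-year panel: one row per hotel and year gives 240 × 9 = 2,160 rows for the years 2004 to 2012. The sketch below builds such a panel under that assumption. The text does not describe how reviews were aggregated per year, so the yearly review fields, the field names (`n_reviews`, `avg_rating`), and the function `build_panel` are hypothetical.

```python
from itertools import product

YEARS = range(2004, 2013)  # 2004 through 2012 inclusive, i.e. 9 years

def build_panel(hotel_ids, financials, reviews, years=YEARS):
    """Combine financial and review data into one row per hotel and year.

    `financials` and `reviews` are dicts keyed by (hotel_id, year);
    their contents and field names are illustrative assumptions.
    """
    panel = []
    for hotel_id, year in product(hotel_ids, years):
        row = {"hotel_id": hotel_id, "year": year}
        row.update(financials.get((hotel_id, year), {}))
        # Hotel-years without reviews get a zero count and no rating.
        row.update(reviews.get((hotel_id, year),
                               {"n_reviews": 0, "avg_rating": None}))
        panel.append(row)
    return panel

# Toy usage with made-up values for a single hotel-year.
toy_panel = build_panel(
    hotel_ids=range(240),
    financials={(0, 2004): {"revenue": 1_000_000}},
    reviews={(0, 2004): {"n_reviews": 12, "avg_rating": 4.2}},
)
print(len(toy_panel))
```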