To explore Google Play content, we have created PlayDrone,
the first scalable Google Play store crawler and application
analysis framework. PlayDrone uses four key
techniques. First, PlayDrone leverages common hacking
techniques to easily circumvent security measures that Google
uses to prevent the indexing of Google Play store content. These
techniques include simple dictionary-based attacks for discovering
applications, and decompiling and rebuilding the
Google Play Android client so that it communicates with the
Google Play servers over insecure protocols, which lets PlayDrone
capture, understand, and reproduce the necessary protocols.
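Dictionary-based discovery treats each word of a dictionary as a search query and collects the union of the returned application IDs as the crawl frontier. The sketch below assumes a hypothetical `search` callable that maps a query to package IDs. A stub with made-up data stands in for the store's search interface, which is not reproduced here.

```ruby
require 'set'

# Dictionary-based application discovery (sketch).
# `search` is a hypothetical callable: query string -> array of package IDs.
# It stands in for the store's search interface, which is not shown here.
def discover_apps(dictionary, search)
  seen = Set.new
  dictionary.each do |word|
    # Results for different words overlap, so a Set keeps distinct IDs only.
    search.call(word).each { |pkg| seen << pkg }
  end
  seen.to_a.sort
end

# Usage with a stubbed search index (hypothetical data).
FAKE_INDEX = {
  "chess"   => ["com.example.chess", "com.example.puzzles"],
  "puzzle"  => ["com.example.puzzles", "com.example.sudoku"],
  "weather" => []
}.freeze

stub_search = ->(q) { FAKE_INDEX.fetch(q, []) }
p discover_apps(%w[chess puzzle weather], stub_search)
```

A larger dictionary widens coverage at the cost of more queries. Duplicate hits are cheap to discard because only distinct IDs are kept.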
Second, PlayDrone leverages higher-level languages and
frameworks to provide highly concurrent, distributed processing
with modest implementation effort. PlayDrone
is written in Ruby and uses the Sidekiq [31] asynchronous
processing framework and the Redis [33] key-value store.
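Sidekiq workers pull jobs from queues held in Redis. The sketch below imitates that pattern in a single process, using Ruby's thread-safe `Queue` in place of a Redis list and a fixed pool of threads in place of Sidekiq workers, so it runs without Redis or the Sidekiq gem. The job body `fetch_metadata` is a hypothetical placeholder, not PlayDrone's code.

```ruby
# In-process imitation of a Sidekiq-style job queue (sketch).
# Queue stands in for a Redis list; threads stand in for Sidekiq workers.
def process_concurrently(app_ids, workers: 4)
  jobs = Queue.new
  app_ids.each { |id| jobs << id }
  workers.times { jobs << :done } # one stop marker per worker

  results = Queue.new
  threads = Array.new(workers) do
    Thread.new do
      # Each worker pops jobs until it sees its stop marker.
      while (id = jobs.pop) != :done
        results << fetch_metadata(id)
      end
    end
  end
  threads.each(&:join)

  out = []
  out << results.pop until results.empty?
  out.sort_by { |r| r[:id] } # completion order varies, so sort for stability
end

# Hypothetical job body; a real crawler would issue network requests here.
def fetch_metadata(app_id)
  { id: app_id, title: app_id.split(".").last.capitalize }
end

p process_concurrently(%w[com.example.chess com.example.sudoku], workers: 2)
```

With the real framework, scaling out means starting more Sidekiq processes on additional machines that point at the same Redis instance, rather than adding threads inside one process.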
PlayDrone's performance scales by adding servers to
the cluster, which lets it crawl the Google Play store
daily even as the store's content continues to grow.
Third, PlayDrone stores each application’s
metadata and decompiled sources in a Git repository.
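One property that suits Git to this role is a general Git fact, not a claim from the PlayDrone design. In its default SHA-1 object format, Git stores each file as a blob whose ID is the SHA-1 of a short header followed by the file's bytes. A decompiled source file that is unchanged between two versions of an application therefore maps to the same object and is stored only once. The sketch computes that ID with Ruby's `Digest::SHA1`; the file contents are made-up examples.

```ruby
require 'digest'

# Git's object ID for a file ("blob") in the default SHA-1 object format:
# SHA-1 over the header "blob <byte length>\0" followed by the raw bytes.
def git_blob_id(content)
  data = content.b # treat content as raw bytes
  Digest::SHA1.hexdigest("blob #{data.bytesize}\0".b + data)
end

v1 = "class MainActivity extends Activity {}\n"
v2 = "class MainActivity extends Activity {}\n" # unchanged in the next version
v3 = "class MainActivity extends AppCompatActivity {}\n"

puts git_blob_id(v1) == git_blob_id(v2) # identical content -> same object
puts git_blob_id(v1) == git_blob_id(v3) # edited content -> new object
```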
Git gives PlayDrone a simple versioning system for tracking
and managing multiple versions of each application and for
analyzing how Google Play store content evolves over
time. Finally, PlayDrone uses the Elasticsearch [19]
distributed real-time search and analytics engine, with an
indexing schema based on the Google Play store API, to
make the store's metadata and content easy to analyze
and explore.
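Elasticsearch's bulk API provides one way to load crawled metadata. The request body is newline-delimited JSON in which each document is preceded by an action line naming the target index and document ID, and the whole body must end with a newline. In the sketch below, the index name `apps` and the field names are hypothetical. They only echo the idea of a schema derived from the store API's metadata fields and are not PlayDrone's actual schema.

```ruby
require 'json'

# Build an Elasticsearch bulk-API body (NDJSON) from app metadata hashes.
# Index name and fields are hypothetical, not PlayDrone's actual schema.
def bulk_body(apps, index: "apps")
  apps.map do |app|
    action = { index: { _index: index, _id: app[:app_id] } }
    "#{JSON.generate(action)}\n#{JSON.generate(app)}\n"
  end.join # every line, including the last, ends with "\n"
end

apps = [
  { app_id: "com.example.chess",  title: "Chess",  category: "GAME_BOARD",  downloads: 1000 },
  { app_id: "com.example.sudoku", title: "Sudoku", category: "GAME_PUZZLE", downloads: 500 }
]
print bulk_body(apps)
```

The resulting string would be sent as a POST to the cluster's `_bulk` endpoint with the content type `application/x-ndjson`. Using the application ID as the document ID means that re-indexing an application after a new crawl overwrites its previous document instead of duplicating it.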