Data from social media is obtained through the APIs (Application Programming Interfaces) provided by social media sites. Social media contains different types of data: user information, connections between users, user-generated content, etc. Each data type is usually accessed through a separate API, and each API imposes restrictions on the number of requests. As a result, it is impossible to collect all the data stored in social media in a reasonable time. Even with these restrictions, however, the amounts of data are large. The data can also be structured (e.g. user profiles, links between users) or unstructured (e.g. users' interests, posts or pictures). To work efficiently with big and unstructured data we use the Hadoop framework, which implements the MapReduce model for distributed computations, and other technologies associated with it. Another important characteristic of social media data is its "sparseness": only a small amount of the data is relevant to the problem being solved. This means that the original big data mined from social media should be filtered, mapped or aggregated into a small dataset that is actually used in further analysis.
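The filter, map and aggregate step described above can be illustrated with a minimal single-process sketch of the MapReduce pattern. This is not the Hadoop implementation used in this work; the record fields (`user`, `text`), the keyword set and all function names are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical raw records; the field names are assumptions, not a real API schema.
POSTS: List[Dict[str, str]] = [
    {"user": "alice", "text": "Flu season is here, stay home"},
    {"user": "bob", "text": "Great football match today"},
    {"user": "alice", "text": "Got my flu shot"},
    {"user": "carol", "text": "New phone arrived"},
]

# Hypothetical relevance criterion: only posts mentioning these words matter.
KEYWORDS = {"flu"}


def map_post(post: Dict[str, str]) -> Iterator[Tuple[str, int]]:
    """Map phase: emit (user, 1) for relevant posts, nothing for the rest."""
    words = set(post["text"].lower().split())
    if words & KEYWORDS:
        yield post["user"], 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle phase: group the emitted values by key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce phase: aggregate all values of one key into a single count."""
    return key, sum(values)


def run_job(posts: Iterable[Dict[str, str]]) -> Dict[str, int]:
    """Run map, shuffle and reduce in sequence on an in-memory dataset."""
    mapped = (pair for post in posts for pair in map_post(post))
    grouped = shuffle(mapped)
    return dict(reduce_counts(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(run_job(POSTS))
```

On this toy input only two of the four posts pass the filter, and the job reduces them to a single entry for one user. In a real Hadoop job the map and reduce functions would run in parallel on separate nodes, but the data flow from a large sparse input to a small relevant dataset is the same.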