To compare public interest across countries, we cannot rely only on the total number of searches per country, because that number is also determined by population size and Google penetration: the larger the population and the more people using the Internet, the higher the expected number of searches. Therefore, we will create a Logarithmized Search Index for each of our Dependent Variables. For a chosen country, we take the number of searches for the keywords in a certain topic (e.g. all keywords in the topic “Edward Snowden”), divide it by the total number of Google users in that country, and multiply the result by 100 000. This gives the number of searches on the topic (e.g. “Snowden”) per 100 000 Google users in the chosen country. Because it refers to the number of Google users in a particular country, this formula allows us to estimate public interest regardless of differences in population size and Internet penetration.
Where:
• E(Z24) - average number of searches for information on Snowden, Assange etc. over a period of 24 months
• Gu - number of Google users in a country
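Under these definitions, the index described above (searches divided by Google users, multiplied by 100 000) can be written as follows. The label SI for the index is an assumption introduced here for illustration; it does not appear in the original text.

```latex
\[
  SI = \frac{E(Z_{24})}{G_u} \times 100\,000
\]
```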
The resulting index will then be increased by 1 and transformed with the natural logarithm. The log transformation is intended to normalize the distribution of the index, which was originally highly positively skewed. The index is first increased by 1 in order to retain observations in which the original index took a meaningful value of zero on a particular occasion: because ln(0) is undefined, transforming the raw index would drop these observations and lose information in the data. After indexing and summing the number of searches on a monthly basis, we will have repeated measures at fixed occasions, where the number of searches on each topic is measured in each country on 24 occasions over the same, equal time periods. Data organized in this manner are longitudinal, and a longitudinal model should therefore be used to model behaviour over the two years in which the data were collected.
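The transformation described above can be sketched in a few lines of Python. This is a minimal illustration rather than the procedure actually run in SPSS or Stata: the function names, the country labels and all numbers are hypothetical, and the long-format layout (one row per country and occasion) is an assumption about how the repeated measures might be organized.

```python
import math


def search_index(searches, google_users, per=100_000):
    """Number of searches per `per` Google users (the raw index)."""
    if google_users <= 0:
        raise ValueError("google_users must be positive")
    return searches / google_users * per


def log_search_index(searches, google_users, per=100_000):
    """Natural log of (index + 1); math.log1p(x) computes ln(1 + x)."""
    return math.log1p(search_index(searches, google_users, per))


# Hypothetical monthly search counts for two made-up countries (3 occasions each).
monthly_searches = {"A": [0, 1200, 300], "B": [50, 0, 75]}
# Hypothetical numbers of Google users per country.
google_users = {"A": 2_000_000, "B": 500_000}

# Long format: one (country, occasion, log index) row per measurement occasion.
long_rows = [
    (country, t, log_search_index(n, google_users[country]))
    for country, counts in monthly_searches.items()
    for t, n in enumerate(counts, start=1)
]
```

A month with zero searches stays in the data with a transformed value of 0 instead of being lost, which is the reason for adding 1 before taking the logarithm.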
Longitudinal multilevel model with fixed occasions
Statistical analysis will be conducted using SPSS and Stata. In our analysis, we decided to use a longitudinal multilevel model with fixed occasions (Hox, 2002). The dependent variable is a standardized number of searches on a particular topic in each country. The explanatory (independent) variables are averaged GDP per capita, the Democracy Index (DemInd) and Interest in Politics (InrPoli). An additional variable that we introduced, measured at the occasion level, is an identifier of each consecutive measurement time point, starting from one up to 24, assigned to each measurement occasion (t=0,1,2,3,4,5…24) of the dependent variable. This leads to a simple two-level model in which fixed occasions are grouped by countries and which can be expressed as:
where $DEP_{tj}$ is the dependent variable at occasion t in country j, which varies depending on the topic of the measured searches. Interaction and control variables will also be added to the model.
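For illustration, a two-level model of this kind could take the following form. This is a sketch under stated assumptions, not necessarily the specification estimated here: it assumes a random intercept only, a fixed linear effect of the occasion identifier (written $T_{tj}$), and the three country-level predictors named above. The coefficient symbols follow common multilevel notation and are not taken from the original text.

```latex
\begin{align*}
  DEP_{tj}   &= \beta_{0j} + \beta_{1}\, T_{tj} + e_{tj} \\
  \beta_{0j} &= \gamma_{00} + \gamma_{01}\, GDP_{j} + \gamma_{02}\, DemInd_{j}
                + \gamma_{03}\, InrPoli_{j} + u_{0j}
\end{align*}
```

Here $e_{tj}$ is the occasion-level residual and $u_{0j}$ the country-level residual. Allowing the slope of $T_{tj}$ to vary across countries would add a second country-level equation for $\beta_{1j}$.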