A short questionnaire was prepared in which we ask each individual about:
1. Age
2. Gender
3. Most used social media application (Media app)
4. The number of minutes they spend on that social media app per day
5. The number of times they visit the app per day
6. The number of posts they make on the app per week
I collected the data in person in crowded areas. Individuals were chosen at random, and it took roughly 2-4 minutes to collect one sample. A total of 3000 samples were collected, of which 2979 were retained for the research; the remaining 21 samples were removed because they contained many missing values.
1. Age is a numerical variable.
2. Gender is a categorical variable (nominal).
3. Social media application name is a categorical variable (nominal).
Here we categorize the social media applications as Facebook, Twitter, Snapchat, Pinterest, Instagram and others.
4. Time spent is a numerical variable.
5. Number of visits is a numerical variable.
6. Number of posts is also a numerical variable.
Age:
In Table 1, we present a summary of the data for Age. There is a lot of variation in the data, but most of the sample falls between 18 and 40 years of age. The frequency distribution is presented in Table 2 and the histogram in Figure 1. Judging from the histogram, the data do not appear to be normally distributed.
Gender:
In Table 4, we present a summary of the data for Gender (a categorical variable). Only 1.2 percent of the data comes from the "other" category, and 59.3 percent of the individuals are female. Figure 3 shows the bar graph of the Gender distribution.
Social media application:
In Table 5, we present a summary of the most used social media application. 36.1 percent of the individuals name Facebook as the app they use most in a day, and only 2.6 percent name Pinterest.
Figure 4 presents the bar graph for the social media application distribution of the collected sample.
Time spent on the favorite application:
In Table 6, we present a summary of the time spent on the favorite application (a numerical variable). The table shows very large variation in the time spent. Some people report spending nearly 1000 minutes per day (more than 16 hours) on the application, which looks implausible. Table 7 gives the frequency distribution and Figure 5 the histogram of the time spent on the favorite application (in minutes). The data do not appear to be normal; the distribution is positively skewed, with a skewness value greater than 3.
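The skewness values quoted in this section can be checked outside SPSS. The following is a minimal sketch, assuming the adjusted Fisher-Pearson coefficient is the measure being reported; the function name and the sample values are hypothetical and are not taken from the collected data.

```python
import math


def sample_skewness(values):
    """Adjusted Fisher-Pearson sample skewness (assumed to match the reported measure)."""
    n = len(values)
    if n < 3:
        raise ValueError("need at least 3 observations")
    mean = sum(values) / n
    # Sample standard deviation (divisor n - 1).
    s = math.sqrt(sum((x - mean) ** 2 for x in values) / (n - 1))
    third_moment_sum = sum((x - mean) ** 3 for x in values)
    return n / ((n - 1) * (n - 2)) * third_moment_sum / s ** 3


# Hypothetical minutes per day, including one extreme value.
minutes = [20, 30, 30, 45, 60, 60, 90, 120, 240, 1000]
print(round(sample_skewness(minutes), 2))
```

A value near 0 indicates a roughly symmetric sample, while a large positive value, as reported for time spent, indicates a long right tail.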
Number of visits:
In Table 9, we present a summary of the number of times the individual visits their favorite app (per day). Apart from the large variation in the data, the skewness value is again greater than 3, indicating a deviation from normality. Table 10 shows the frequency distribution and Figure 7 presents the histogram, which clearly shows that the data do not follow a normal distribution.
Number of posts:
In Table 12, we present a summary of the number of posts made by the individual on their favorite app (per week). Apart from the large variation, the skewness value is 6.37, which means the data deviate from normality. Table 13 gives the frequency distribution and Figure 9 the histogram.
The first thing I am interested in testing is whether the numerical variables are normally distributed. For that I used the Kolmogorov-Smirnov and Shapiro-Wilk tests. For both tests, the null hypothesis is that the data are normally distributed and the alternative hypothesis is that the data are not normally distributed. The variables Age, time spent on the favorite social media app, number of visits per day and number of posts per week are tested for normality, and the SPSS output is presented in Tables 3, 8, 11 and 14. In every output the p-value is less than 0.05, so we reject the null hypothesis and conclude that the variables are not normally distributed. The Q-Q plots in Figures 2, 6, 8, 10 and 20 agree with the test results.
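To illustrate the Kolmogorov-Smirnov idea, the sketch below computes the largest distance D between the empirical distribution and a normal distribution fitted to the sample mean and standard deviation. It is only an approximation of what SPSS does: the data are simulated (hypothetical), the function name is mine, and the cut-off 0.886/sqrt(n) is the usual large-sample Lilliefors approximation at the 5 percent level rather than an exact p-value. The Shapiro-Wilk test is not reproduced here.

```python
import math
import random
from statistics import NormalDist, mean, stdev


def ks_statistic_normal(values):
    """Largest gap between the empirical CDF and a normal CDF fitted to the sample."""
    xs = sorted(values)
    n = len(xs)
    fitted = NormalDist(mean(xs), stdev(xs))
    d = 0.0
    for i, x in enumerate(xs):
        cdf = fitted.cdf(x)
        # The empirical CDF jumps from i/n to (i+1)/n at x; check both sides.
        d = max(d, (i + 1) / n - cdf, cdf - i / n)
    return d


# Simulated right-skewed "minutes per day" (hypothetical, not the survey data).
rng = random.Random(1)
minutes = [rng.expovariate(1 / 60) for _ in range(500)]

d = ks_statistic_normal(minutes)
critical = 0.886 / math.sqrt(len(minutes))  # approximate 5% Lilliefors cut-off
print("D =", round(d, 3), "reject normality:", d > critical)
```

When D exceeds the cut-off, the sample is too far from the fitted normal curve, which corresponds to a p-value below 0.05 in the tables.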
The next thing I am interested in testing is whether there is a relationship between the most used social media application and gender. For this we use the chi-square test of independence, where the null hypothesis is that there is no relationship between gender and the most used social media app, and the alternative hypothesis is that there is a relationship between them. The test result from SPSS is presented in Table 16. The p-value is less than 0.05, so we have enough evidence to reject the null hypothesis and conclude that there is a significant relationship between gender and the most used social media application. The p-values for the Phi and Cramér's V measures of association are also less than 0.05, which supports this conclusion.
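The chi-square test of independence and Cramér's V can be sketched as follows. The contingency table is hypothetical (it is not Table 16), the function names are mine, and the p-value helper uses a closed form that is valid only when the degrees of freedom are even, which holds for a 3 x 6 gender-by-app table (10 degrees of freedom).

```python
import math


def chi_square_independence(table):
    """Return (chi-square statistic, degrees of freedom, Cramér's V) for a table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n  # count expected under independence
            chi2 += (observed - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    cramers_v = math.sqrt(chi2 / (n * (min(len(row_totals), len(col_totals)) - 1)))
    return chi2, df, cramers_v


def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability; closed form valid for even df only."""
    if df <= 0 or df % 2:
        raise ValueError("this shortcut needs a positive, even df")
    half = x / 2
    term, total = 1.0, 0.0
    for j in range(df // 2):
        total += term
        term *= half / (j + 1)
    return math.exp(-half) * total


# Hypothetical counts: rows = male, female, other;
# columns = Facebook, Twitter, Snapchat, Pinterest, Instagram, others.
counts = [
    [400, 150, 120, 20, 300, 187],
    [650, 180, 260, 55, 480, 142],
    [10, 5, 8, 2, 7, 3],
]
chi2, df, v = chi_square_independence(counts)
print(round(chi2, 2), df, round(v, 3), chi2_sf_even_df(chi2, df))
```

Cramér's V lies between 0 and 1, so it indicates how strong the association is, whereas the p-value only indicates whether an association is detectable.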
A homogeneity of variance test is conducted for the 3 numerical variables across the gender groups. For example, we test whether the variability of the number of visits is the same for males and females. Similar hypothesis tests are conducted for the other 2 numerical variables, and the SPSS output is presented in Table 19. Levene's test is used, and the p-values of all the tests are less than 0.05, which means the variances are not homogeneous across the groups.
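A minimal sketch of Levene's statistic follows, assuming the classical mean-centred form (statistical packages may also report median-based variants). The group values are hypothetical and the function name is mine. The statistic W is compared against an F distribution with (k - 1, N - k) degrees of freedom; the p-value lookup is left out.

```python
from statistics import mean


def levene_statistic(groups):
    """Levene's W using absolute deviations from each group's mean."""
    # Replace every observation by its absolute deviation from its group mean.
    z = [[abs(x - mean(g)) for x in g] for g in groups]
    n_total = sum(len(g) for g in z)
    k = len(z)
    grand_mean = sum(sum(g) for g in z) / n_total
    between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in z)
    within = sum((value - mean(g)) ** 2 for g in z for value in g)
    return (n_total - k) / (k - 1) * between / within


# Hypothetical visits per day for two gender groups.
visits_male = [1, 2, 3, 4, 5]
visits_female = [1, 3, 5, 7, 9]
print(round(levene_statistic([visits_male, visits_female]), 3))
```

A large W means the spread differs between the groups, which is the situation reported for all three numerical variables.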
To see the relationship between the numerical variables, a scatter plot matrix is presented in Figure 11. The relationships between the numerical variables are not clear from the picture. We therefore performed a regression analysis of the number of posts on the number of visits and the time spent, the results of which are presented in Table 17. The ANOVA table shows that the regressors are jointly significant. The t-tests for the constant and the 2 regressors show that the constant term is not significant (p-value = 0.082), while both regressors are significant at the 0.05 level of significance. The R square value is only 0.162, which means that only about 16 percent of the variability in the number of posts is explained by the regressors. When age is included as a regressor and the regression is run again, the constant becomes significant along with the regressors, but the R square value hardly changes (0.166).
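The regression and its R square can be sketched with ordinary least squares solved through the normal equations. This is an illustration under stated assumptions: the data are hypothetical, the helper names are mine, and the significance tests from the ANOVA table are not reproduced.

```python
def solve(a, b):
    """Solve a small linear system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ols(y, *regressors):
    """Return ([constant, slope_1, ...], R square) for a least-squares fit."""
    rows = [[1.0, *xs] for xs in zip(*regressors)]  # leading 1.0 is the constant term
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    beta = solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    y_bar = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return beta, 1 - ss_res / ss_tot


# Hypothetical data: posts per week, visits per day, minutes per day.
posts = [0, 1, 1, 3, 2, 6, 4, 9]
visits = [2, 3, 5, 6, 8, 10, 12, 15]
minutes = [15, 30, 20, 60, 45, 120, 90, 200]
beta, r_square = ols(posts, visits, minutes)
print([round(b, 3) for b in beta], round(r_square, 3))
```

R square is the share of the variation in posts that the fitted line accounts for, so a value of 0.162 leaves most of the variation unexplained.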
Next, each numerical variable is tested for a difference in median values between genders using non-parametric tests, and the output is presented in Table 18. The results show a significant difference in the median values of all 3 numerical variables across genders. The difference in medians is also tested across the favorite social media apps, and the output is presented. As with gender, there is a significant difference in the median values of all 3 numerical variables across the favorite social media app groups.
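The text does not name the non-parametric test that was used. One common choice for comparing a numerical variable across three groups is the Kruskal-Wallis test, and that is what the sketch below assumes. The data and function names are hypothetical. The statistic H is compared with a chi-square distribution with (number of groups - 1) degrees of freedom; for three groups that tail probability is simply exp(-H/2).

```python
import math


def average_ranks(values):
    """Rank the values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    tie_sizes = []
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        tie_sizes.append(j - i + 1)
        i = j + 1
    return ranks, tie_sizes


def kruskal_wallis_h(groups):
    """Return (tie-corrected H statistic, degrees of freedom)."""
    pooled = [x for g in groups for x in g]
    n = len(pooled)
    ranks, tie_sizes = average_ranks(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12 / (n * (n + 1)) * total - 3 * (n + 1)
    correction = 1 - sum(t ** 3 - t for t in tie_sizes) / (n ** 3 - n)
    return h / correction, len(groups) - 1


# Hypothetical minutes per day for male, female and other respondents.
groups = [[20, 35, 40, 60], [30, 45, 90, 120], [150, 200, 240, 400]]
h, df = kruskal_wallis_h(groups)
p_value = math.exp(-h / 2)  # exact chi-square tail only because df == 2
print(round(h, 3), df, round(p_value, 4))
```

Because the test works on ranks rather than raw values, extreme observations such as 1000 minutes per day influence it far less than they would influence a comparison of means.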
The case summaries for the 3 numerical variables, by gender group and by social media app group, are presented in Table 21. Respondents in the "other" gender category spend a lot of time on social media apps, and with high variability. Looking more closely, this group spends more time, visits more often and posts more than males and females do. For the apps grouped as "others", the time spent, the number of visits and the number of posts are all lower than for the named apps. These conclusions apply to the sample collected.
Many interesting facts are obtained from the test results, and there is a need for further analysis using non-parametric methods and methods that do not assume homogeneous variances. We could also convert the numerical variables into categorical variables based on the present analysis and then test for independence and homogeneity using combinations of the 5 variables. There are some outliers in the study; for example, some samples report 1000 minutes per day on a social media app. For further analysis I may therefore try trimming the data.
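One way the trimming idea could be tried is sketched below, assuming symmetric trimming in which the same share of observations is dropped from each tail. The trimming proportion, the function name and the sample values are hypothetical choices, not part of the present analysis.

```python
from statistics import mean


def trim(values, proportion=0.05):
    """Drop the given proportion of observations from each tail of the sorted data."""
    if not 0 <= proportion < 0.5:
        raise ValueError("proportion must be in [0, 0.5)")
    xs = sorted(values)
    k = int(len(xs) * proportion)  # number of observations removed per tail
    return xs[k:len(xs) - k]


# Hypothetical minutes per day, including one extreme report of 1000 minutes.
minutes = [20, 30, 30, 45, 60, 60, 90, 120, 240, 1000]
trimmed = trim(minutes, proportion=0.1)
print(mean(minutes), mean(trimmed))
```

Comparing the mean before and after trimming shows how strongly a few extreme responses pull the summary statistics.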
Whatever the information may be, these days it travels faster than light: social media is the happening thing now. News, updates, pictures, entertainment and more: social media gives you everything. It has swept over every other industry like a wave. It has created billionaires. This is one side of the coin.
On the other side, there are the users. Social media applications like Facebook, Instagram, Snapchat and others are the most common ones we find on any smartphone these days. We spend hours on them, feeding on the prioritized information. In a way, it has become a habit rather than a pastime. From one angle it looks like an addiction, but the channels through which information is passed on have shifted heavily onto social media, and it is being used as a vehicle that is faster than any other. We cannot dismiss the argument of addiction, but there is a greater good happening in parallel.
At any time of the day, we log in, log out, and repeat whenever we are notified by the app. The user's information is exploited by the social media firms most of the time, which makes the relationship between the user and the firm look parasitic, but it is a mutually beneficial one. Analysing social media use with statistical data can help us answer this question.
In this mini-research project we restrict ourselves to studying how much time people spend on social media apps, the number of times they open the app and the number of posts they make.