TSSFL Stack Live ShowCase During Sabasaba International Trade Fair In Dar Es Salaam, Tanzania

User avatar
Eli
Senior Expert Member
Reactions: 183
Posts: 5372
Joined: 9 years ago
Location: Tanzania
Has thanked: 75 times
Been thanked: 88 times
Contact:

#1

TSSFL ODF Live ShowCase During Sabasaba International Trade Fair In Dar Es Salaam, Tanzania, from 28th June to 13th July 2021

Visualizing the Global 2019 GDP Per Capita, Life Expectancy, and other Social Factors Dataset

Below we show how to use data science tools, in particular the Python programming language, to visualize the relationship between economic and social factors. We use a 2019 dataset that covers countries all over the world and features GDP per capita, social support, healthy life expectancy, freedom to make choices, generosity, and so on. Run the code below:



import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# We use the dataset called "2019.csv" found at https://github.com/fati8999-tech/Data-visualization-with-Python-Using-Seaborn-and-Plotly_-GDP-per-Capita-Life-Expectency-Dataset/blob/master/2019.csv
# Pull the "raw" GitHub content
df = pd.read_csv('https://raw.githubusercontent.com/fati8999-tech/Data-visualization-with-Python-Using-Seaborn-and-Plotly_-GDP-per-Capita-Life-Expectency-Dataset/master/2019.csv')
print(df.head(5))

# Configure plotting parameters
#plt.style.use('ggplot')
sns.set_style('darkgrid')      # options: darkgrid, whitegrid, dark, white, ticks
plt.rc('axes', titlesize=18)   # font size of the axes title
plt.rc('axes', labelsize=14)   # font size of the x and y labels
plt.rc('xtick', labelsize=13)  # font size of the x tick labels
plt.rc('ytick', labelsize=13)  # font size of the y tick labels
plt.rc('legend', fontsize=13)  # legend font size
plt.rc('font', size=13)

colors1 = sns.color_palette('pastel')
colors2 = sns.color_palette('deep')
#colors = sns.color_palette("Set2")

# Plot the distribution of a single column (GDP per capita).
# sns.distplot() is deprecated since seaborn 0.11; sns.histplot() replaces it.
sns.histplot(df['GDP per capita'], bins=10, kde=True, color="magenta")  # 10 bins plus a KDE curve
plt.show()
plt.clf()

# Use 25 bins and no KDE curve
sns.histplot(df['GDP per capita'], bins=25, kde=False, color="magenta")
plt.show()
plt.clf()

# Jointplot: the relationship between two variables as a scatter plot,
# with the histogram of each variable in the margins
sns.jointplot(data=df, x='GDP per capita', y='Healthy life expectancy', color="green")
plt.show()
plt.clf()

# kind='reg': scatter plot with a fitted linear regression line and its confidence band
sns.jointplot(data=df, x='GDP per capita', y='Healthy life expectancy', kind='reg', color=colors2[6])
plt.show()
plt.clf()

# kind='resid': residuals of the linear regression of y on x
sns.jointplot(data=df, x='GDP per capita', y='Healthy life expectancy', kind='resid', color=colors1[5])
plt.show()
plt.clf()

# kind='kde': bivariate kernel density estimate
sns.jointplot(data=df, x='GDP per capita', y='Healthy life expectancy', kind='kde', color="purple")
plt.show()
plt.clf()

# kind='hist': bivariate histogram
sns.jointplot(data=df, x='GDP per capita', y='Healthy life expectancy', kind='hist', color="darkblue")
plt.show()
plt.clf()

# kind='hex': bin the data into hexagons, with histograms in the margins
sns.jointplot(data=df, x='GDP per capita', y='Healthy life expectancy', kind='hex', color="red")
plt.show()
plt.clf()

# The plots show that GDP per capita and healthy life expectancy are positively,
# and roughly linearly, correlated

# Horizontal bar chart of the ten countries with the highest GDP per capita
df_sorted = df.sort_values('GDP per capita', ascending=False)
top10 = df_sorted.head(10)
plt.figure(figsize=(10, 6), tight_layout=True)
sns.barplot(data=top10, x='GDP per capita', y='Country or region', color="darkcyan")
plt.xticks(rotation=90)
plt.title("Top 10 Countries with Highest GDP per Capita")
for i, v in enumerate(top10['GDP per capita']):
    plt.text(v + 0.01, i, str(round(v, 4)), color='steelblue', va="center")  # value label
    plt.text(v + 0.3, i, str(i + 1), color='black', va="center")              # rank label

#plt.subplots_adjust(left=0.3)
textstr = 'Created at \nwww.tssfl.com'
#plt.text(0.02, 0.5, textstr, fontsize=14, transform=plt.gcf().transFigure)
# Figure coordinates: (0, 0) is bottom left, (1, 1) is top right
plt.gcf().text(0.02, 0.9, textstr, fontsize=14, color='green')
plt.show()
plt.clf()

# Vertical bar chart of the same ten countries, with value labels above the bars
plt.figure(figsize=(8, 6), tight_layout=True)
sns.barplot(data=top10, x='Country or region', y='GDP per capita', color="darkcyan")
plt.xticks(rotation=90)
plt.title("Top 10 Countries with Highest GDP per Capita")
xlocs, xlabs = plt.xticks()
for i, v in enumerate(top10['GDP per capita']):
    plt.text(xlocs[i] - 0.25, v + 0.05, str(v), color='steelblue', va="center")
plt.gcf().text(0.02, 0.1, textstr, fontsize=14, color='green')
plt.show()
plt.clf()

# Horizontal bar chart of the ten countries with the lowest GDP per capita
df_sorted = df.sort_values('GDP per capita', ascending=True)
bottom10 = df_sorted.head(10)
plt.figure(figsize=(8, 8), tight_layout=True)
sns.barplot(data=bottom10, x='GDP per capita', y='Country or region', color="darkmagenta")
plt.xticks(rotation=90)
plt.title("Countries with Lowest GDP per Capita")
for i, v in enumerate(bottom10['GDP per capita']):
    plt.text(v + 0.01, i, str(round(v, 4)), color='teal', va="center")
plt.gcf().text(0.7, 0.85, textstr, fontsize=14, color='green')
plt.show()
plt.clf()

# Vertical bar chart of the same ten countries
plt.figure(figsize=(8, 8), tight_layout=True)
sns.barplot(data=bottom10, x='Country or region', y='GDP per capita', color="darkmagenta")
plt.xticks(rotation=90)
plt.title("Countries with Lowest GDP per Capita")
xlocs, xlabs = plt.xticks()
for i, v in enumerate(bottom10['GDP per capita']):
    plt.text(xlocs[i] - 0.25, v + 0.01, str(v), color='teal', va="center")
plt.gcf().text(0.2, 0.85, textstr, fontsize=14, color='green')
plt.show()
plt.clf()

# All countries, from lowest to highest GDP per capita; the rank label counts
# from the top (rank 1 = highest GDP per capita)
df_sorted = df.sort_values('GDP per capita', ascending=True)
n = len(df_sorted)
plt.figure(figsize=(12, 40), tight_layout=True)
sns.barplot(data=df_sorted, x='GDP per capita', y='Country or region', color="lightblue")
plt.xticks(rotation=90)
plt.title("GDP per Capita")
for i, v in enumerate(df_sorted['GDP per capita']):
    plt.text(v + 0.01, i, str(round(v, 4)), color='teal', va="center")
    plt.text(v + 0.15, i, str(n - i), color='black', va="center")
plt.gcf().text(0.55, 0.96, textstr, fontsize=14, color='green')
plt.show()
plt.clf()

# All countries, from highest to lowest GDP per capita
df_sorted = df.sort_values('GDP per capita', ascending=False)
plt.figure(figsize=(12, 40), tight_layout=True)
sns.barplot(data=df_sorted, x='GDP per capita', y='Country or region', color="lightblue")
plt.xticks(rotation=90)
plt.title("GDP per Capita")
for i, v in enumerate(df_sorted['GDP per capita']):
    plt.text(v + 0.01, i, str(round(v, 4)), color='teal', va="center")
    plt.text(v + 0.15, i, str(i + 1), color='black', va="center")
plt.gcf().text(0.02, 0.99, textstr, fontsize=14, color='green')
plt.show()
plt.clf()

# GDP per capita of the last ten rows of the dataset, in the file's original row order
last10 = df.tail(10)
plt.figure(figsize=(8, 5), tight_layout=True)
sns.barplot(data=last10, x='Country or region', y='GDP per capita', color="olive")
plt.xticks(rotation=90)
plt.show()
plt.clf()

# Matrix plot visualizing the correlation between the selected columns
data_select = df[['GDP per capita', 'Social support', 'Healthy life expectancy', 'Perceptions of corruption']]
print("Correlation between Data:")
print(data_select.corr())

# Visualize; change the colormap as you want: https://matplotlib.org/tutorials/colors/colormaps.html
plt.figure(figsize=(8, 6), tight_layout=True)
sns.heatmap(data_select.corr(), cmap='coolwarm')
plt.title("Matrix Plot")
plt.show()
plt.clf()

# Pairwise relationships for the entire dataset: the diagonal shows the distribution
# (histogram) of each numeric variable, the other panels show scatter plots of pairs
plt.style.use('ggplot')
sns.pairplot(df)
plt.show()
plt.clf()
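The script states that GDP per capita and healthy life expectancy are positively and roughly linearly correlated. One way to put numbers on that claim is to compute the Pearson correlation coefficient with pandas `Series.corr` and the least-squares slope and intercept with `numpy.polyfit`. This is a minimal sketch, assuming the column names of the 2019.csv file loaded above; the helper name `linear_summary` is hypothetical, not part of pandas or seaborn.

```python
import numpy as np
import pandas as pd


def linear_summary(frame: pd.DataFrame, x: str, y: str) -> dict:
    """Pearson r plus least-squares slope and intercept of y on x (hypothetical helper)."""
    pair = frame[[x, y]].dropna()            # ignore rows with missing values
    r = pair[x].corr(pair[y])                # Pearson correlation coefficient
    slope, intercept = np.polyfit(pair[x], pair[y], deg=1)  # degree-1 least-squares fit
    return {"r": float(r), "slope": float(slope), "intercept": float(intercept)}


# Usage with the DataFrame loaded by the script above (needs network access):
# print(linear_summary(df, 'GDP per capita', 'Healthy life expectancy'))
```

A value of r close to +1 supports the "positive, roughly linear" reading of the jointplots; the slope gives the average change in healthy life expectancy per unit of GDP per capita in the dataset's own units.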

TSSFL -- A Creative Journey Towards Infinite Possibilities!

#2

Here is the output:

[Output figures: eleven plots, attached as pic1.png to pic11.png]

#4

Here is a related analysis, using the Gapminder GDP per capita data:

import pandas as pd
import matplotlib.pyplot as plt

data = pd.read_csv('https://raw.githubusercontent.com/mpicbg-scicomp/dlbc17-python-intro/master/data/gapminder_gdp_oceania.csv', index_col='country')

# The column names have the form 'gdpPercap_<year>', e.g. 'gdpPercap_1952'.
# Keep only the year (the last 4 characters of each name) for clearer labels
# when plotting GDP vs. year. The .str accessor applies string operations
# to every column name.
years = data.columns.str[-4:]

# Convert the year strings to integers, saving the results back to the DataFrame
data.columns = years.astype(int)

# GDP per capita of Australia over time
data.loc['Australia'].plot()
plt.show()
plt.clf()

# Transpose so that years become the rows and each country becomes a line, then plot
data.T.plot()
plt.ylabel('GDP per capita')
plt.show()

# Use the ggplot style
plt.style.use('ggplot')
data.T.plot(kind='bar')
plt.ylabel('GDP per capita')
plt.show()
plt.clf()

# Plot directly with Matplotlib
years = data.columns
gdp_australia = data.loc['Australia']
plt.plot(years, gdp_australia, 'g--')  # green dashed line
plt.show()

# Plot several datasets: select two countries' worth of data
gdp_australia = data.loc['Australia']
gdp_nz = data.loc['New Zealand']

# Plot with differently colored lines
plt.plot(years, gdp_australia, 'b-', label='Australia')
plt.plot(years, gdp_nz, 'g-', label='New Zealand')

# Create the legend and axis labels
plt.legend(loc='upper left')
plt.xlabel('Year')
plt.ylabel('GDP per capita ($)')
plt.show()
plt.clf()

# Scatter plot correlating the GDP per capita of Australia and New Zealand
plt.scatter(gdp_australia, gdp_nz)
plt.show()
plt.clf()
# The same scatter plot, drawn with pandas
data.T.plot.scatter(x='Australia', y='New Zealand')
plt.show()
plt.clf()

# Plot the minimum and the maximum GDP per capita over time across the European countries
data_europe = pd.read_csv('https://raw.githubusercontent.com/alistairwalsh/2016-07-13-SUT/master/data/gapminder_gdp_europe.csv', index_col='country')
data_europe.min().plot(label='min')
data_europe.max().plot(label='max')
plt.title("Min and Max GDP per capita: European countries")
plt.legend(loc='best')
plt.xticks(rotation=90)
plt.show()
plt.clf()

# Scatter plot of the minimum vs. the maximum GDP per capita among the Asian countries,
# one point per year in the dataset
data_asia = pd.read_csv('https://raw.githubusercontent.com/vanzaj/2016-06-11-ntu/gh-pages/data/gapminder_gdp_asia.csv', index_col='country')
data_asia.describe().T.plot(kind='scatter', x='min', y='max')
plt.title("Min and Max GDP per capita: Asian countries")
plt.show()
plt.clf()
# No particular correlation can be seen between the minimum and maximum GDP values from year to year

# The variability in the maximum is much higher than that of the minimum
data_asia.max().plot(label='max')
data_asia.min().plot(label='min')
print(data_asia.idxmax())  # country with the highest GDP per capita in each year
print(data_asia.idxmin())  # country with the lowest GDP per capita in each year
plt.title("Variability in max and min GDP per capita: Asia")
plt.legend(loc='best')
plt.xticks(rotation=90)
plt.show()
# Myanmar consistently has the lowest GDP per capita; the highest-GDP nation has varied more notably
plt.clf()

# Correlation between GDP per capita and life expectancy in 2007,
# with marker size proportional to population (in millions)
data_all = pd.read_csv('https://raw.githubusercontent.com/mpicbg-scicomp/dlbc17-python-intro/master/data/gapminder_all.csv', index_col='country')
data_all.plot(kind='scatter', x='gdpPercap_2007', y='lifeExp_2007',
              s=data_all['pop_2007'] / 1e6)
plt.title("Correlation between GDP and life expectancy for 2007")
plt.show()
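Removing the 'gdpPercap_' part of the column names with `str.strip('gdpPercap_')` works for these particular names, but `strip()` treats its argument as a set of characters to remove from both ends, not as a prefix. A name such as 'continent' would silently become 'ontinent'. A more defensive sketch keeps only the columns that really start with the prefix and converts the rest of the name to an integer year; the helper name `year_columns` and the sample column names are illustrative assumptions.

```python
import pandas as pd


def year_columns(columns: pd.Index, prefix: str) -> dict:
    """Map each column named '<prefix><year>' to its integer year (hypothetical helper).

    Columns that do not start with the prefix are left out of the mapping.
    """
    mapping = {}
    for name in columns:
        if name.startswith(prefix):
            mapping[name] = int(name[len(prefix):])  # drop the prefix, keep the year
    return mapping


cols = pd.Index(['continent', 'gdpPercap_1952', 'gdpPercap_2007', 'lifeExp_1952'])

# strip() removes characters from a set, so it damages unrelated names:
print(cols.str.strip('gdpPercap_').tolist())
# ['ontinent', '1952', '2007', 'lifeExp_1952']

# The prefix-based mapping keeps only the GDP columns:
print(year_columns(cols, 'gdpPercap_'))
# {'gdpPercap_1952': 1952, 'gdpPercap_2007': 2007}
```

With the Oceania data, `data = data.rename(columns=year_columns(data.columns, 'gdpPercap_'))` should give the same integer year columns as the script.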


Ref: [1]
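The last plot above hard-codes the 2007 columns. To compare GDP per capita and life expectancy for every year, the wide table (one column per indicator and year, such as gdpPercap_2007, lifeExp_2007 and pop_2007) can be reshaped into long form with `pandas.wide_to_long`. This is a sketch, assuming the column naming used in the script; the helper name `to_long` is hypothetical, and the tiny frame is made-up toy input that only demonstrates the reshaping, not real Gapminder figures.

```python
import pandas as pd


def to_long(wide: pd.DataFrame, stubs=('gdpPercap', 'lifeExp', 'pop')) -> pd.DataFrame:
    """Reshape '<stub>_<year>' columns into one row per (country, year) (hypothetical helper)."""
    # wide_to_long needs the identifier as a regular column, not as the index
    frame = wide.reset_index()
    long = pd.wide_to_long(frame, stubnames=list(stubs), i='country', j='year', sep='_')
    return long.reset_index()  # 'country' and 'year' become ordinary columns


# Toy input (made-up numbers) with the same column naming as gapminder_all.csv
toy = pd.DataFrame(
    {
        'gdpPercap_1952': [1.0, 2.0],
        'gdpPercap_2007': [3.0, 4.0],
        'lifeExp_1952': [40.0, 50.0],
        'lifeExp_2007': [60.0, 70.0],
        'pop_1952': [10.0, 20.0],
        'pop_2007': [30.0, 40.0],
    },
    index=pd.Index(['A', 'B'], name='country'),
)
long = to_long(toy)
print(long)
```

With the real file, `long = to_long(data_all)` followed by `long[long['year'] == 2007].plot(kind='scatter', x='gdpPercap', y='lifeExp')` should reproduce the 2007 plot; other years only need a different filter.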

#5

We can do a similar analysis with R. The code below reads the data from a Google Sheet with the googlesheets4 package:

# googlesheets4: earlier ways of calling read_sheet(), kept for reference
#library(googlesheets4)
#read_sheet("https://docs.google.com/spreadsheets/d/1BC48PKPZW71AC6hOn1SNesvZk1PoBjE4wXwl1YlWFpY/edit#gid=0")
#read_sheet("https://docs.google.com/spreadsheets/d/1U6Cf_qEOhiR9AZqTqS3mbMF3zt2db48ZP5v3rkrAEJY/edit#gid=780868077")
#read_sheet("1U6Cf_qEOhiR9AZqTqS3mbMF3zt2db48ZP5v3rkrAEJY")

library(googlesheets4)
# The Sheet is readable by anyone with the link, so skip authentication
gs4_deauth()

# Read the Sheet by its URL...
ss <- "https://docs.google.com/spreadsheets/d/1U6Cf_qEOhiR9AZqTqS3mbMF3zt2db48ZP5v3rkrAEJY/edit#gid=780868077"
dat <- read_sheet(ss)

# ...or by its sheet ID
ss2 <- "1U6Cf_qEOhiR9AZqTqS3mbMF3zt2db48ZP5v3rkrAEJY"
dat2 <- read_sheet(ss2)


See documentation: Ref1, Ref2

See basic usage:

https://cran.r-project.org/web/packages ... usage.html
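For readers following the Python part of this thread: a Google Sheet that anyone with the link can view can usually also be downloaded as CSV through an export URL of the form https://docs.google.com/spreadsheets/d/<ID>/export?format=csv&gid=<GID>, which pandas.read_csv can read directly. This URL pattern is a widely used convention rather than a documented, versioned API, so treat it as an assumption; the helper name `sheet_csv_url` is hypothetical. The sheet ID and gid below are taken from the R example.

```python
from urllib.parse import urlencode


def sheet_csv_url(sheet_id: str, gid: int = 0) -> str:
    """Build the CSV export URL for one tab of a link-shared Google Sheet (assumed URL pattern)."""
    query = urlencode({'format': 'csv', 'gid': gid})
    return f"https://docs.google.com/spreadsheets/d/{sheet_id}/export?{query}"


url = sheet_csv_url("1U6Cf_qEOhiR9AZqTqS3mbMF3zt2db48ZP5v3rkrAEJY", gid=780868077)
print(url)

# With network access and pandas installed:
# import pandas as pd
# dat = pd.read_csv(url)
```

The gid selects the tab; it is the number after `#gid=` in the Sheet's browser URL.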