Principal component analysis (PCA) with R and FactoMineR on quality-of-life data for 18 large American metropolitan areas (Source: M.V. JONES and M.J. FLEX, The quality of life in Washington D.C., The Urban Institute, Washington D.C., 1970, in SANDERS 1989, p. 57)

Summary of the main steps of the R code

1. Importing the data

With my example files, prefer North American settings (“.” as the decimal separator, “,” as the thousands separator).

Room C210: the login is “geographie2”!

### setwd("D:/Users/geographie2/vgodard/ADD/villeUS/R")
### use this path when logged in as "geographie2"
getwd()
## [1] "C:/3VG/UP8/enseignement/add/villesUS/R/ACP/2021"
# Remove all objects
rm(list = ls() )

2. Reading the data

villeUS <- read.table("ta1fm02d.csv",
                      header=TRUE,
                      sep=";",
                      dec=".", ### in room C210, replace "." with ","
                      row.names=1,
                      check.names=FALSE,
                      fileEncoding="latin1",
                      stringsAsFactors = TRUE) # with stringsAsFactors = FALSE the categories (factor levels) are lost
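As a minimal sketch of why the `dec` argument matters, the snippet below reads the same tiny table with both decimal conventions. The two-row tables and the helper `read_demo()` are made up for illustration; they are not the course file.

```r
# Two tiny made-up tables: same values, different decimal conventions.
txt_point <- "ville;INCO;lieu\nA;1.5;Est\nB;2.25;West"
txt_comma <- "ville;INCO;lieu\nA;1,5;Est\nB;2,25;West"

# Hypothetical helper mirroring the read.table() call used above.
read_demo <- function(txt, dec) {
  read.table(text = txt, header = TRUE, sep = ";", dec = dec,
             row.names = 1, stringsAsFactors = TRUE)
}

ok_point  <- read_demo(txt_point, dec = ".")
ok_comma  <- read_demo(txt_comma, dec = ",")
bad_comma <- read_demo(txt_comma, dec = ".")  # wrong separator on purpose

str(ok_point)
is.numeric(ok_comma$INCO)   # the comma was declared as the decimal mark
is.factor(bad_comma$INCO)   # "1,5" could not be parsed as a number
```

If a column that should be numeric shows up as a factor in `str()`, the decimal separator is the first thing to check.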

3. Correlation matrix and significance

First install and load the “Hmisc” package.

### With the Hmisc package
## install.packages("Hmisc") ## if not already installed
 library(Hmisc)

Correlation matrix, rounded to two decimal places, and matrix of the significance levels (p-values) of the correlations.

rcorr(as.matrix(villeUS[,1:10]), type=c("pearson"))
##       INCO  UNEM  LOWI  HCOS  MENT  INFM  SUIC  POLL  ROBB  TRAF
## INCO  1.00  0.11  0.11 -0.05  0.10 -0.03  0.45 -0.03  0.45  0.37
## UNEM  0.11  1.00  0.35 -0.12 -0.18 -0.15  0.64  0.13  0.26  0.38
## LOWI  0.11  0.35  1.00 -0.48 -0.11  0.07  0.49 -0.35 -0.07  0.51
## HCOS -0.05 -0.12 -0.48  1.00 -0.06 -0.31 -0.04  0.37  0.05 -0.51
## MENT  0.10 -0.18 -0.11 -0.06  1.00  0.42 -0.18  0.36  0.56 -0.26
## INFM -0.03 -0.15  0.07 -0.31  0.42  1.00 -0.43  0.25  0.15  0.01
## SUIC  0.45  0.64  0.49 -0.04 -0.18 -0.43  1.00 -0.16  0.19  0.62
## POLL -0.03  0.13 -0.35  0.37  0.36  0.25 -0.16  1.00  0.33 -0.54
## ROBB  0.45  0.26 -0.07  0.05  0.56  0.15  0.19  0.33  1.00  0.05
## TRAF  0.37  0.38  0.51 -0.51 -0.26  0.01  0.62 -0.54  0.05  1.00
## 
## n= 18 
## 
## 
## P
##      INCO   UNEM   LOWI   HCOS   MENT   INFM   SUIC   POLL   ROBB   TRAF  
## INCO        0.6628 0.6640 0.8399 0.6807 0.9104 0.0587 0.9062 0.0616 0.1358
## UNEM 0.6628        0.1507 0.6383 0.4644 0.5417 0.0044 0.6071 0.2968 0.1246
## LOWI 0.6640 0.1507        0.0416 0.6681 0.7874 0.0377 0.1498 0.7830 0.0294
## HCOS 0.8399 0.6383 0.0416        0.8271 0.2102 0.8745 0.1347 0.8398 0.0289
## MENT 0.6807 0.4644 0.6681 0.8271        0.0809 0.4707 0.1471 0.0161 0.3001
## INFM 0.9104 0.5417 0.7874 0.2102 0.0809        0.0758 0.3141 0.5514 0.9702
## SUIC 0.0587 0.0044 0.0377 0.8745 0.4707 0.0758        0.5295 0.4505 0.0057
## POLL 0.9062 0.6071 0.1498 0.1347 0.1471 0.3141 0.5295        0.1824 0.0207
## ROBB 0.0616 0.2968 0.7830 0.8398 0.0161 0.5514 0.4505 0.1824        0.8448
## TRAF 0.1358 0.1246 0.0294 0.0289 0.3001 0.9702 0.0057 0.0207 0.8448

Copy and paste the output into a spreadsheet to highlight the strongest positive or negative correlations.
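The same ranking can also be obtained directly in R. The sketch below is only an illustration: it uses the built-in `USArrests` table as a stand-in for `villeUS`, and the data frame name `pairs` is an assumption of this example.

```r
# Stand-in data: the built-in USArrests table (4 numeric variables).
r <- cor(USArrests, method = "pearson")

# Keep each pair of variables once (upper triangle of the matrix),
# then sort the pairs by decreasing absolute correlation.
idx   <- which(upper.tri(r), arr.ind = TRUE)
pairs <- data.frame(var1 = rownames(r)[idx[, "row"]],
                    var2 = colnames(r)[idx[, "col"]],
                    r    = r[idx])
pairs <- pairs[order(-abs(pairs$r)), ]
print(pairs, digits = 2, row.names = FALSE)
```

With the course data, `cor(villeUS[, 1:10])` would take the place of `cor(USArrests)`.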

4. PCA module

If not already done, install the “FactoMineR” package, then load it.

## install.packages("FactoMineR")

library("FactoMineR")

PCA with all elements active (18 rows and 10 variables); column 11 is declared as a supplementary qualitative variable.

res.villeUS.pca <- PCA(villeUS,
                       quanti.sup=NULL,
                       quali.sup=11,
                       ncp = 10,
                       scale.unit = TRUE,
                       graph = TRUE)
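With `scale.unit = TRUE` the variables are standardized, so the analysis amounts to a decomposition of the correlation matrix. The sketch below illustrates this with base R only (`prcomp()` rather than FactoMineR) on the built-in `USArrests` table, used here as a stand-in dataset.

```r
X <- USArrests                                 # stand-in data, 4 numeric variables

eig_cor <- eigen(cor(X))$values                # eigenvalues of the correlation matrix
eig_pca <- prcomp(X, scale. = TRUE)$sdev^2     # variances of the principal components

all.equal(eig_cor, eig_pca)                    # both routes give the same eigenvalues
sum(eig_cor)                                   # equals the number of variables
```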

5. Outputs of the PCA function

If not already done, install the “factoextra” package, then load it.

## install.packages("factoextra")

library("factoextra")
## Welcome! Want to learn more? See two factoextra-related books at https://goo.gl/ve3WBa

To list all the results of the PCA function available in FactoMineR and/or factoextra:

### Listing the results

print(res.villeUS.pca)
## **Results for the Principal Component Analysis (PCA)**
## The analysis was performed on 18 individuals, described by 11 variables
## *The results are available in the following objects:
## 
##    name                description                                          
## 1  "$eig"              "eigenvalues"                                        
## 2  "$var"              "results for the variables"                          
## 3  "$var$coord"        "coord. for the variables"                           
## 4  "$var$cor"          "correlations variables - dimensions"                
## 5  "$var$cos2"         "cos2 for the variables"                             
## 6  "$var$contrib"      "contributions of the variables"                     
## 7  "$ind"              "results for the individuals"                        
## 8  "$ind$coord"        "coord. for the individuals"                         
## 9  "$ind$cos2"         "cos2 for the individuals"                           
## 10 "$ind$contrib"      "contributions of the individuals"                   
## 11 "$quali.sup"        "results for the supplementary categorical variables"
## 12 "$quali.sup$coord"  "coord. for the supplementary categories"            
## 13 "$quali.sup$v.test" "v-test of the supplementary categories"             
## 14 "$call"             "summary statistics"                                 
## 15 "$call$centre"      "mean of the variables"                              
## 16 "$call$ecart.type"  "standard error of the variables"                    
## 17 "$call$row.w"       "weights for the individuals"                        
## 18 "$call$col.w"       "weights for the variables"

5.1. Eigenvalues

In a standardized PCA, the eigenvalues sum to the number of active variables (10 here), which is also the number of axes.

An axis with an eigenvalue > 1 carries more information (inertia) than a single original variable.

eig.val <- get_eigenvalue(res.villeUS.pca)
eig.val
##        eigenvalue variance.percent cumulative.variance.percent
## Dim.1  3.13678257       31.3678257                    31.36783
## Dim.2  2.11616209       21.1616209                    52.52945
## Dim.3  1.74039136       17.4039136                    69.93336
## Dim.4  1.03429616       10.3429616                    80.27632
## Dim.5  0.60017574        6.0017574                    86.27808
## Dim.6  0.50683809        5.0683809                    91.34646
## Dim.7  0.36207137        3.6207137                    94.96717
## Dim.8  0.28642641        2.8642641                    97.83144
## Dim.9  0.17056334        1.7056334                    99.53707
## Dim.10 0.04629286        0.4629286                   100.00000
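As a quick check of the two statements above, the eigenvalues printed in the table can be typed in by hand. This is a sketch working on the rounded printed values, not on the `res.villeUS.pca` object.

```r
# Eigenvalues copied from the table above.
eig <- c(3.13678257, 2.11616209, 1.74039136, 1.03429616, 0.60017574,
         0.50683809, 0.36207137, 0.28642641, 0.17056334, 0.04629286)
names(eig) <- paste0("Dim.", seq_along(eig))

sum(eig)                       # about 10, the number of active variables
pct <- 100 * eig / sum(eig)    # percentage of variance carried by each axis
round(cumsum(pct), 2)          # cumulative percentage
names(eig)[eig > 1]            # axes retained by the "eigenvalue > 1" rule
```

Here the rule keeps the first four axes, which together carry about 80% of the inertia.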

5.2. Coordinates of the variables and individuals on the axes

Interpreting the coordinates of the variables on the factorial axes.

res.villeUS.pca$var$coord[, 1:4] ### [all rows; columns 1 to 4, cf. the *eigenvalue* results]
##            Dim.1        Dim.2       Dim.3       Dim.4
## INCO  0.39560813  0.536119331  0.14246394 -0.58198206
## UNEM  0.57900897  0.305601007  0.33319539  0.59068649
## LOWI  0.71963800  0.002230143 -0.31548511  0.26812275
## HCOS -0.47696803 -0.002073653  0.73700074 -0.08166666
## MENT -0.36386245  0.698405257 -0.32928802 -0.08140328
## INFM -0.24736823  0.356743673 -0.74338594  0.19070573
## SUIC  0.80846071  0.237858291  0.42914943  0.03821180
## POLL -0.53921742  0.531897829  0.26102817  0.43285281
## ROBB  0.01533755  0.882886692  0.09726058 -0.12992465
## TRAF  0.87261774  0.036334147 -0.20870423 -0.13937712

If there are supplementary qualitative variables:

res.villeUS.pca$quali.sup$coord  ## coord. for the supplementary categories            
##           Dim.1      Dim.2      Dim.3      Dim.4      Dim.5       Dim.6
## Est  -1.5548530  0.8945544 -0.1014338 -0.1955121 -0.3150085 -0.03399306
## Nord -0.3371181 -0.5995410  0.1429636  0.2397920  0.3664659 -0.07885122
## Sud   1.7136070 -0.7971957 -2.3154127 -0.7598437 -0.6979346  0.17020223
## West  3.6905570  1.2587443  1.9256609  0.1695600 -0.1636404  0.26961090
##            Dim.7       Dim.8       Dim.9      Dim.10
## Est  -0.06255407  0.07850852 -0.03957849  0.06378194
## Nord -0.02305670 -0.12154458 -0.00387713 -0.05923945
## Sud   0.12076133  0.07365287  0.06435682  0.08521550
## West  0.13937900  0.27702644  0.05203650  0.02190718

To be assessed against the test values (v-tests).

res.villeUS.pca$quali.sup$v.test ## v-test of the supplementary categories, significant if |v-test| > 1.96 (rounded to 2)
##           Dim.1      Dim.2      Dim.3      Dim.4     Dim.5      Dim.6
## Est  -2.2448375  1.5724255 -0.1966062 -0.4915740 -1.039731 -0.1220938
## Nord -0.7848097 -1.6992955  0.4468138  0.9721575  1.950378 -0.4566655
## Sud   1.4104201 -0.7988586 -2.5584941 -1.0891339 -1.313273  0.3485059
## West  3.0375902  1.2613701  2.1278246  0.2430415 -0.307915  0.5520549
##           Dim.7      Dim.8       Dim.9     Dim.10
## Est  -0.2658253  0.3751008 -0.24504971  0.7580162
## Nord -0.1579881 -0.9363830 -0.03870722 -1.1352159
## Sud   0.2925568  0.2006146  0.22715966  0.5773525
## West  0.3376601  0.7545607  0.18367272  0.1484257
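The 1.96 threshold is the two-sided 5% quantile of the standard normal distribution. The sketch below applies it to the Dim.1 column of the table above, typed in by hand, and assumes that a v-test can be read approximately as a standard normal deviate.

```r
# v-tests on Dim.1, copied from the output above.
vtest_dim1 <- c(Est = -2.2448375, Nord = -0.7848097,
                Sud =  1.4104201, West =  3.0375902)

threshold   <- qnorm(0.975)              # two-sided 5% threshold, about 1.96
signif_dim1 <- abs(vtest_dim1) > threshold
signif_dim1

2 * pnorm(-abs(vtest_dim1))              # approximate two-sided p-values
```

On axis 1 only the “Est” and “West” categories pass the threshold, on opposite sides of the axis.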

Interpreting the coordinates of the individuals on the factorial axes (the row selection below is of little use as it stands: adapt it!)

res.villeUS.pca$ind$coord[2:5 , 1:3] ### [rows 2 to 5; columns 1 to 3]
##                    Dim.1     Dim.2        Dim.3
## Los Angeles   3.31609420 1.4298302  1.608101583
## Chicago      -1.78315609 1.8827346 -0.169136546
## Philadelphie -1.04955578 0.4903795 -1.592588729
## Détroit      -0.06755761 1.4477498  0.005699929

5.3. Contributions of the variables and individuals to the axes

Interpreting the contributions of the variables to the factorial axes.

res.villeUS.pca$var$contrib[, 1:10] ### [all rows; all columns, cf. the *eigenvalue* results], to look for the largest contributions in a spreadsheet.
##             Dim.1        Dim.2      Dim.3      Dim.4      Dim.5     Dim.6
## INCO  4.989373263 1.358232e+01  1.1661730 32.7472083 14.8277462  9.025905
## UNEM 10.687747241 4.413271e+00  6.3789773 33.7341023  1.2590651  9.302999
## LOWI 16.509874064 2.350264e-04  5.7188778  6.9506022 10.7477561 39.395070
## HCOS  7.252606769 2.031999e-04 31.2096525  0.6448291  0.0625541 11.905383
## MENT  4.220754145 2.304974e+01  6.2302423  0.6406767 31.4845770  0.151603
## INFM  1.950758172 6.014003e+00 31.7527807  3.5162727 22.6633702  3.415322
## SUIC 20.836914962 2.673546e+00 10.5820588  0.1411725  1.6392898  3.651746
## POLL  9.269224736 1.336926e+01  3.9149646 18.1148847  8.6225794  5.037814
## ROBB  0.007499415 3.683503e+01  0.5435341  1.6320679  2.9764794 12.996029
## TRAF 24.275247233 6.238512e-02  2.5027391  1.8781836  5.7165829  5.118127
##             Dim.7        Dim.8     Dim.9    Dim.10
## INCO 11.657739893  1.339157755  9.133329  1.531045
## UNEM  0.007135910  1.524700125 29.928385  2.763617
## LOWI  0.007074053 14.305419925  2.071422  4.293669
## HCOS 43.295876330  0.436285118  1.227796  3.964814
## MENT  0.197501662 19.505954027  9.446662  5.072287
## INFM 16.361090298  0.006789183  2.682978 11.636635
## SUIC  0.527652960 18.697321765  1.028229 40.222068
## POLL 11.019486285  5.913679920 20.325902  4.412199
## ROBB  3.180674186 25.975227510 13.637700  2.215759
## TRAF 13.745768424 12.295464671 10.517596 23.887906
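The contributions can be recomputed from the coordinates: for a variable, the contribution to an axis is 100 × coordinate² / eigenvalue. The sketch below checks this on Dim.1 with the values printed earlier. The cut-off at the average contribution (100 / number of variables) is a common rule of thumb added for this example, not something stated in the course material.

```r
eig1 <- 3.13678257   # eigenvalue of Dim.1 (from the eigenvalue table)

# Coordinates of the variables on Dim.1 (from $var$coord).
coord1 <- c(INCO =  0.39560813, UNEM =  0.57900897, LOWI =  0.71963800,
            HCOS = -0.47696803, MENT = -0.36386245, INFM = -0.24736823,
            SUIC =  0.80846071, POLL = -0.53921742, ROBB =  0.01533755,
            TRAF =  0.87261774)

cos2_1   <- coord1^2               # quality of representation on Dim.1
contrib1 <- 100 * cos2_1 / eig1    # contribution to Dim.1, in percent

round(contrib1, 2)
sum(contrib1)                      # close to 100

# Variables contributing more than the average (assumed rule of thumb).
above_avg <- names(contrib1)[contrib1 > 100 / length(contrib1)]
above_avg
```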

Interpreting the contributions of the individuals to the factorial axes.

res.villeUS.pca$ind$contrib[, 1:10] ### [all rows; all columns], to look for the largest contributions in a spreadsheet.
##                         Dim.1        Dim.2        Dim.3        Dim.4
## New York          9.145858375 17.250245057 9.669722e-01  0.074856376
## Los Angeles      19.475866797  5.367196480 8.254819e+00  2.226268221
## Chicago           5.631470348  9.305865202 9.131781e-02  1.773252774
## Philadelphie      1.950985868  0.631310777 8.096323e+00  5.203601324
## Détroit           0.008083355  5.502570262 1.037097e-04  0.563321768
## Boston           11.200605071  5.397075187 1.083058e+01  1.577594804
## San Francisco    29.266339799  3.105729612 1.606292e+01  0.498603585
## Washington D. C.  5.598863783  4.589844533 6.449089e-10 11.148967570
## Pittsburgh        0.684903841  0.631960083 1.335609e-01 16.150040116
## St Louis          0.396579056  0.002911421 4.191259e+00  1.803662396
## Cleveland         5.207954626  2.740241836 2.922312e+00 19.888302137
## Baltimore         0.045066540  6.150429746 5.452402e+00  0.103606973
## Houston           4.829252030  0.174660931 1.157602e+01  0.009676728
## Minneapolis       0.253374828  9.728919285 1.768254e+00 17.260184420
## Dallas            5.586001679  4.689092314 2.372980e+01 11.721584126
## Cincinnati        0.091908605  4.987693860 7.501015e-02  0.693963129
## Milwaukee         0.594839361 10.317266870 5.818115e+00  3.403273214
## Buffalo           0.032046038  9.426986545 3.023171e-02  5.899240340
##                        Dim.5      Dim.6        Dim.7       Dim.8       Dim.9
## New York          1.52614281 12.9809461  0.025751897 25.33569664  1.10769928
## Los Angeles       0.37180869  1.2099050  4.638855106 13.58222828  1.77914669
## Chicago          10.56087692  6.1417053  1.749337689  0.39433405  6.64496036
## Philadelphie      1.80901505  5.6457504  0.154590830 15.18675183  5.06165899
## Détroit          11.31768195 19.4790893  1.228792672  0.13458537  0.56079380
## Boston            0.09071031  0.1803033  0.003546079  6.49287369  1.33766266
## San Francisco     2.57762789  0.4696128 10.534751070  1.55077943  3.71644735
## Washington D. C. 17.19338876 12.1054214  0.150968591  7.70964188  2.83651203
## Pittsburgh        2.97540241  3.0242890 23.609234071  1.99736697  0.78613495
## St Louis          8.22248117  1.7929478  9.933840147  0.19810512 26.65597407
## Cleveland        10.56702284  1.8234855 10.468878974  3.33078574  6.57954292
## Baltimore         0.20607309 12.1594459  0.296323720  6.04913830  8.14741347
## Houston          15.56117345  0.2730508  0.683847163  0.53674377  0.30482186
## Minneapolis       2.99139503  0.8910320  1.570185932  3.56662387 11.19615400
## Dallas            0.09126895  2.7209946  3.143611502  1.90820312  0.03330043
## Cincinnati        8.35800494 14.6836922 16.262951662  2.14981743  0.33339926
## Milwaukee         4.10579024  0.2418655  1.135347500  0.03128085  2.42512732
## Buffalo           1.47413552  4.1764633 14.409185393  9.84504366 20.49325054
##                       Dim.10
## New York          2.04342974
## Los Angeles       2.53006515
## Chicago           0.07558056
## Philadelphie      7.89920856
## Détroit           7.60976756
## Boston           20.80231556
## San Francisco     1.23351607
## Washington D. C.  3.08225922
## Pittsburgh       14.09034655
## St Louis          2.56483786
## Cleveland        10.07465628
## Baltimore         2.07911766
## Houston           0.06758021
## Minneapolis       2.72471258
## Dallas            2.58272357
## Cincinnati        6.47960125
## Milwaukee         9.10684596
## Buffalo           4.95343565

To save to a CSV file

contrib_var <- res.villeUS.pca$var$contrib[, 1:10] ### [all rows; all columns, cf. the *eigenvalue* results], to look for the largest contributions in a spreadsheet.

contrib_var
##             Dim.1        Dim.2      Dim.3      Dim.4      Dim.5     Dim.6
## INCO  4.989373263 1.358232e+01  1.1661730 32.7472083 14.8277462  9.025905
## UNEM 10.687747241 4.413271e+00  6.3789773 33.7341023  1.2590651  9.302999
## LOWI 16.509874064 2.350264e-04  5.7188778  6.9506022 10.7477561 39.395070
## HCOS  7.252606769 2.031999e-04 31.2096525  0.6448291  0.0625541 11.905383
## MENT  4.220754145 2.304974e+01  6.2302423  0.6406767 31.4845770  0.151603
## INFM  1.950758172 6.014003e+00 31.7527807  3.5162727 22.6633702  3.415322
## SUIC 20.836914962 2.673546e+00 10.5820588  0.1411725  1.6392898  3.651746
## POLL  9.269224736 1.336926e+01  3.9149646 18.1148847  8.6225794  5.037814
## ROBB  0.007499415 3.683503e+01  0.5435341  1.6320679  2.9764794 12.996029
## TRAF 24.275247233 6.238512e-02  2.5027391  1.8781836  5.7165829  5.118127
##             Dim.7        Dim.8     Dim.9    Dim.10
## INCO 11.657739893  1.339157755  9.133329  1.531045
## UNEM  0.007135910  1.524700125 29.928385  2.763617
## LOWI  0.007074053 14.305419925  2.071422  4.293669
## HCOS 43.295876330  0.436285118  1.227796  3.964814
## MENT  0.197501662 19.505954027  9.446662  5.072287
## INFM 16.361090298  0.006789183  2.682978 11.636635
## SUIC  0.527652960 18.697321765  1.028229 40.222068
## POLL 11.019486285  5.913679920 20.325902  4.412199
## ROBB  3.180674186 25.975227510 13.637700  2.215759
## TRAF 13.745768424 12.295464671 10.517596 23.887906
write.csv(contrib_var, "contrib_var.csv") ## pasting into Excel also works, using the import wizard
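A minimal sketch of the export and re-import round trip, on a small made-up extract so that it runs without the PCA object. The matrix `contrib_demo` and the file name `contrib_demo.csv` are hypothetical, and the file is written to a temporary directory.

```r
# Small extract of the contributions table (values rounded from the output above).
contrib_demo <- matrix(c( 4.989373, 13.58232,
                         10.687747,  4.413271),
                       nrow = 2, byrow = TRUE,
                       dimnames = list(c("INCO", "UNEM"), c("Dim.1", "Dim.2")))

out_file <- file.path(tempdir(), "contrib_demo.csv")   # hypothetical file name
write.csv(round(contrib_demo, 2), out_file)            # rounding eases reading

# Read the file back to check what was written.
back <- read.csv(out_file, row.names = 1, check.names = FALSE)
back

# For a spreadsheet configured with "," as the decimal mark, write.csv2()
# writes ";"-separated fields with "," decimals instead.
```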

5.4. Quality of representation (COS2) of the variables and individuals on the axes

Interpreting the correlations between the variables and the factorial axes.

res.villeUS.pca$var$cor[, 1:10] ### [all rows; all columns, cf. the *eigenvalue* results if adaptation is needed], to look for the variables most correlated with the axes.
##            Dim.1        Dim.2       Dim.3       Dim.4       Dim.5      Dim.6
## INCO  0.39560813  0.536119331  0.14246394 -0.58198206  0.29831617  0.2138848
## UNEM  0.57900897  0.305601007  0.33319539  0.59068649  0.08692873 -0.2171431
## LOWI  0.71963800  0.002230143 -0.31548511  0.26812275 -0.25397918  0.4468436
## HCOS -0.47696803 -0.002073653  0.73700074 -0.08166666  0.01937613  0.2456441
## MENT -0.36386245  0.698405257 -0.32928802 -0.08140328 -0.43469851  0.0277197
## INFM -0.24736823  0.356743673 -0.74338594  0.19070573  0.36880896  0.1315681
## SUIC  0.80846071  0.237858291  0.42914943  0.03821180 -0.09918982  0.1360457
## POLL -0.53921742  0.531897829  0.26102817  0.43285281  0.22748765  0.1597922
## ROBB  0.01533755  0.882886692  0.09726058 -0.12992465 -0.13365668 -0.2566492
## TRAF  0.87261774  0.036334147 -0.20870423 -0.13937712  0.18522835 -0.1610609
##             Dim.7        Dim.8       Dim.9      Dim.10
## INCO -0.205449116 -0.061933041  0.12481230  0.02662264
## UNEM -0.005083020 -0.066084370  0.22593550  0.03576811
## LOWI  0.005060941 -0.202421593 -0.05943977  0.04458321
## HCOS  0.395931780 -0.035350188  0.04576210  0.04284187
## MENT  0.026741297  0.236368790  0.12693519  0.04845727
## INFM  0.243390272 -0.004409763  0.06764745 -0.07339572
## SUIC  0.043709042  0.231417520 -0.04187819 -0.13645493
## POLL -0.199745852  0.130147383 -0.18619489  0.04519440
## ROBB  0.107314075 -0.272763474 -0.15251530 -0.03202715
## TRAF  0.223090772  0.187663152 -0.13393716  0.10515891

Interpreting the COS2 of the variables on the factorial axes.

res.villeUS.pca$var$cos2[, 1:10] ### [all rows; all columns, cf. the *eigenvalue* results if adaptation is needed], to find the best representation of the variables on the factorial axes or planes.
##             Dim.1        Dim.2       Dim.3       Dim.4        Dim.5
## INCO 0.1565057911 2.874239e-01 0.020295974 0.338703120 0.0889925351
## UNEM 0.3352513930 9.339198e-02 0.111019169 0.348910526 0.0075566033
## LOWI 0.5178788527 4.973540e-06 0.099530854 0.071889811 0.0645054249
## HCOS 0.2274985053 4.300039e-06 0.543170094 0.006669443 0.0003754345
## MENT 0.1323958805 4.877699e-01 0.108430598 0.006626494 0.1889627924
## INFM 0.0611910424 1.272660e-01 0.552622651 0.036368674 0.1360200493
## SUIC 0.6536087176 5.657657e-02 0.184169236 0.001460141 0.0098386195
## POLL 0.2907554263 2.829153e-01 0.068135706 0.187361557 0.0517506296
## ROBB 0.0002352403 7.794889e-01 0.009459621 0.016880416 0.0178641070
## TRAF 0.7614617251 1.320170e-03 0.043557454 0.019425981 0.0343095434
##             Dim.6        Dim.7        Dim.8       Dim.9       Dim.10
## INCO 0.0457467268 4.220934e-02 3.835702e-03 0.015578111 0.0007087648
## UNEM 0.0471511432 2.583709e-05 4.367144e-03 0.051046852 0.0012793574
## LOWI 0.1996692204 2.561312e-05 4.097450e-02 0.003533086 0.0019876623
## HCOS 0.0603410164 1.567620e-01 1.249636e-03 0.002094170 0.0018354257
## MENT 0.0007683819 7.150970e-04 5.587020e-02 0.016112542 0.0023481071
## INFM 0.0173101550 5.923882e-02 1.944601e-05 0.004576178 0.0053869315
## SUIC 0.0185084405 1.910480e-03 5.355407e-02 0.001753782 0.0186199467
## POLL 0.0255335624 3.989841e-02 1.693834e-02 0.034668538 0.0020425334
## ROBB 0.0658688248 1.151631e-02 7.439991e-02 0.023260917 0.0010257385
## TRAF 0.0259406159 4.976949e-02 3.521746e-02 0.017939163 0.0110583957

Interpreting the quality of representation of the individuals on the factorial axes.

res.villeUS.pca$ind$cos2[, 1:10] ### there is no "cor" for the individuals; [all rows; all columns], to find the best representation of the individuals on the factorial axes or planes.
##                         Dim.1        Dim.2        Dim.3        Dim.4
## New York         0.3498688127 0.4451850182 2.052378e-02 0.0009442132
## Los Angeles      0.6367317826 0.1183782019 1.497372e-01 0.0239992526
## Chicago          0.3485201758 0.3885325356 3.135625e-03 0.0361857241
## Philadelphie     0.1676171720 0.0365908165 3.859355e-01 0.1474106625
## Détroit          0.0008494597 0.3901047708 6.046905e-06 0.0195194773
## Boston           0.5002394539 0.1626145635 2.683803e-01 0.0232323115
## San Francisco    0.6872426548 0.0492005380 2.092804e-01 0.0038606247
## Washington D. C. 0.3020163798 0.1670294625 1.930152e-11 0.1983013839
## Pittsburgh       0.0638502717 0.0397454399 6.908363e-03 0.4964405530
## St Louis         0.0506249143 0.0002507287 2.968531e-01 0.0759189642
## Cleveland        0.2661021882 0.0944571111 8.284578e-02 0.3350730201
## Baltimore        0.0043677332 0.4021349842 2.931921e-01 0.0033109404
## Houston          0.3321340865 0.0081038960 4.417285e-01 0.0002194434
## Minneapolis      0.0164943112 0.4272674725 6.386721e-02 0.3704901343
## Dallas           0.2083193940 0.1179728134 4.910042e-01 0.1441369897
## Cincinnati       0.0092966620 0.3403573776 4.209722e-03 0.0231455636
## Milwaukee        0.0453029947 0.5300982737 2.458510e-01 0.0854642579
## Buffalo          0.0024537177 0.4869536732 1.284326e-03 0.1489384513
##                         Dim.5       Dim.6        Dim.7        Dim.8
## New York         0.0111704316 0.080236492 1.137103e-04 0.0884997939
## Los Angeles      0.0023258050 0.006391398 1.750573e-02 0.0405470636
## Chicago          0.1250547301 0.061415773 1.249653e-02 0.0022284303
## Philadelphie     0.0297372350 0.078373796 1.533056e-03 0.1191400921
## Détroit          0.2275635274 0.330753636 1.490526e-02 0.0012914501
## Boston           0.0007751524 0.001301143 1.828078e-05 0.0264790294
## San Francisco    0.0115812633 0.001781831 2.855458e-02 0.0033252200
## Washington D. C. 0.1774541247 0.105510416 9.399981e-04 0.0379746212
## Pittsburgh       0.0530729073 0.045555546 2.540533e-01 0.0170027693
## St Louis         0.2008313155 0.036981711 1.463732e-01 0.0023091857
## Cleveland        0.1033065646 0.015054571 6.174349e-02 0.0155401999
## Baltimore        0.0038213532 0.190414709 3.314958e-03 0.0535333038
## Houston          0.2047717023 0.003034323 5.428782e-03 0.0033707709
## Minneapolis      0.0372596273 0.009372358 1.179863e-02 0.0212010164
## Dallas           0.0006512469 0.016396122 1.353217e-02 0.0064980327
## Cincinnati       0.1617586487 0.239988795 1.898803e-01 0.0198564267
## Milwaukee        0.0598298606 0.002976363 9.980810e-03 0.0002175376
## Buffalo          0.0215964156 0.051670629 1.273501e-01 0.0688329754
##                         Dim.9       Dim.10
## New York         2.304114e-03 1.153639e-03
## Los Angeles      3.162808e-03 1.220734e-03
## Chicago          2.236144e-02 6.903127e-05
## Philadelphie     2.364605e-02 1.001561e-02
## Détroit          3.204466e-03 1.180191e-02
## Boston           3.248511e-03 1.371127e-02
## San Francisco    4.745380e-03 4.274803e-04
## Washington D. C. 8.319869e-03 2.453745e-03
## Pittsburgh       3.985029e-03 1.938583e-02
## St Louis         1.850250e-01 4.831968e-03
## Cleveland        1.828009e-02 7.596982e-03
## Baltimore        4.293613e-02 2.973792e-03
## Houston          1.139937e-03 6.859347e-05
## Minneapolis      3.963153e-02 2.617707e-03
## Dallas           6.752736e-05 1.421465e-03
## Cincinnati       1.833736e-03 9.672723e-03
## Milwaukee        1.004299e-02 1.023587e-02
## Buffalo          8.532226e-02 5.597401e-03
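The quality of representation on a factorial plane is the sum of the COS2 on its two axes. The sketch below illustrates this on three cities copied from the output above; with the real object, `rowSums(res.villeUS.pca$ind$cos2[, 1:2])` would give the same quantity for all individuals.

```r
# COS2 on the first three axes for three cities (from the output above).
cos2 <- matrix(c(0.6367317826, 0.1183782019, 1.497372e-01,
                 0.0008494597, 0.3901047708, 6.046905e-06,
                 0.6872426548, 0.0492005380, 2.092804e-01),
               ncol = 3, byrow = TRUE,
               dimnames = list(c("Los Angeles", "Détroit", "San Francisco"),
                               paste0("Dim.", 1:3)))

# Quality of representation on plane 1-2 and on plane 1-3.
plane12 <- rowSums(cos2[, c("Dim.1", "Dim.2")])
plane13 <- rowSums(cos2[, c("Dim.1", "Dim.3")])
round(cbind(plane12, plane13), 3)
```

San Francisco, for instance, is better represented on plane 1-3 than on plane 1-2, whereas Détroit is almost absent from plane 1-3.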

To save to a CSV file

cos2_var <- res.villeUS.pca$var$cos2[, 1:10] ### [all rows; all columns, cf. the *eigenvalue* results], to find the best representation of the variables on the factorial axes or planes.

cos2_var
##             Dim.1        Dim.2       Dim.3       Dim.4        Dim.5
## INCO 0.1565057911 2.874239e-01 0.020295974 0.338703120 0.0889925351
## UNEM 0.3352513930 9.339198e-02 0.111019169 0.348910526 0.0075566033
## LOWI 0.5178788527 4.973540e-06 0.099530854 0.071889811 0.0645054249
## HCOS 0.2274985053 4.300039e-06 0.543170094 0.006669443 0.0003754345
## MENT 0.1323958805 4.877699e-01 0.108430598 0.006626494 0.1889627924
## INFM 0.0611910424 1.272660e-01 0.552622651 0.036368674 0.1360200493
## SUIC 0.6536087176 5.657657e-02 0.184169236 0.001460141 0.0098386195
## POLL 0.2907554263 2.829153e-01 0.068135706 0.187361557 0.0517506296
## ROBB 0.0002352403 7.794889e-01 0.009459621 0.016880416 0.0178641070
## TRAF 0.7614617251 1.320170e-03 0.043557454 0.019425981 0.0343095434
##             Dim.6        Dim.7        Dim.8       Dim.9       Dim.10
## INCO 0.0457467268 4.220934e-02 3.835702e-03 0.015578111 0.0007087648
## UNEM 0.0471511432 2.583709e-05 4.367144e-03 0.051046852 0.0012793574
## LOWI 0.1996692204 2.561312e-05 4.097450e-02 0.003533086 0.0019876623
## HCOS 0.0603410164 1.567620e-01 1.249636e-03 0.002094170 0.0018354257
## MENT 0.0007683819 7.150970e-04 5.587020e-02 0.016112542 0.0023481071
## INFM 0.0173101550 5.923882e-02 1.944601e-05 0.004576178 0.0053869315
## SUIC 0.0185084405 1.910480e-03 5.355407e-02 0.001753782 0.0186199467
## POLL 0.0255335624 3.989841e-02 1.693834e-02 0.034668538 0.0020425334
## ROBB 0.0658688248 1.151631e-02 7.439991e-02 0.023260917 0.0010257385
## TRAF 0.0259406159 4.976949e-02 3.521746e-02 0.017939163 0.0110583957
write.csv(cos2_var, "cos2_var.csv") ## pasting into Excel also works, using the import wizard

6. Interpretation aids

6.1. Slightly more sophisticated plots

  • simple factorial plane with the variables and the individuals
          ### Simple biplot (as in basic SPAD)
          ### (suitable only with few variables and individuals!)
          fviz_pca_biplot(res.villeUS.pca,
                          repel = TRUE,# avoids overlapping labels
                          col.var = "#2E9FDF", # Variables color
                          col.ind = "#696969")  # Individuals color

  • more sophisticated styling
### to change the styling of "ggplots", use the ggpar() function [ggpubr package]
### here the previous plot is reused
          res.biplot <- fviz_pca_biplot(res.villeUS.pca, 
                          col.ind = villeUS$lieu,
                          palette = "jco", # palette of the journal "JCO" (Journal of Clinical Oncology)
                          addEllipses = TRUE,
                          ellipse.type = "confidence", # draws confidence ellipses
                          ellipse.level=0.95, # controls the size of the ellipses
                          label = "var",# labels for the variables only, none for the individuals
                          col.var = "black",
                          axes = c(1,2), # /!\ try other axes! Are the regions still well separated? /!\
                          repel = TRUE,# avoids overlapping labels
                          legend.title = "Regions")
  • formatting
ggpubr::ggpar(res.biplot,
                        title = "Principal component analysis",
                        subtitle = "US cities",
                        caption = "Source: JONES and FLEX, The quality of life [...] 1970. in SANDERS 1989, p. 57",
                        xlab = "PC1",
                        ylab = "PC2",
                        legend.title = "Regions",
                        legend.position = "top",
                        ggtheme = theme_gray(),# see "http://www.sthda.com/english/articles/31-principal-component-methods-in-r-practical-guide/112-pca-principal-component-analysis-essentials/#r-packages" for the available themes
                        palette = "jco") # palette of the journal "JCO" (Journal of Clinical Oncology)
## Scale for 'colour' is already present. Adding another scale for 'colour',
## which will replace the existing scale.
## Scale for 'fill' is already present. Adding another scale for 'fill', which
## will replace the existing scale.

6.2. Aids to description and interpretation

Identifying the most significant variables for each component.

### dimdesc() function [in FactoMineR]

res.desc <- dimdesc(res.villeUS.pca,
                    axes = c(1:4),
                    proba = 0.05)

Help with axis 1

### Description of dimension 1
res.desc$Dim.1
## $quanti
##      correlation      p.value
## TRAF   0.8726177 2.319808e-06
## SUIC   0.8084607 4.897035e-05
## LOWI   0.7196380 7.601770e-04
## UNEM   0.5790090 1.180736e-02
## HCOS  -0.4769680 4.534601e-02
## POLL  -0.5392174 2.092819e-02
## 
## $quali
##             R2      p.value
## lieu 0.8186732 1.857411e-05
## 
## $category
##            Estimate      p.value
## lieu=West  2.812509 0.0004879634
## lieu=Est  -2.432901 0.0194839279
## 
## attr(,"class")
## [1] "condes" "list"

Help with axis 2

### Description of dimension 2
res.desc$Dim.2
## $quanti
##      correlation      p.value
## ROBB   0.8828867 1.224712e-06
## MENT   0.6984053 1.264876e-03
## INCO   0.5361193 2.182158e-02
## POLL   0.5318978 2.308675e-02
## 
## attr(,"class")
## [1] "condes" "list"
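The p-values listed under `$quanti` are close to what the usual t-test on a Pearson correlation gives with n = 18 individuals. The sketch below recomputes them for three variables of axis 1; that `dimdesc()` uses exactly this test is an assumption of this example, although the results agree with the printed values.

```r
n <- 18   # number of cities

# Correlations with Dim.1, copied from res.desc$Dim.1$quanti.
r <- c(TRAF = 0.8726177, SUIC = 0.8084607, HCOS = -0.4769680)

# Student t statistic for a Pearson correlation, with n - 2 degrees of freedom.
t_stat <- r * sqrt(n - 2) / sqrt(1 - r^2)
p_val  <- 2 * pt(-abs(t_stat), df = n - 2)   # two-sided p-value

signif(p_val, 3)
p_val < 0.05   # kept by dimdesc() with proba = 0.05
```

With only 18 individuals, a correlation needs to exceed roughly 0.47 in absolute value to pass the 5% threshold, which is why HCOS is only just retained.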

etc.

6.3. The Swiss Army knife

The FactoInvestigate package automatically describes and interprets the results of your factor analysis (PCA, CA or MCA), choosing the most appropriate plots for the report… (but the page seems unreachable at the moment! http://factominer.free.fr/reporting/index_fr.html)

## install.packages("FactoInvestigate")

## library(FactoInvestigate)

## Investigate(res.villeUS.pca)