R
Definition
R is a free, open-source programming language and environment for statistical computing and graphics. Created by Ross Ihaka and Robert Gentleman at the University of Auckland in 1993, it is the dominant language in statistics, bioinformatics, and data science.
Key Details
- Paradigm: Multi-paradigm (functional, object-oriented, imperative)
- License: GPL-2/GPL-3 (free software)
- Runtime: R interpreter (written primarily in C and R)
- Package ecosystem: CRAN (Comprehensive R Archive Network) — over 20,000 packages
- Graphics: Built-in graphics engine; ggplot2 for advanced visualization
Language Features
- Vectorized operations: Operations on entire arrays without explicit loops
- S3 and S4 object systems: Multiple OOP paradigms
- Functional programming: First-class functions, lapply, sapply, map
- Reproducible research: Integration with R Markdown, Sweave
- Statistical models: Built-in functions for regression, classification, time series
Key Packages
| Package | Purpose |
|---|---|
| ggplot2 | Data visualization (grammar of graphics) |
| dplyr | Data manipulation (tidyverse) |
| tidyr | Data tidying and reshaping |
| caret | Machine learning |
| shiny | Interactive web applications |
| bioconductor | Bioinformatics |
| data.table | High-performance data frames |
Comparison with Alternatives
- **vs Python libraries
- vs sql: R handles complex statistical analysis; SQL excels at structured data retrieval
- vs matlab: R is free and open-source; MATLAB has proprietary toolboxes and simulation tools
Use Cases
- Statistical analysis and hypothesis testing
- Data visualization and reporting
- Bioinformatics and genomics
- Academic research and publications
- Financial modeling and risk analysis