Loads and processes data from a panel database of economic, financial, and development indicators. The function handles data filtering, frequency adjustments, aggregation, and seasonal adjustments. Data are loaded on demand to limit memory use (approximately 15-50 MB during execution).
Usage
wp_data(
ISO,
formula,
variable = NULL,
years,
adjust_seasonal = FALSE,
window_seasadj = NULL,
matching_yq = "Q2Y",
interpolation_method = "Linear",
aggregate_iso = NULL,
aggregate_period = NULL,
quartile = FALSE,
na.rm = TRUE,
reference = TRUE,
country_names = FALSE,
clean = TRUE,
verbose = TRUE,
debug = FALSE
)
Arguments
- ISO
character vector; ISO 3-letter country codes or category names:
  - Individual countries (e.g., "USA", "CHN")
  - Categories (e.g., "CTR_LDR", "BRICS", "AFRICA")
  - Category exclusions using a hyphen (e.g., "CTR_LDR - USA")
See wp_get_category() for available categories.
- formula
character vector; mathematical expressions using Symbol codes:
  - Simple variables (e.g., "GDP_C")
  - Calculations with basic operators (e.g., "100*CU_C/GDP_C")
  - Multiple formulas as a vector
  - Basic operators: + - * / ( )
Division by zero is not handled; use with caution.
- variable
character vector or NULL; names for formula outputs (Variable column):
  - Must match the length of formula if provided
  - Used for output labeling and plotting
Default is NULL, in which case the formulas are used as names.
- years
numeric vector of length 2; year range as c(start_year, end_year):
  - First element: start year
  - Second element: end year
Data availability varies by country and indicator.
- adjust_seasonal
logical (TRUE/FALSE); apply seasonal adjustment:
  - TRUE: adjust quarterly data using STL decomposition
  - Only affects quarterly data (no effect on annual data)
Default is FALSE.
- window_seasadj
numeric or NULL; window for seasonal adjustment:
  - Controls smoothing in STL decomposition
  - Larger values = more smoothing
  - Only used if adjust_seasonal = TRUE
Default is NULL, which is treated as a window of 7.
- matching_yq
character; method for handling mixed frequencies:
  - "Q2Y": convert quarterly to yearly
  - "Y2Q": convert yearly to quarterly
Default is "Q2Y".
- interpolation_method
character; method for Y2Q conversion:
  - "None": repeat the yearly value
  - "Linear": linear interpolation
  - "Linear-Scale": scaled linear interpolation
Only used if matching_yq = "Y2Q". Default is "Linear".
- aggregate_iso
character or NULL; method for country aggregation:
  - "Sum": sum values across countries
  - "Mean": average values across countries
  - "Median": median values across countries
  - NULL: no aggregation
Default is NULL.
- aggregate_period
character or NULL; method for time aggregation:
  - "Sum": sum over period
  - "Mean"/"Median": central tendency
  - "SD": standard deviation
  - "Growth": period-over-period growth
  - "CAGR": compound annual growth rate
  - "GeoMean": geometric mean
  - NULL: no aggregation
Default is NULL.
- quartile
logical; include quartile calculations:
  - TRUE: add first/third quartiles to aggregations
  - Only used if aggregate_iso or aggregate_period is specified
Default is FALSE.
- na.rm
logical; handle missing values in aggregations:
  - TRUE: exclude NA values
  - FALSE: return NA if any value is NA
Default is TRUE.
- reference
logical; include data source citations:
  - TRUE: add Reference column to output
Default is TRUE.
- country_names
logical; include full country names:
  - TRUE: add Country column with names from ISO codes
Default is FALSE.
- clean
logical; remove rows with NA values:
  - TRUE: remove NA rows from final output
Default is TRUE.
- verbose
logical; print processing information:
  - TRUE: show progress and warnings
Default is TRUE.
- debug
logical; print detailed debugging information:
  - TRUE: show technical details
Default is FALSE.
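The less familiar aggregate_period options can be read as standard summary statistics. The base R sketch below shows one common definition of each; the helper names (growth_rate, cagr, geo_mean) are hypothetical, "Growth" is assumed here to mean growth from the first to the last observation, and the exact formulas used inside wp_data may differ.

```r
# Hypothetical helpers using textbook definitions; not taken from wp_data.

# Growth from the first to the last observation, in percent.
growth_rate <- function(x) 100 * (x[length(x)] / x[1] - 1)

# Compound annual growth rate over length(x) - 1 yearly steps, in percent.
cagr <- function(x) {
  n_steps <- length(x) - 1
  100 * ((x[length(x)] / x[1])^(1 / n_steps) - 1)
}

# Geometric mean; only defined for strictly positive values.
geo_mean <- function(x) {
  if (any(x <= 0, na.rm = TRUE)) return(NA_real_)
  exp(mean(log(x), na.rm = TRUE))
}

x <- c(100, 110, 121)      # made-up yearly values
g  <- growth_rate(x)       # about 21 (percent)
c1 <- cagr(x)              # about 10 (percent per year)
gm <- geo_mean(c(2, 8))    # 4
neg <- geo_mean(c(-1, 4))  # NA: not defined for negative values
```

Returning NA for non-positive input mirrors the kind of guard the function reports for growth calculations on negative values.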
Value
A data.frame containing:
ISO: 3-letter country codes
Date: time period (YYYY or YYYYQN format)
Variable: indicator names from formula/variable
Value: calculated values
Reference: data sources (if reference=TRUE)
Country: country names (if country_names=TRUE)
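Because Date is stored as text, it is often convenient to split it into numeric year and quarter columns before further analysis. The sketch below uses a made-up data.frame with the columns listed above and assumes that quarterly dates look like "2020Q1"; the Year and Quarter columns are additions for illustration, not part of the function's output.

```r
# Toy long-format result with the documented columns (values are made up).
res <- data.frame(
  ISO = c("USA", "USA", "USA"),
  Date = c("2020Q1", "2020Q2", "2021"),
  Variable = "X",
  Value = c(1, 2, 3),
  stringsAsFactors = FALSE
)

# The year is always the first four characters.
res$Year <- as.integer(substr(res$Date, 1, 4))

# The quarter is the digit after "Q" when present, and NA for yearly rows.
is_quarterly <- grepl("^[0-9]{4}Q[1-4]$", res$Date)
res$Quarter <- NA_integer_
res$Quarter[is_quarterly] <- as.integer(substr(res$Date[is_quarterly], 6, 6))
```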
Details
The function processes data in several steps:
Validates inputs and resolves country categories
Loads required data (quarterly/yearly) on demand
Extracts symbols from formulas and filters data
Handles frequency mismatches (Q2Y or Y2Q conversion)
Evaluates formulas for each country
Performs any requested aggregations
Applies seasonal adjustments if specified
Cleans and formats output
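The symbol-extraction and formula-evaluation steps can be pictured with base R tools. This is a minimal sketch on a made-up data.frame, assuming formulas are evaluated with ordinary R arithmetic; it is not the function's actual implementation.

```r
# Toy wide data for one country; column names mirror Symbol codes from the examples.
dat <- data.frame(
  Date  = c("2020", "2021", "2022"),
  CU_C  = c(-5, 10, 2),
  GDP_C = c(100, 200, 0)
)

formula <- "100*CU_C/GDP_C"

# Parse the text into an expression and list the symbols it refers to.
expr <- parse(text = formula)[[1]]
symbols <- all.vars(expr)          # "CU_C" "GDP_C"

# Evaluate the expression with the data.frame columns as variables.
value <- eval(expr, envir = dat)   # -5, 5, Inf
```

The last value shows why division by zero calls for caution: plain R arithmetic returns Inf (or NaN for 0/0) instead of raising an error.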
Memory usage is optimized by:
Loading data only when needed
Filtering to required columns early
Processing one formula at a time
Clearing intermediate objects
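Loading only when needed is commonly implemented with a small cache. The sketch below is a hypothetical illustration of that pattern: the cache environment, the get_panel function, and the tiny stand-in table are all assumptions, and no real data file is read.

```r
# Hypothetical cache; the "loader" fabricates a tiny table instead of reading a file.
cache <- new.env()
cache$loads <- 0

get_panel <- function(freq = c("quarterly", "yearly")) {
  freq <- match.arg(freq)
  if (!exists(freq, envir = cache, inherits = FALSE)) {
    message("[LOADING] Loading ", freq, " data (first use)")
    cache$loads <- cache$loads + 1
    toy <- data.frame(ISO = "USA", Symbol = "GDP_C", Value = 1)
    assign(freq, toy, envir = cache)
  }
  get(freq, envir = cache, inherits = FALSE)
}

first  <- get_panel("quarterly")  # triggers the load
second <- get_panel("quarterly")  # served from the cache, no second load
```

A call that only uses quarterly symbols never touches the yearly table under this pattern, which is one way to keep memory use low.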
Data Validation and Error Handling:
Missing data warnings by country/variable
Automatic date range adjustment if requested years unavailable
Minimum observations check for seasonal adjustment (16 required)
Warnings for inappropriate aggregation requests (e.g., growth rates with negative values)
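The minimum-observations check for seasonal adjustment can be sketched with stats::stl from base R. The function name seasonally_adjust is hypothetical, and what wp_data does with series that are too short is not stated here; this sketch simply returns them unchanged with a warning.

```r
# Hypothetical helper: STL-based seasonal adjustment of a quarterly series.
seasonally_adjust <- function(values, start_year, start_quarter = 1,
                              window = 7, min_obs = 16) {
  if (length(values) < min_obs) {
    warning("Fewer than ", min_obs, " observations; returning input unchanged.")
    return(values)
  }
  x <- ts(values, start = c(start_year, start_quarter), frequency = 4)
  fit <- stl(x, s.window = window)  # s.window must be odd
  # Seasonally adjusted series = original minus the seasonal component.
  as.numeric(x - fit$time.series[, "seasonal"])
}

# Made-up series: linear trend plus a fixed quarterly pattern (24 quarters).
vals <- 100 + 1:24 + rep(c(5, -3, -4, 2), 6)
adj <- seasonally_adjust(vals, start_year = 2015)
```

A larger window makes the estimated seasonal pattern change more slowly over time, which matches the "more smoothing" description of window_seasadj.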
Variable Types and Units:
Stock variables: measured at a point in time (e.g., reserves, debt)
Flow variables: measured over a period (e.g., GDP, trade)
Index variables: values expressed relative to a base year (various base years available)
Percentage variables: bounded ratios
For IMF Balance of Payments data, quarterly values are multiplied by 4 to represent annualized flows, ensuring consistency with yearly data.
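A short arithmetic check shows why annualizing keeps quarterly flows comparable with yearly ones. The numbers are made up, and averaging the annualized quarters is an assumption about how a quarterly-to-yearly conversion might treat flows.

```r
# Made-up quarterly flows for one year.
q_flow <- c(Q1 = 10, Q2 = 12, Q3 = 11, Q4 = 13)

# Annualized quarterly values: each quarter scaled to a yearly rate.
annualized <- 4 * q_flow              # 40 48 44 52

# The mean of the annualized quarters equals the yearly total of the raw flows.
yearly_from_mean <- mean(annualized)  # 46
yearly_from_sum  <- sum(q_flow)       # 46
```

The same scaling also puts a ratio such as an annualized quarterly flow over annualized quarterly GDP on the same footing as its yearly counterpart.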
Common Data Patterns:
World Bank (WB) indicators:
_ZS suffix: ratios expressed as percentages
_CD suffix: current US dollars
_KD suffix: constant dollars
_XD suffix: indices
_PC suffix: per capita values
_FE/_MA suffixes: female/male specific indicators
IMF Balance of Payments (BOP):
_C suffix: current prices
_R suffix: real/constant prices (with base year)
o/i prefixes: outward/inward flows
DI/POR/OI suffixes: direct investment/portfolio/other investment
E/D suffixes: equity/debt components
Natural Disasters (EMDAT):
DIS prefix: disaster-related indicators
_AFF/_DEATH/_DMG suffixes: affected people/fatalities/economic damage
BIO/CLIM/GEO/HYDRO prefixes: biological/climatic/geological/hydrological disasters
Financial Market Data (BIS):
CRED prefix: credit-related indicators
_ALL/_BANK prefixes: all sectors/banking sector
_CD/_KN/_ZS suffixes: USD/local currency/percentage of GDP
Other specialized databases (JST, KOF, etc.) have their own consistent naming patterns that are documented in their respective sources.
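The suffix conventions above lend themselves to a simple lookup. The sketch below covers only the World Bank suffixes listed in this section; the function name describe_wb_symbol and the symbol "SOME_INDICATOR_ZS" are made up for illustration and are not codes from the database.

```r
# Lookup table built from the World Bank suffixes listed above.
wb_suffix_unit <- c(
  "_ZS" = "ratio expressed as a percentage",
  "_CD" = "current US dollars",
  "_KD" = "constant dollars",
  "_XD" = "index",
  "_PC" = "per capita value"
)

# Hypothetical helper: return the unit implied by a symbol's suffix, or NA.
describe_wb_symbol <- function(symbol) {
  suffixes <- names(wb_suffix_unit)
  hit <- vapply(suffixes, function(s) endsWith(symbol, s), logical(1))
  if (!any(hit)) return(NA_character_)
  unname(wb_suffix_unit[suffixes[hit][1]])
}

unit_zs  <- describe_wb_symbol("SOME_INDICATOR_ZS")  # "ratio expressed as a percentage"
unit_unk <- describe_wb_symbol("GDP_C")              # NA: not a listed WB suffix
```

Checking units this way before combining symbols in a formula helps avoid, for example, dividing a percentage by a dollar amount.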
Note
Variables in formulas refer to Symbol codes in the underlying database. Users should understand the economic/financial meaning of variables and their units before performing calculations.
See also
wp_plot_series for plotting time series
wp_plot_scatter for scatter plots
wp_plot_bar for bar plots
wp_get_category for available country categories
Examples
# Basic usage - GDP ratio for one country
data <- wp_data(
ISO = "USA",
formula = "100*CU_C/GDP_C",
variable = "Current Account (% GDP)",
years = c(2000, 2023)
)
#> [Step 1] Input Validation.
#> [Step 2] Data Filtering (ISO codes, Symbols, and Years).
#> [LOADING] Loading quarterly data (first use)
#> [Steps 3 to 7] Loop through each formula.
#> --- Step 3-4: Formula: 100*CU_C/GDP_C -- Symbols: CU_C (Q) GDP_C (Q)
#> --- Step 5: Adjust frequencies.
#> --- Step 6-7: Data Processing (Loop for each country).
#> ---
#> [Step 10] Clean database [remove NAs in output] - clean is TRUE.
# Multiple countries and indicators with aggregation
data <- wp_data(
ISO = c("CHN", "JPN", "KOR"),
formula = c("EXg_C/GDP_C", "IMg_C/GDP_C"),
variable = c("Exports", "Imports"),
years = c(2010, 2023),
adjust_seasonal = TRUE,
aggregate_iso = "Mean"
)
#> [Step 1] Input Validation.
#> [Step 2] Data Filtering (ISO codes, Symbols, and Years).
#> [Steps 3 to 7] Loop through each formula.
#> --- Step 3-4: Formula: EXg_C/GDP_C -- Symbols: EXg_C (Q) GDP_C (Q)
#> --- Step 5: Adjust frequencies.
#> --- Step 6-7: Data Processing (Loop for each country).
#> ---
#> --- Step 3-4: Formula: IMg_C/GDP_C -- Symbols: IMg_C (Q) GDP_C (Q)
#> --- Step 5: Adjust frequencies.
#> --- Step 6-7: Data Processing (Loop for each country).
#> ---
#> [Step 8] Aggregate values (group of ISO codes) - Method: Mean | Quartile: FALSE | na.rm: TRUE
#> [Step 9] Adjust for seasonal variations [only for quarterly data] - adjust_seasonal is TRUE
#> [INFO] Seasonal adjustments for: Exports Imports
#> [Step 10] Clean database [remove NAs in output] - clean is TRUE.
# Using categories with exclusions
data <- wp_data(
ISO = "CTR_LDR - USA",
formula = "FA_C/GDP_C",
years = c(1990, 2023),
adjust_seasonal = TRUE,
aggregate_period = "Growth"
)
#> [Step 0] Get ISO codes for ISO codes.
#> --- ISO3(5): DEU FRA GBR ITA JPN
#> [Step 1] Input Validation.
#> [Step 2] Data Filtering (ISO codes, Symbols, and Years).
#> [Steps 3 to 7] Loop through each formula.
#> --- Step 3-4: Formula: FA_C/GDP_C -- Symbols: FA_C (Q) GDP_C (Q)
#> --- No data for FA_C: JPN[1990-95]
#> --- Step 5: Adjust frequencies.
#> --- Step 6-7: Data Processing (Loop for each country).
#> ---
#> [Step 8] Aggregate values (time periods) - Method: Growth | Quartile: FALSE | na.rm: TRUE
#> Warning: NAs introduced by coercion
#> /!\ Negative values detected for FA_C/GDP_C in ISO: DEU . Growth calculation not suitable for negative values. Returning NA.
#> /!\ Negative values detected for FA_C/GDP_C in ISO: FRA . Growth calculation not suitable for negative values. Returning NA.
#> /!\ Negative values detected for FA_C/GDP_C in ISO: GBR . Growth calculation not suitable for negative values. Returning NA.
#> /!\ Negative values detected for FA_C/GDP_C in ISO: ITA . Growth calculation not suitable for negative values. Returning NA.
#> /!\ Negative values detected for FA_C/GDP_C in ISO: JPN . Growth calculation not suitable for negative values. Returning NA.
#> [Step 10] Clean database [remove NAs in output] - clean is TRUE.
# Mixed frequency handling
data <- wp_data(
ISO = "DEU",
formula = c("GDP_R_2015_Y", "CU_C"),
years = c(2015, 2023),
matching_yq = "Y2Q",
interpolation_method = "Linear"
)
#> [Step 0] Get ISO codes for ISO codes.
#> --- ISO3(1): DEU
#> [Step 1] Input Validation.
#> [Step 2] Data Filtering (ISO codes, Symbols, and Years).
#> [LOADING] Loading yearly data (first use)
#> [Steps 3 to 7] Loop through each formula.
#> --- Step 3-4: Formula: GDP_R_2015_Y -- Symbols: GDP_R_2015_Y (Y)
#> --- Step 5: Adjust frequencies.
#> --- Step 6-7: Data Processing (Loop for each country).
#> ---
#> --- Step 3-4: Formula: CU_C -- Symbols: CU_C (Q)
#> --- Step 5: Adjust frequencies.
#> --- Step 6-7: Data Processing (Loop for each country).
#> ---
#> [Step 10] Clean database [remove NAs in output] - clean is TRUE.
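The "None" and "Linear" options of interpolation_method can be pictured with base R. This is a sketch on made-up yearly values; placing each yearly value at the first quarter of its year and holding the last value constant are assumptions, and "Linear-Scale" is left out because its scaling rule is not specified here.

```r
# Made-up yearly series.
years  <- c(2015, 2016, 2017)
yearly <- c(100, 120, 110)

# Quarterly time grid: 2015Q1, 2015Q2, ..., 2017Q4 as decimal years.
q_time <- seq(2015, 2017.75, by = 0.25)

# "None": repeat each yearly value for its four quarters.
q_none <- rep(yearly, each = 4)

# "Linear": interpolate between yearly anchor points (assumed at Q1);
# rule = 2 holds the last value constant beyond the final anchor.
q_linear <- approx(x = years, y = yearly, xout = q_time, rule = 2)$y
# 100 105 110 115 120 117.5 115 112.5 110 110 110 110
```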