Loads and processes data from a comprehensive panel database of economic, financial, and development indicators. The function handles data filtering, frequency conversion, aggregation, and seasonal adjustment. To limit memory use, data are loaded on demand (approximately 15-50 MB during execution).

Usage

wp_data(
  ISO,
  formula,
  variable = NULL,
  years,
  adjust_seasonal = FALSE,
  window_seasadj = NULL,
  matching_yq = "Q2Y",
  interpolation_method = "Linear",
  aggregate_iso = NULL,
  aggregate_period = NULL,
  quartile = FALSE,
  na.rm = TRUE,
  reference = TRUE,
  country_names = FALSE,
  clean = TRUE,
  verbose = TRUE,
  debug = FALSE
)

Arguments

ISO: character vector; ISO 3-letter country codes or category names:
  - Individual countries (e.g., "USA", "CHN")
  - Categories (e.g., "CTR_LDR", "BRICS", "AFRICA")
  - Category exclusions using a hyphen (e.g., "CTR_LDR - USA")
  See wp_get_category() for available categories.
formula: character vector; mathematical expressions using Symbol codes:
  - Simple variables (e.g., "GDP_C")
  - Calculations with basic operators (e.g., "100*CU_C/GDP_C")
  - Multiple formulas as a character vector
  - Supported operators: + - * / ( )
  Division by zero is not handled; use with caution.
variable: character vector or NULL; names for formula outputs (stored in the Variable column):
  - Must match the length of formula if provided
  - Used for output labeling and plotting
  Default is NULL, in which case the formulas themselves are used as names.
years: numeric vector of length 2; year range, c(start_year, end_year):
  - First element: start year
  - Second element: end year
  Data availability varies by country and indicator.
adjust_seasonal: logical; apply seasonal adjustment:
  - TRUE: adjust quarterly data using STL decomposition
  - Only affects quarterly data (no effect on annual data)
  Default is FALSE.
window_seasadj: numeric or NULL; window for seasonal adjustment:
  - Controls smoothing in the STL decomposition
  - Larger values mean more smoothing
  - Only used if adjust_seasonal = TRUE
  Default is NULL, which is treated as a window of 7.
matching_yq: character; method for handling mixed frequencies:
  - "Q2Y": convert quarterly data to yearly
  - "Y2Q": convert yearly data to quarterly
  Default is "Q2Y".
interpolation_method: character; method for Y2Q conversion:
  - "None": repeat the yearly value in each quarter
  - "Linear": linear interpolation
  - "Linear-Scale": scaled linear interpolation
  Only used if matching_yq = "Y2Q". Default is "Linear".
aggregate_iso: character or NULL; method for aggregating across countries:
  - "Sum": sum of values across countries
  - "Mean": average across countries
  - "Median": median across countries
  - NULL: no aggregation
  Default is NULL.
aggregate_period: character or NULL; method for aggregating over time:
  - "Sum": sum over the period
  - "Mean"/"Median": central tendency
  - "SD": standard deviation
  - "Growth": period-over-period growth
  - "CAGR": compound annual growth rate
  - "GeoMean": geometric mean
  - NULL: no aggregation
  Default is NULL.
quartile: logical; include quartile calculations:
  - TRUE: add first and third quartiles to aggregations
  - Only used if aggregate_iso or aggregate_period is specified
  Default is FALSE.
na.rm: logical; handling of missing values in aggregations:
  - TRUE: exclude NA values
  - FALSE: return NA if any value is NA
  Default is TRUE.
reference: logical; include data source citations:
  - TRUE: add a Reference column to the output
  Default is TRUE.
country_names: logical; include full country names:
  - TRUE: add a Country column with names derived from ISO codes
  Default is FALSE.
clean: logical; remove rows with NA values:
  - TRUE: remove NA rows from the final output
  Default is TRUE.
verbose: logical; print processing information:
  - TRUE: show progress and warnings
  Default is TRUE.
debug: logical; print detailed debugging information:
  - TRUE: show technical details
  Default is FALSE.
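
The time-aggregation options of aggregate_period can be illustrated with a small R sketch. It is not the package's implementation: aggregate_period_sketch() is a hypothetical helper, CAGR assumes annual observations (n values spanning n - 1 years) and is returned in percent, and the non-positive-value check is modeled on the warnings shown in the examples. "Growth" is left out because its exact definition is not spelled out here.

```r
# Hypothetical helper (not part of the package): collapse one country's
# series into a single value, mirroring some aggregate_period options.
aggregate_period_sketch <- function(x, method, na.rm = TRUE) {
  if (anyNA(x)) {
    if (!na.rm) return(NA_real_)  # na.rm = FALSE: any NA yields NA
    x <- x[!is.na(x)]             # na.rm = TRUE: drop missing values
  }
  n <- length(x)
  if (n == 0) return(NA_real_)
  # Ratio-based methods are undefined for zero or negative values
  if (method %in% c("CAGR", "GeoMean") && any(x <= 0)) {
    warning("Non-positive values detected; returning NA.")
    return(NA_real_)
  }
  switch(method,
    Sum     = sum(x),
    Mean    = mean(x),
    Median  = stats::median(x),
    SD      = stats::sd(x),
    # Assumes annual data: n observations span n - 1 years; result in percent
    CAGR    = if (n < 2) NA_real_ else 100 * ((x[n] / x[1])^(1 / (n - 1)) - 1),
    GeoMean = exp(mean(log(x))),
    stop("Unknown method: ", method)
  )
}

aggregate_period_sketch(c(100, 110, 121), "CAGR")
#> [1] 10
```

A series growing from 100 to 121 over two years compounds at 10% per year, which is what the CAGR branch recovers.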

Value

A data.frame containing:

ISO: 3-letter country codes
Date: time period (YYYY or YYYYQN format)
Variable: indicator names from formula/variable
Value: calculated values
Reference: data sources (if reference=TRUE)
Country: country names (if country_names=TRUE)

Details

The function processes data in several steps:

Validates inputs and resolves country categories
Loads required data (quarterly/yearly) on demand
Extracts symbols from formulas and filters data
Handles frequency mismatches (Q2Y or Y2Q conversion)
Evaluates formulas for each country
Performs any requested aggregations
Applies seasonal adjustments if specified
Cleans and formats output
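
The symbol-extraction and formula-evaluation steps can be sketched in base R with parse(), all.vars() and eval(). This is an illustration under assumptions, not the package code: evaluate_formula_sketch() is a hypothetical helper, and it assumes one country's data have already been reshaped to one column per Symbol. The input values are toy numbers, not real data.

```r
# Hypothetical helper (not part of the package): evaluate one formula
# string for a single country whose data has one column per Symbol code.
evaluate_formula_sketch <- function(formula, country_data) {
  expr <- parse(text = formula)[[1]]
  symbols <- all.vars(expr)                     # e.g. "CU_C" "GDP_C"
  absent <- setdiff(symbols, names(country_data))
  if (length(absent) > 0) {
    stop("No data for symbol(s): ", paste(absent, collapse = ", "))
  }
  # Data columns act as variables; baseenv() supplies + - * / and (
  eval(expr, envir = country_data, enclos = baseenv())
}

# Toy input (not real data): one row per period for a single country
toy <- data.frame(Date = c("2022Q1", "2022Q2"), CU_C = c(-5, -4), GDP_C = c(100, 80))
evaluate_formula_sketch("100*CU_C/GDP_C", toy)
#> [1] -5 -5
```

Because R returns Inf or NaN for x/0 instead of raising an error, a sketch like this also shows how an unhandled division by zero can silently propagate into results.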

Memory usage is optimized by:

Loading data only when needed
Filtering to required columns early
Processing one formula at a time
Clearing intermediate objects

Data Validation and Error Handling:

Missing data warnings by country/variable
Automatic date range adjustment if the requested years are unavailable
Minimum observations check for seasonal adjustment (16 required)
Warnings for inappropriate aggregation requests (e.g., growth rates with negative values)
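
A minimal sketch of STL-based seasonal adjustment with the 16-observation minimum, using stats::stl(). Treating window_seasadj as stl()'s s.window argument is an assumption, as is returning short or incomplete series unchanged; seasadj_sketch() is a hypothetical helper and the series below is toy data.

```r
# Hypothetical helper (not part of the package): seasonally adjust one
# quarterly series by subtracting the STL seasonal component.
seasadj_sketch <- function(values, start, window = 7, min_obs = 16) {
  # Too short or containing NAs: return the series unchanged (an assumption)
  if (length(values) < min_obs || anyNA(values)) return(values)
  y <- stats::ts(values, start = start, frequency = 4)
  fit <- stats::stl(y, s.window = window)  # assumed mapping of window_seasadj
  as.numeric(y - fit$time.series[, "seasonal"])
}

# Toy series (not real data): linear trend plus a fixed quarterly pattern
raw <- (1:20) + rep(c(6, -2, -6, 2), 5)
adjusted <- seasadj_sketch(raw, start = c(2019, 1))
```

After adjustment the quarter-to-quarter changes in the toy series are much smoother, since the fixed quarterly pattern is absorbed by the seasonal component.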

Variable Types and Units:

Stock variables: measured at a point in time (e.g., reserves, debt)
Flow variables: measured over a period (e.g., GDP, trade)
Index variables: base year representations (various base years available)
Percentage variables: ratios expressed as percentages (e.g., shares of GDP)
For IMF Balance of Payments data, quarterly values are multiplied by 4 to represent annualized flows, ensuring consistency with yearly data
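
The frequency conversions can be sketched as follows. q2y_sketch() and y2q_sketch() are hypothetical helpers: Q2Y is assumed to average the quarters of each year, and the "Linear" Y2Q method is assumed to anchor each yearly value at Q4 and interpolate in between ("Linear-Scale" is not covered). With the multiplication by 4, the average of the four annualized quarters equals the sum of the raw quarterly flows, which is what makes quarterly and yearly flows comparable.

```r
# Hypothetical helpers (not part of the package) for frequency conversion.

# Q2Y: optionally annualize quarterly flows (x 4), then average by year
q2y_sketch <- function(dates, values, annualize = FALSE) {
  if (annualize) values <- 4 * values
  years <- substr(dates, 1, 4)            # "2023Q1" -> "2023"
  yearly <- tapply(values, years, mean)   # assumption: Q2Y takes the mean
  data.frame(Date = names(yearly), Value = as.numeric(yearly))
}

# Y2Q: "None" repeats the yearly value; "Linear" anchors each yearly value
# at Q4 (an assumption) and interpolates linearly between anchors
y2q_sketch <- function(years, values, method = c("Linear", "None")) {
  method <- match.arg(method)
  dates <- paste0(rep(years, each = 4), "Q", 1:4)
  if (method == "None" || length(years) < 2) {
    q_values <- rep(values, each = 4)
  } else {
    t_year <- years + 0.75                      # Q4 position of each year
    t_q <- rep(years, each = 4) + (0:3) / 4     # Q1 = +0.00, ..., Q4 = +0.75
    # rule = 2 holds the first value flat for Q1-Q3 of the first year
    q_values <- stats::approx(t_year, values, xout = t_q, rule = 2)$y
  }
  data.frame(Date = dates, Value = q_values)
}

q2y_sketch(paste0("2023Q", 1:4), c(10, 12, 11, 13), annualize = TRUE)$Value
#> [1] 46
```

Quarterly flows of 10, 12, 11 and 13 give a yearly value of 46 after annualization, equal to their sum.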

Common Data Patterns:

World Bank (WB) indicators:

_ZS suffix: ratios expressed as percentages
_CD suffix: current US dollars
_KD suffix: constant dollars
_XD suffix: indices
_PC suffix: per capita values
_FE/_MA suffixes: female/male specific indicators

IMF Balance of Payments (BOP):

_C suffix: current prices
_R suffix: real/constant prices (with base year)
o/i prefixes: outward/inward flows
DI/POR/OI suffixes: direct investment/portfolio/other investment
E/D suffixes: equity/debt components

Natural Disasters (EMDAT):

DIS prefix: disaster-related indicators
_AFF/_DEATH/_DMG suffixes: affected people/fatalities/economic damage
BIO/CLIM/GEO/HYDRO prefixes: biological/climatic/geological/hydrological disasters

Financial Market Data (BIS):

CRED prefix: credit-related indicators
_ALL/_BANK components: all sectors/banking sector
_CD/_KN/_ZS suffixes: USD/local currency/percentage of GDP

Other specialized databases (JST, KOF, etc.) have their own consistent naming patterns that are documented in their respective sources.

Note

Variables in formulas refer to Symbol codes in the underlying database. Users should understand the economic/financial meaning of variables and their units before performing calculations.

Examples

# Basic usage - GDP ratio for one country
data <- wp_data(
  ISO = "USA",
  formula = "100*CU_C/GDP_C",
  variable = "Current Account (% GDP)",
  years = c(2000, 2023)
)
#>  [Step 1] Input Validation. 
#>  [Step 2] Data Filtering (ISO codes, Symbols, and Years). 
#>  [LOADING] Loading quarterly data (first use) 
#>  [Steps 3 to 7] Loop through each formula. 
#>   ---  Step 3-4: Formula: 100*CU_C/GDP_C  --  Symbols: CU_C (Q)   GDP_C (Q)   
#>   ---  Step 5: Adjust frequencies. 
#>   ---  Step 6-7: Data Processing (Loop for each country). 
#>   ---   
#>  [Step 10] Clean database [remove NAs in output] - clean is TRUE. 

# Multiple countries and indicators with aggregation
data <- wp_data(
  ISO = c("CHN", "JPN", "KOR"),
  formula = c("EXg_C/GDP_C", "IMg_C/GDP_C"),
  variable = c("Exports", "Imports"),
  years = c(2010, 2023),
  adjust_seasonal = TRUE,
  aggregate_iso = "Mean"
)
#>  [Step 1] Input Validation. 
#>  [Step 2] Data Filtering (ISO codes, Symbols, and Years). 
#>  [Steps 3 to 7] Loop through each formula. 
#>   ---  Step 3-4: Formula: EXg_C/GDP_C  --  Symbols: EXg_C (Q)   GDP_C (Q)   
#>   ---  Step 5: Adjust frequencies. 
#>   ---  Step 6-7: Data Processing (Loop for each country). 
#>   ---   
#>   ---  Step 3-4: Formula: IMg_C/GDP_C  --  Symbols: IMg_C (Q)   GDP_C (Q)   
#>   ---  Step 5: Adjust frequencies. 
#>   ---  Step 6-7: Data Processing (Loop for each country). 
#>   ---   
#>  [Step 8] Aggregate values (group of ISO codes) - Method: Mean | Quartile: FALSE | na.rm: TRUE 
#>  [Step 9] Adjust for seasonal variations [only for quarterly data] - adjust_seasonal is TRUE 
#>  [INFO] Seasonal adjustments for: Exports   Imports 
#>  [Step 10] Clean database [remove NAs in output] - clean is TRUE. 

# Using categories with exclusions
data <- wp_data(
  ISO = "CTR_LDR - USA",
  formula = "FA_C/GDP_C",
  years = c(1990, 2023),
  adjust_seasonal = TRUE,
  aggregate_period = "Growth"
)
#>  [Step 0] Get ISO codes for ISO codes. 
#>   ---  ISO3(5): DEU FRA GBR ITA JPN 
#>  [Step 1] Input Validation. 
#>  [Step 2] Data Filtering (ISO codes, Symbols, and Years). 
#>  [Steps 3 to 7] Loop through each formula. 
#>   ---  Step 3-4: Formula: FA_C/GDP_C  --  Symbols: FA_C (Q)   GDP_C (Q)   
#>   ---  No data for FA_C: JPN[1990-95] 
#>   ---  Step 5: Adjust frequencies. 
#>   ---  Step 6-7: Data Processing (Loop for each country). 
#>   ---   
#>  [Step 8] Aggregate values (time periods) - Method: Growth | Quartile: FALSE | na.rm: TRUE 
#> Warning: NAs introduced by coercion
#> /!\  Negative values detected for FA_C/GDP_C in ISO: DEU . Growth calculation not suitable for negative values. Returning NA. 
#> /!\  Negative values detected for FA_C/GDP_C in ISO: FRA . Growth calculation not suitable for negative values. Returning NA. 
#> /!\  Negative values detected for FA_C/GDP_C in ISO: GBR . Growth calculation not suitable for negative values. Returning NA. 
#> /!\  Negative values detected for FA_C/GDP_C in ISO: ITA . Growth calculation not suitable for negative values. Returning NA. 
#> /!\  Negative values detected for FA_C/GDP_C in ISO: JPN . Growth calculation not suitable for negative values. Returning NA. 
#>  [Step 10] Clean database [remove NAs in output] - clean is TRUE. 

# Mixed frequency handling
data <- wp_data(
  ISO = "DEU",
  formula = c("GDP_R_2015_Y", "CU_C"),
  years = c(2015, 2023),
  matching_yq = "Y2Q",
  interpolation_method = "Linear"
)
#>  [Step 0] Get ISO codes for ISO codes. 
#>   ---  ISO3(1): DEU 
#>  [Step 1] Input Validation. 
#>  [Step 2] Data Filtering (ISO codes, Symbols, and Years). 
#>  [LOADING] Loading yearly data (first use) 
#>  [Steps 3 to 7] Loop through each formula. 
#>   ---  Step 3-4: Formula: GDP_R_2015_Y  --  Symbols: GDP_R_2015_Y (Y)   
#>   ---  Step 5: Adjust frequencies. 
#>   ---  Step 6-7: Data Processing (Loop for each country). 
#>   ---   
#>   ---  Step 3-4: Formula: CU_C  --  Symbols: CU_C (Q)   
#>   ---  Step 5: Adjust frequencies. 
#>   ---  Step 6-7: Data Processing (Loop for each country). 
#>   ---   
#>  [Step 10] Clean database [remove NAs in output] - clean is TRUE.

Load and Process World Panel Data