Regression Analysis

Perform linear and multiple regression analysis directly in AVA SQL.

Overview

AVA Database includes built-in support for Ordinary Least Squares (OLS) regression analysis, allowing you to fit statistical models to your data using standard SQL syntax with the REGRESS command.

Key Features

  • ✓ Simple and multiple linear regression
  • ✓ Automatic coefficient calculation
  • ✓ R-squared and adjusted R-squared
  • ✓ Standard errors and p-values
  • ✓ Model persistence and predictions
  • ✓ Integration with SQL queries

Basic Usage

Simple Linear Regression

Predict a dependent variable using a single predictor:

-- Predict house prices based on square footage
REGRESS price FROM houses USING sqft

-- Output:
-- Intercept: 50000.00
-- Coefficient (sqft): 150.25
-- R-squared: 0.85
-- Adjusted R-squared: 0.84
-- Standard Error: 5000.00

Multiple Linear Regression

Use multiple predictors for more accurate models:

-- Predict price with multiple features
REGRESS price FROM real_estate
USING sqft, bedrooms, bathrooms, age, location_score

-- Results show coefficient for each predictor
-- with corresponding p-values and significance levels

Creating Predictive Models

Save regression models for later use with the CREATE REGRESSION MODEL command:

-- Create and name a model
CREATE REGRESSION MODEL price_model
AS SELECT sqft, bedrooms, age
FROM housing_data
PREDICT price;

-- Use the model for predictions
SELECT
    address,
    sqft,
    bedrooms,
    age,
    PREDICT(price_model) as predicted_price
FROM new_listings;

-- Calculate prediction errors
SELECT
    address,
    actual_price,
    PREDICT(price_model) as predicted_price,
    actual_price - PREDICT(price_model) as residual,
    ABS(actual_price - PREDICT(price_model)) as absolute_error
FROM validation_set;

Practical Examples

Example 1: Sales Forecasting

-- Predict sales based on advertising spend
REGRESS sales FROM marketing_data
USING tv_advertising, radio_advertising, social_media_spend

-- Interpretation:
-- Each $1000 in TV advertising → +$5000 in sales
-- Each $1000 in radio → +$3000 in sales
-- Each $1000 in social media → +$2500 in sales

Example 2: Customer Lifetime Value

-- Predict customer value based on behavior
CREATE REGRESSION MODEL clv_model
AS SELECT
    purchase_frequency,
    average_order_value,
    account_age_months,
    engagement_score
FROM customer_history
PREDICT lifetime_value;

-- Score new customers
SELECT
    customer_id,
    PREDICT(clv_model) as predicted_clv,
    CASE
        WHEN PREDICT(clv_model) > 10000 THEN 'High Value'
        WHEN PREDICT(clv_model) > 5000 THEN 'Medium Value'
        ELSE 'Low Value'
    END as segment
FROM new_customers;

Understanding Output

Model Statistics

Metric Description
Coefficients Impact of each predictor on the outcome
R-squared Proportion of variance explained (0-1)
Adjusted R² R-squared adjusted for number of predictors
P-values Statistical significance of each predictor
Std. Error Average prediction error

Best Practices

Data Preparation

Remove missing values, handle outliers, and normalize features before regression.

Feature Selection

Choose predictors that have logical relationships with the dependent variable.

Model Validation

Always test your model on a holdout dataset to assess real-world performance.