Regression Analysis
Perform linear and multiple regression analysis directly in AVA SQL.
Overview
AVA Database includes built-in support for Ordinary Least Squares (OLS) regression analysis,
allowing you to fit statistical models to your data using standard SQL syntax with the
REGRESS command.
Key Features
- ✓ Simple and multiple linear regression
- ✓ Automatic coefficient calculation
- ✓ R-squared and adjusted R-squared
- ✓ Standard errors and p-values
- ✓ Model persistence and predictions
- ✓ Integration with SQL queries
Basic Usage
Simple Linear Regression
Predict a dependent variable using a single predictor:
-- Predict house prices based on square footage
REGRESS price FROM houses USING sqft
-- Output:
-- Intercept: 50000.00
-- Coefficient (sqft): 150.25
-- R-squared: 0.85
-- Adjusted R-squared: 0.84
-- Standard Error: 5000.00
Multiple Linear Regression
Use multiple predictors for more accurate models:
-- Predict price with multiple features
REGRESS price FROM real_estate
USING sqft, bedrooms, bathrooms, age, location_score
-- Results show coefficient for each predictor
-- with corresponding p-values and significance levels
Creating Predictive Models
Save regression models for later use with the CREATE REGRESSION MODEL command:
-- Create and name a model
CREATE REGRESSION MODEL price_model
AS SELECT sqft, bedrooms, age
FROM housing_data
PREDICT price;
-- Use the model for predictions
SELECT
address,
sqft,
bedrooms,
age,
PREDICT(price_model) as predicted_price
FROM new_listings;
-- Calculate prediction errors
SELECT
address,
actual_price,
PREDICT(price_model) as predicted_price,
actual_price - PREDICT(price_model) as residual,
ABS(actual_price - PREDICT(price_model)) as absolute_error
FROM validation_set;
Practical Examples
Example 1: Sales Forecasting
-- Predict sales based on advertising spend
REGRESS sales FROM marketing_data
USING tv_advertising, radio_advertising, social_media_spend
-- Interpretation:
-- Each $1000 in TV advertising → +$5000 in sales
-- Each $1000 in radio → +$3000 in sales
-- Each $1000 in social media → +$2500 in sales
Example 2: Customer Lifetime Value
-- Predict customer value based on behavior
CREATE REGRESSION MODEL clv_model
AS SELECT
purchase_frequency,
average_order_value,
account_age_months,
engagement_score
FROM customer_history
PREDICT lifetime_value;
-- Score new customers
SELECT
customer_id,
PREDICT(clv_model) as predicted_clv,
CASE
WHEN PREDICT(clv_model) > 10000 THEN 'High Value'
WHEN PREDICT(clv_model) > 5000 THEN 'Medium Value'
ELSE 'Low Value'
END as segment
FROM new_customers;
Understanding Output
Model Statistics
| Metric | Description |
|---|---|
| Coefficients | Impact of each predictor on the outcome |
| R-squared | Proportion of variance explained (0-1) |
| Adjusted R² | R-squared adjusted for number of predictors |
| P-values | Statistical significance of each predictor |
| Std. Error | Average prediction error |
Best Practices
Data Preparation
Remove missing values, handle outliers, and normalize features before regression.
Feature Selection
Choose predictors that have logical relationships with the dependent variable.
Model Validation
Always test your model on a holdout dataset to assess real-world performance.