
Financial Data Connectors

OpenML Crawler provides financial data connectors for market data, economic indicators, and company financial information from Alpha Vantage, Yahoo Finance, FRED, and CoinMarketCap. The connectors handle real-time and historical data, with rate limiting and data validation.

Supported Data Sources

Alpha Vantage

Free and premium tiers of financial market data covering stocks, forex, and cryptocurrencies.

Features:

  • Real-time and historical stock prices
  • Forex exchange rates
  • Cryptocurrency data
  • Technical indicators
  • Fundamental data
  • Sector performance
  • Economic indicators

Configuration:

connectors:
  finance:
    alpha_vantage:
      api_key: "${ALPHA_VANTAGE_API_KEY}"
      data_type: "json"
      output_size: "compact"  # compact or full
      rate_limit_buffer: 0.9
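
The `${ALPHA_VANTAGE_API_KEY}` placeholder keeps the key out of the configuration file. Whether OpenML Crawler expands such placeholders itself is not stated here, so the following is only a minimal sketch of how a loaded configuration could be resolved by hand with the standard library. The `resolve_placeholders` helper and the `settings` dict are hypothetical and not part of the library.

```python
import os


def resolve_placeholders(config):
    """Recursively expand ${VAR} placeholders in the string values of a config."""
    if isinstance(config, dict):
        return {key: resolve_placeholders(value) for key, value in config.items()}
    if isinstance(config, list):
        return [resolve_placeholders(item) for item in config]
    if isinstance(config, str):
        # os.path.expandvars leaves unknown variables unchanged.
        return os.path.expandvars(config)
    return config


# Illustration only: use a dummy key if none is set in the environment.
os.environ.setdefault("ALPHA_VANTAGE_API_KEY", "demo")

settings = {
    "api_key": "${ALPHA_VANTAGE_API_KEY}",
    "output_size": "compact",
    "rate_limit_buffer": 0.9,
}
resolved = resolve_placeholders(settings)
```

Because unset variables are left as-is, checking that the resolved key no longer starts with `${` is a cheap way to catch a missing environment variable early.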

Usage:

from openmlcrawler.connectors.finance import AlphaVantageConnector

connector = AlphaVantageConnector(api_key="your_key")

# Get stock data
stock_data = connector.get_stock_data(
    symbol="AAPL",
    function="TIME_SERIES_DAILY",
    output_size="full"
)

# Get forex rates
forex_data = connector.get_forex_data(
    from_symbol="USD",
    to_symbol="EUR",
    interval="5min"
)

# Get technical indicators
rsi_data = connector.get_technical_indicator(
    symbol="AAPL",
    indicator="RSI",
    interval="daily",
    time_period=14
)

Yahoo Finance

Comprehensive financial data from Yahoo Finance with stocks, ETFs, mutual funds, and more.

Features:

  • Stock quotes and historical data
  • ETF and mutual fund data
  • Options data
  • Financial statements
  • Analyst estimates
  • News and press releases
  • Economic calendar

Usage:

from openmlcrawler.connectors.finance import YahooFinanceConnector

connector = YahooFinanceConnector()

# Get stock info
stock = connector.get_stock_info("AAPL")

# Get historical data
historical = connector.get_historical_data(
    symbol="AAPL",
    start_date="2020-01-01",
    end_date="2023-12-31",
    interval="1d"
)

# Get financial statements
income_stmt = connector.get_income_statement("AAPL", yearly=True)
balance_sheet = connector.get_balance_sheet("AAPL")
cash_flow = connector.get_cash_flow("AAPL")

Federal Reserve Economic Data (FRED)

Economic data from the Federal Reserve with comprehensive US economic indicators.

Features:

  • Interest rates and monetary policy
  • Employment and labor data
  • Inflation and price indices
  • GDP and economic growth
  • Trade and balance of payments
  • Housing and real estate data
  • Financial market indicators

Usage:

from openmlcrawler.connectors.finance import FREDConnector

connector = FREDConnector(api_key="your_key")

# Get economic series
gdp_data = connector.get_series(
    series_id="GDP",
    start_date="2000-01-01",
    end_date="2023-12-31"
)

# Get unemployment rate (same start_date parameter as the GDP call above)
unemployment = connector.get_series(
    series_id="UNRATE",
    start_date="2020-01-01"
)

# Search for series
results = connector.search_series(
    search_text="inflation",
    limit=10
)

CoinMarketCap

Cryptocurrency market data with comprehensive crypto information and analytics.

Features:

  • Cryptocurrency prices and market data
  • Exchange listings and trading pairs
  • Historical price data
  • Market capitalization rankings
  • Volume and liquidity data
  • News and analysis
  • Portfolio tracking

Configuration:

connectors:
  finance:
    coinmarketcap:
      api_key: "${COINMARKETCAP_API_KEY}"
      pro_api: true
      convert: "USD"

Usage:

from openmlcrawler.connectors.finance import CoinMarketCapConnector

connector = CoinMarketCapConnector(api_key="your_key")

# Get cryptocurrency listings
listings = connector.get_listings(limit=100)

# Get specific cryptocurrency
bitcoin = connector.get_cryptocurrency("bitcoin")

# Get historical data
btc_history = connector.get_historical_data(
    symbol="BTC",
    start_date="2020-01-01",
    end_date="2023-12-31"
)

# Get market quotes
quotes = connector.get_quotes(["BTC", "ETH", "ADA"])

Data Types and Parameters

Market Data

Data Type            | Description                     | Sources                      | Frequency
Stock Prices         | Real-time and historical prices | Alpha Vantage, Yahoo         | Real-time to daily
Forex Rates          | Currency exchange rates         | Alpha Vantage, Yahoo         | Real-time to daily
Crypto Prices        | Cryptocurrency prices           | CoinMarketCap, Alpha Vantage | Real-time
Economic Indicators  | GDP, inflation, employment      | FRED, Yahoo                  | Daily to quarterly
Technical Indicators | RSI, MACD, moving averages      | Alpha Vantage                | Daily
Options Data         | Options chains and pricing      | Yahoo                        | Real-time

Financial Statements

  • Income Statement: Revenue, expenses, net income
  • Balance Sheet: Assets, liabilities, equity
  • Cash Flow Statement: Operating, investing, financing cash flows
  • Financial Ratios: P/E, P/B, ROE, ROA, margins

Market Indicators

  • Market Indices: S&P 500, Dow Jones, NASDAQ
  • Sector Performance: Technology, healthcare, finance sectors
  • Volatility Indices: VIX, crypto volatility
  • Bond Yields: Treasury yields, corporate bonds
  • Commodity Prices: Gold, oil, agricultural commodities

Data Collection Strategies

Real-time Data Streaming

from openmlcrawler.connectors.finance import RealTimeFinanceStreamer

streamer = RealTimeFinanceStreamer()

# Stream stock prices (each callback is a function you define yourself)
streamer.stream_stocks(
    symbols=["AAPL", "GOOGL", "MSFT"],
    callback=process_stock_data
)

# Stream forex rates
streamer.stream_forex(
    pairs=["USD/EUR", "USD/JPY", "EUR/GBP"],
    callback=process_forex_data
)

# Stream crypto prices
streamer.stream_crypto(
    symbols=["BTC", "ETH", "ADA"],
    callback=process_crypto_data
)
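
The streaming calls above pass callbacks such as `process_stock_data` without defining them. The payload the streamer hands to a callback is not documented here, so the sketch below assumes each update is a dict with `symbol` and `price` keys. That shape, the `make_price_callback` helper, and the `latest_prices` store are all hypothetical.

```python
latest_prices = {}


def make_price_callback(asset_class):
    """Build a callback that records the latest price per symbol."""
    store = latest_prices.setdefault(asset_class, {})

    def callback(update):
        # Assumed payload: a dict with "symbol" and "price" keys.
        symbol = update.get("symbol")
        price = update.get("price")
        if symbol is None or price is None:
            return  # skip malformed updates instead of raising inside the stream
        store[symbol] = float(price)

    return callback


process_stock_data = make_price_callback("stocks")
process_forex_data = make_price_callback("forex")
process_crypto_data = make_price_callback("crypto")
```

Keeping the callbacks small and free of blocking work is a common choice, since a slow callback would typically delay later updates.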

Historical Data Collection

from openmlcrawler.connectors.finance import HistoricalFinanceCollector

collector = HistoricalFinanceCollector()

# Collect stock history
stock_history = collector.collect_stock_history(
    symbols=["AAPL", "GOOGL", "MSFT"],
    start_date="2020-01-01",
    end_date="2023-12-31",
    interval="1d"
)

# Collect economic data
economic_data = collector.collect_economic_data(
    indicators=["GDP", "UNRATE", "CPI"],
    start_date="2000-01-01",
    end_date="2023-12-31"
)

# Collect crypto history
crypto_history = collector.collect_crypto_history(
    symbols=["BTC", "ETH"],
    start_date="2018-01-01",
    end_date="2023-12-31"
)

Batch Processing

from openmlcrawler.connectors.finance import BatchFinanceProcessor

processor = BatchFinanceProcessor()

# Process multiple data sources
results = processor.process_batch(
    sources=["alpha_vantage", "yahoo", "fred"],
    symbols=["AAPL", "GOOGL", "MSFT"],
    indicators=["GDP", "UNRATE"],
    date_range=("2020-01-01", "2023-12-31")
)

Data Quality and Validation

Quality Checks

  1. Data Completeness: Check for missing values and gaps
  2. Price Validation: Verify price ranges and consistency
  3. Volume Validation: Check trading volume reasonableness
  4. Date Consistency: Ensure chronological data ordering
  5. Cross-Source Validation: Compare data across multiple sources
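
The five checks above can be sketched in plain Python. This is an illustration of the ideas only, not the logic of `FinanceDataValidator`. The row layout (dicts with `date`, `close`, and `volume`), the 1% tolerance, and both function names are assumptions.

```python
from datetime import date


def check_quality(rows):
    """Return human-readable issues found in a list of daily price rows."""
    issues = []
    previous_date = None
    for index, row in enumerate(rows):
        # 1. Completeness: every required field must be present.
        missing = [f for f in ("date", "close", "volume") if row.get(f) is None]
        if missing:
            issues.append(f"row {index}: missing {', '.join(missing)}")
            continue
        # 2. Price validation: prices must be positive.
        if row["close"] <= 0:
            issues.append(f"row {index}: non-positive close {row['close']}")
        # 3. Volume validation: volume cannot be negative.
        if row["volume"] < 0:
            issues.append(f"row {index}: negative volume {row['volume']}")
        # 4. Date consistency: dates must be strictly increasing.
        if previous_date is not None and row["date"] <= previous_date:
            issues.append(f"row {index}: date {row['date']} is not after {previous_date}")
        previous_date = row["date"]
    return issues


def compare_sources(primary, secondary, tolerance=0.01):
    """5. Cross-source validation: dates whose closes differ by more than tolerance."""
    secondary_by_date = {row["date"]: row["close"] for row in secondary}
    mismatches = []
    for row in primary:
        other = secondary_by_date.get(row["date"])
        if other is None or row["close"] is None:
            continue
        if abs(row["close"] - other) > tolerance * abs(other):
            mismatches.append(row["date"])
    return mismatches


# Made-up rows for illustration; not real market data.
rows = [
    {"date": date(2023, 1, 3), "close": 100.0, "volume": 1000},
    {"date": date(2023, 1, 4), "close": None, "volume": 1200},
    {"date": date(2023, 1, 2), "close": -1.0, "volume": 900},
]
issues = check_quality(rows)
```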

Validation Framework

from openmlcrawler.connectors.finance import FinanceDataValidator

validator = FinanceDataValidator()

# Validate stock data
validation_result = validator.validate_stock_data(
    data=stock_data,
    checks=[
        "price_range_check",
        "volume_consistency",
        "date_continuity",
        "outlier_detection"
    ]
)

# Validate economic data
eco_validation = validator.validate_economic_data(
    data=economic_data,
    checks=[
        "seasonal_adjustment",
        "revision_consistency",
        "source_reliability"
    ]
)

Technical Analysis Integration

Built-in Technical Indicators

from openmlcrawler.connectors.finance import TechnicalAnalyzer

analyzer = TechnicalAnalyzer()

# Calculate technical indicators
indicators = analyzer.calculate_indicators(
    price_data=stock_data,
    indicators=[
        "SMA_20", "SMA_50", "EMA_12", "EMA_26",
        "RSI_14", "MACD", "BBANDS", "STOCH"
    ]
)

# Generate trading signals
signals = analyzer.generate_signals(
    indicators=indicators,
    strategies=["moving_average_crossover", "rsi_divergence"]
)
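
To make names such as `SMA_20`, `EMA_12`, and `moving_average_crossover` concrete, here is a minimal pure-Python sketch of a simple moving average, an exponential moving average, and a crossover signal. It illustrates the standard definitions and is not the `TechnicalAnalyzer` implementation. The function names and the sample closes are made up.

```python
def sma(values, window):
    """Simple moving average; None until a full window is available."""
    result = []
    running = 0.0
    for i, value in enumerate(values):
        running += value
        if i >= window:
            running -= values[i - window]  # drop the value leaving the window
        result.append(running / window if i >= window - 1 else None)
    return result


def ema(values, span):
    """Exponential moving average seeded with the first value."""
    alpha = 2 / (span + 1)
    result = []
    for value in values:
        result.append(value if not result else alpha * value + (1 - alpha) * result[-1])
    return result


def crossover_signals(fast, slow):
    """Emit (index, "buy"/"sell") when the fast average crosses the slow one."""
    signals = []
    for i in range(1, len(fast)):
        if None in (fast[i - 1], slow[i - 1], fast[i], slow[i]):
            continue
        if fast[i - 1] <= slow[i - 1] and fast[i] > slow[i]:
            signals.append((i, "buy"))
        elif fast[i - 1] >= slow[i - 1] and fast[i] < slow[i]:
            signals.append((i, "sell"))
    return signals


closes = [10, 9, 8, 9, 11, 13]  # made-up closing prices
signals = crossover_signals(sma(closes, 2), sma(closes, 3))
```

With these sample closes, the 2-period average moves above the 3-period average at index 4, so `signals` holds a single buy entry.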

Custom Indicators

from openmlcrawler.connectors.finance import CustomIndicatorCalculator

calculator = CustomIndicatorCalculator()

# Calculate custom indicators
custom_indicators = calculator.calculate_custom(
    price_data=stock_data,
    custom_functions=[
        lambda x: x['close'].rolling(10).std(),  # Custom volatility
        lambda x: (x['high'] - x['low']) / x['close']  # Custom range ratio
    ]
)

Risk Management

Portfolio Risk Analysis

from openmlcrawler.connectors.finance import PortfolioRiskAnalyzer

risk_analyzer = PortfolioRiskAnalyzer()

# Portfolio weights by symbol (they should sum to 1.0)
portfolio = {"AAPL": 0.3, "GOOGL": 0.3, "MSFT": 0.4}

# Calculate portfolio risk metrics
# historical_returns: historical return data prepared beforehand
# (not defined in this snippet)
risk_metrics = risk_analyzer.calculate_risk(
    portfolio=portfolio,
    returns_data=historical_returns,
    metrics=[
        "volatility",
        "sharpe_ratio",
        "max_drawdown",
        "value_at_risk",
        "expected_shortfall"
    ]
)

# Stress testing
stress_results = risk_analyzer.stress_test(
    portfolio=portfolio,
    scenarios=[
        "market_crash_2008",
        "covid_2020",
        "tech_bubble_2000"
    ]
)
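
For readers unfamiliar with the metric names, the sketch below computes them from a list of periodic returns using textbook definitions. It is not the `PortfolioRiskAnalyzer` implementation. It assumes simple (not log) returns, a zero risk-free rate, 252 trading periods per year, and historical (non-parametric) VaR and expected shortfall. Both function names are hypothetical.

```python
import math
import statistics


def portfolio_returns(weights, asset_returns):
    """Combine per-asset return series into one weighted portfolio series."""
    length = min(len(asset_returns[symbol]) for symbol in weights)
    return [
        sum(weight * asset_returns[symbol][i] for symbol, weight in weights.items())
        for i in range(length)
    ]


def risk_summary(returns, confidence_level=0.95, periods_per_year=252):
    """Compute basic risk metrics from periodic simple returns (needs >= 2 values)."""
    mean = statistics.fmean(returns)
    volatility = statistics.stdev(returns) * math.sqrt(periods_per_year)
    # Sharpe ratio with the risk-free rate assumed to be zero.
    sharpe = mean * periods_per_year / volatility if volatility else float("nan")

    # Maximum drawdown: largest peak-to-trough fall of cumulative wealth.
    wealth = peak = 1.0
    max_drawdown = 0.0
    for r in returns:
        wealth *= 1.0 + r
        peak = max(peak, wealth)
        max_drawdown = max(max_drawdown, 1.0 - wealth / peak)

    # Historical VaR / ES: look at the worst (1 - confidence_level) share of losses.
    losses = sorted(-r for r in returns)
    tail_size = max(1, int(round(len(losses) * (1.0 - confidence_level), 9)))
    tail = losses[-tail_size:]
    return {
        "volatility": volatility,
        "sharpe_ratio": sharpe,
        "max_drawdown": max_drawdown,
        "value_at_risk": tail[0],  # smallest loss inside the tail
        "expected_shortfall": statistics.fmean(tail),  # average loss in the tail
    }


# Made-up returns for illustration; not real market data.
weights = {"AAPL": 0.3, "GOOGL": 0.3, "MSFT": 0.4}
sample_returns = {
    "AAPL": [0.010, -0.020, 0.015, 0.000],
    "GOOGL": [0.005, -0.010, 0.020, -0.005],
    "MSFT": [0.000, -0.015, 0.010, 0.005],
}
metrics = risk_summary(portfolio_returns(weights, sample_returns))
```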

Configuration Options

Global Configuration

finance_connectors:
  default_sources: ["alpha_vantage", "yahoo"]
  data_quality:
    enable_validation: true
    strict_mode: false
    outlier_threshold: 3.0
  caching:
    enable_cache: true
    cache_ttl_minutes: 15
    max_cache_size_gb: 50
  rate_limiting:
    requests_per_minute: 5
    burst_limit: 10
  risk_management:
    enable_risk_analysis: true
    confidence_level: 0.95
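
The meaning of `outlier_threshold: 3.0` is not spelled out above. One common reading is a z-score cutoff, where a value is flagged when it lies more than three standard deviations from the mean. The sketch below assumes that interpretation. The `flag_outliers` helper is hypothetical and may differ from what the library does.

```python
import statistics


def flag_outliers(values, threshold=3.0):
    """Return indices of values whose z-score exceeds the threshold."""
    mean = statistics.fmean(values)
    stdev = statistics.pstdev(values)
    if stdev == 0:
        return []  # all values identical, nothing can be an outlier
    return [i for i, v in enumerate(values) if abs(v - mean) / stdev > threshold]


# Made-up daily returns: twenty quiet days followed by one extreme move.
daily_returns = [0.01] * 20 + [0.5]
outliers = flag_outliers(daily_returns)
```

Note that with very short series no point can reach a z-score of 3, so a cutoff like this only becomes meaningful once there are more than about ten observations.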

Source-Specific Settings

alpha_vantage:
  api_key: "${ALPHA_VANTAGE_API_KEY}"
  premium_account: false
  output_size: "compact"
  data_type: "json"

yahoo_finance:
  user_agent: "OpenMLCrawler/1.0"
  timeout_seconds: 30
  retry_attempts: 3

fred:
  api_key: "${FRED_API_KEY}"
  realtime_start: "1776-07-04"
  realtime_end: "9999-12-31"

coinmarketcap:
  api_key: "${COINMARKETCAP_API_KEY}"
  pro_api: true
  convert: "USD"
  auxiliary: "cmc_rank"

Best Practices

Performance Optimization

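  1. Use Caching: Financial data changes frequently, so cache responses with a short TTL (the global configuration uses 15 minutes) rather than refetching on every request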
  2. Batch Requests: Combine multiple symbol requests
  3. Selective Data: Only request needed data fields
  4. Rate Limiting: Respect API rate limits to avoid throttling
  5. Connection Pooling: Reuse connections for multiple requests
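
As an illustration of the caching advice, here is a minimal in-memory cache with a time-to-live. It is a sketch, not the library's cache. The `TTLCache` class is hypothetical, and the default of 15 minutes simply mirrors `cache_ttl_minutes: 15` from the global configuration.

```python
import time


class TTLCache:
    """Tiny in-memory cache whose entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds=15 * 60, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested without waiting
        self._entries = {}

    def get_or_fetch(self, key, fetch):
        """Return the cached value for key, calling fetch() only when stale."""
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]
        value = fetch()
        self._entries[key] = (now, value)
        return value


cache = TTLCache()
# The lambda stands in for a real connector call.
quote = cache.get_or_fetch(("AAPL", "daily"), lambda: {"symbol": "AAPL"})
```

Keying the cache on the full request (symbol plus interval here) avoids returning daily data for an intraday request.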

Cost Management

  1. API Tier Selection: Choose appropriate API tiers for your needs
  2. Usage Monitoring: Track API usage and costs
  3. Data Sampling: Sample high-frequency data for analysis
  4. Caching Strategy: Implement intelligent caching policies
  5. Fallback Sources: Use multiple data sources for redundancy
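
The fallback idea can be sketched as trying each source in order until one succeeds. The `fetch_with_fallback` helper and the stub fetchers below are hypothetical. In practice each fetcher would wrap a connector call such as `get_stock_data` or `get_historical_data`.

```python
def fetch_with_fallback(symbol, sources):
    """Try (name, fetch) pairs in order and return the first successful result."""
    errors = {}
    for name, fetch in sources:
        try:
            return name, fetch(symbol)
        except Exception as exc:  # broad on purpose: any failure triggers fallback
            errors[name] = exc
    raise RuntimeError(f"all sources failed for {symbol}: {errors}")


def failing_source(symbol):
    """Stub that simulates an exhausted API quota."""
    raise ConnectionError("quota exhausted")


def working_source(symbol):
    """Stub that simulates a successful response."""
    return {"symbol": symbol, "close": 100.0}


used_source, data = fetch_with_fallback(
    "AAPL",
    [("alpha_vantage", failing_source), ("yahoo", working_source)],
)
```

Returning the name of the source that answered makes it possible to record provenance alongside the data, which helps later cross-source validation.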

Data Reliability

  1. Multiple Sources: Cross-validate data from multiple sources
  2. Data Quality Checks: Implement comprehensive validation
  3. Error Recovery: Handle API failures gracefully
  4. Monitoring: Monitor data collection health
  5. Version Control: Track data source versions and changes

Troubleshooting

Common Issues

API Rate Limiting

Error: API rate limit exceeded
Solution: Implement exponential backoff and reduce request frequency
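
A minimal sketch of exponential backoff follows. The `RateLimitError` class is a stand-in: the exception the connectors actually raise on a rate-limit response is not documented here, so substitute the real one. The delay doubles after each failed attempt, is capped at `max_delay`, and gets a little random jitter so that parallel workers do not retry in lockstep.

```python
import random
import time


class RateLimitError(Exception):
    """Hypothetical stand-in for the error raised when a rate limit is hit."""


def call_with_backoff(request, max_attempts=5, base_delay=1.0, max_delay=60.0,
                      sleep=time.sleep):
    """Call request(), retrying with exponentially growing delays on rate limits."""
    for attempt in range(max_attempts):
        try:
            return request()
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Add up to 10% jitter on top of the computed delay.
            sleep(delay + random.uniform(0, delay * 0.1))
```

A call might then look like `call_with_backoff(lambda: connector.get_stock_data(symbol="AAPL", function="TIME_SERIES_DAILY"))`, assuming the connector raises an exception that you map to `RateLimitError`.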

Invalid Symbols

Error: Invalid symbol or ticker
Solution: Verify symbol format and check if symbol exists
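
A format check before the request can catch typos without spending an API call. The pattern below is an assumption that covers common US-style tickers (one to five letters, optionally followed by a class suffix such as `BRK.B`). Symbol conventions differ between providers, so adjust it to the source you query. The `normalize_symbol` helper is hypothetical.

```python
import re

# Assumed pattern for US-style tickers; not a rule enforced by any provider.
_TICKER_PATTERN = re.compile(r"[A-Z]{1,5}(?:[.-][A-Z]{1,2})?")


def normalize_symbol(raw):
    """Upper-case and trim a ticker, raising ValueError on an unexpected format."""
    symbol = raw.strip().upper()
    if not _TICKER_PATTERN.fullmatch(symbol):
        raise ValueError(f"unexpected symbol format: {raw!r}")
    return symbol
```

Passing the format check does not prove the symbol exists. A lookup against the data source is still needed for that.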

Data Unavailable

Error: Historical data not available
Solution: Check date range availability and data source limitations

Authentication Failed

Error: API key invalid or expired
Solution: Verify API credentials and check account status
