Financial Data Connectors¶
OpenML Crawler's financial data connectors retrieve market data, economic indicators, and company financial information from several providers. They handle both real-time and historical data, and apply rate limiting and data validation to every source.
Supported Data Sources¶
Alpha Vantage¶
Free and premium market data covering stocks, forex, and cryptocurrencies.
Features:
- Real-time and historical stock prices
- Forex exchange rates
- Cryptocurrency data
- Technical indicators
- Fundamental data
- Sector performance
- Economic indicators
Configuration:
connectors:
  finance:
    alpha_vantage:
      api_key: "${ALPHA_VANTAGE_API_KEY}"
      data_type: "json"
      output_size: "compact"  # compact or full
      rate_limit_buffer: 0.9
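The `${ALPHA_VANTAGE_API_KEY}` placeholder keeps the key out of the configuration file. This page does not say how OpenML Crawler resolves such placeholders, so the following is only a minimal sketch of one way to expand them from the environment. The helper name `expand_env` is hypothetical and not part of the library.

```python
import os
from string import Template


def expand_env(value, env=None):
    """Replace ${NAME} placeholders in a config value (hypothetical helper)."""
    env = os.environ if env is None else env
    try:
        return Template(value).substitute(env)
    except KeyError as exc:
        # Fail loudly rather than sending an empty API key to the provider
        raise RuntimeError(f"Missing environment variable: {exc.args[0]}") from None


# Example with an explicit mapping instead of the real environment
api_key = expand_env("${ALPHA_VANTAGE_API_KEY}", {"ALPHA_VANTAGE_API_KEY": "demo"})
print(api_key)
```

Raising on a missing variable surfaces a configuration mistake at startup instead of as an authentication error later.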
Usage:
from openmlcrawler.connectors.finance import AlphaVantageConnector

connector = AlphaVantageConnector(api_key="your_key")

# Get stock data
stock_data = connector.get_stock_data(
    symbol="AAPL",
    function="TIME_SERIES_DAILY",
    output_size="full"
)

# Get forex rates
forex_data = connector.get_forex_data(
    from_symbol="USD",
    to_symbol="EUR",
    interval="5min"
)

# Get technical indicators
rsi_data = connector.get_technical_indicator(
    symbol="AAPL",
    indicator="RSI",
    interval="daily",
    time_period=14
)
Yahoo Finance¶
Financial data from Yahoo Finance covering stocks, ETFs, mutual funds, and more.
Features:
- Stock quotes and historical data
- ETF and mutual fund data
- Options data
- Financial statements
- Analyst estimates
- News and press releases
- Economic calendar
Usage:
from openmlcrawler.connectors.finance import YahooFinanceConnector

connector = YahooFinanceConnector()

# Get stock info
stock = connector.get_stock_info("AAPL")

# Get historical data
historical = connector.get_historical_data(
    symbol="AAPL",
    start_date="2020-01-01",
    end_date="2023-12-31",
    interval="1d"
)

# Get financial statements
income_stmt = connector.get_income_statement("AAPL", yearly=True)
balance_sheet = connector.get_balance_sheet("AAPL")
cash_flow = connector.get_cash_flow("AAPL")
Federal Reserve Economic Data (FRED)¶
Economic data from the Federal Reserve covering a broad range of US economic indicators.
Features:
- Interest rates and monetary policy
- Employment and labor data
- Inflation and price indices
- GDP and economic growth
- Trade and balance of payments
- Housing and real estate data
- Financial market indicators
Usage:
from openmlcrawler.connectors.finance import FREDConnector

connector = FREDConnector(api_key="your_key")

# Get economic series
gdp_data = connector.get_series(
    series_id="GDP",
    start_date="2000-01-01",
    end_date="2023-12-31"
)

# Get unemployment rate
unemployment = connector.get_series(
    series_id="UNRATE",
    observation_start="2020-01-01"
)

# Search for series
results = connector.search_series(
    search_text="inflation",
    limit=10
)
CoinMarketCap¶
Cryptocurrency market data, including prices, rankings, and exchange information.
Features:
- Cryptocurrency prices and market data
- Exchange listings and trading pairs
- Historical price data
- Market capitalization rankings
- Volume and liquidity data
- News and analysis
- Portfolio tracking
Configuration:
connectors:
  finance:
    coinmarketcap:
      api_key: "${COINMARKETCAP_API_KEY}"
      pro_api: true
      convert: "USD"
Usage:
from openmlcrawler.connectors.finance import CoinMarketCapConnector

connector = CoinMarketCapConnector(api_key="your_key")

# Get cryptocurrency listings
listings = connector.get_listings(limit=100)

# Get specific cryptocurrency
bitcoin = connector.get_cryptocurrency("bitcoin")

# Get historical data
btc_history = connector.get_historical_data(
    symbol="BTC",
    start_date="2020-01-01",
    end_date="2023-12-31"
)

# Get market quotes
quotes = connector.get_quotes(["BTC", "ETH", "ADA"])
Data Types and Parameters¶
Market Data¶
Data Type | Description | Sources | Frequency |
---|---|---|---|
Stock Prices | Real-time and historical prices | Alpha Vantage, Yahoo | Real-time to daily |
Forex Rates | Currency exchange rates | Alpha Vantage, Yahoo | Real-time to daily |
Crypto Prices | Cryptocurrency prices | CoinMarketCap, Alpha Vantage | Real-time |
Economic Indicators | GDP, inflation, employment | FRED, Yahoo | Daily to quarterly |
Technical Indicators | RSI, MACD, moving averages | Alpha Vantage | Daily |
Options Data | Options chains and pricing | Yahoo | Real-time |
Financial Statements¶
- Income Statement: Revenue, expenses, net income
- Balance Sheet: Assets, liabilities, equity
- Cash Flow Statement: Operating, investing, financing cash flows
- Financial Ratios: P/E, P/B, ROE, ROA, margins
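The ratios listed above follow directly from statement and price fields. As a worked illustration, the sketch below computes them with their standard definitions. The function name and the input figures are made up for the example and do not come from any connector.

```python
def financial_ratios(price, eps, book_value_per_share,
                     net_income, equity, total_assets):
    """Compute common valuation and profitability ratios (illustrative helper)."""
    return {
        "pe": price / eps,                    # price-to-earnings
        "pb": price / book_value_per_share,   # price-to-book
        "roe": net_income / equity,           # return on equity
        "roa": net_income / total_assets,     # return on assets
    }


# Hypothetical figures, chosen only to keep the arithmetic easy to follow
ratios = financial_ratios(
    price=150.0, eps=6.0, book_value_per_share=30.0,
    net_income=20.0, equity=80.0, total_assets=200.0,
)
print(ratios)
```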
Market Indicators¶
- Market Indices: S&P 500, Dow Jones, NASDAQ
- Sector Performance: Technology, healthcare, finance sectors
- Volatility Indices: VIX, crypto volatility
- Bond Yields: Treasury yields, corporate bonds
- Commodity Prices: Gold, oil, agricultural commodities
Data Collection Strategies¶
Real-time Data Streaming¶
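from openmlcrawler.connectors.finance import RealTimeFinanceStreamer

# Placeholder callbacks; the payload passed by the streamer is an assumption
def process_stock_data(update):
    print("stock", update)

def process_forex_data(update):
    print("forex", update)

def process_crypto_data(update):
    print("crypto", update)

streamer = RealTimeFinanceStreamer()

# Stream stock prices
streamer.stream_stocks(
    symbols=["AAPL", "GOOGL", "MSFT"],
    callback=process_stock_data
)

# Stream forex rates
streamer.stream_forex(
    pairs=["USD/EUR", "USD/JPY", "EUR/GBP"],
    callback=process_forex_data
)

# Stream crypto prices
streamer.stream_crypto(
    symbols=["BTC", "ETH", "ADA"],
    callback=process_crypto_data
)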
Historical Data Collection¶
from openmlcrawler.connectors.finance import HistoricalFinanceCollector

collector = HistoricalFinanceCollector()

# Collect stock history
stock_history = collector.collect_stock_history(
    symbols=["AAPL", "GOOGL", "MSFT"],
    start_date="2020-01-01",
    end_date="2023-12-31",
    interval="1d"
)

# Collect economic data
economic_data = collector.collect_economic_data(
    indicators=["GDP", "UNRATE", "CPI"],
    start_date="2000-01-01",
    end_date="2023-12-31"
)

# Collect crypto history
crypto_history = collector.collect_crypto_history(
    symbols=["BTC", "ETH"],
    start_date="2018-01-01",
    end_date="2023-12-31"
)
Batch Processing¶
from openmlcrawler.connectors.finance import BatchFinanceProcessor

processor = BatchFinanceProcessor()

# Process multiple data sources
results = processor.process_batch(
    sources=["alpha_vantage", "yahoo", "fred"],
    symbols=["AAPL", "GOOGL", "MSFT"],
    indicators=["GDP", "UNRATE"],
    date_range=("2020-01-01", "2023-12-31")
)
Data Quality and Validation¶
Quality Checks¶
- Data Completeness: Check for missing values and gaps
- Price Validation: Verify price ranges and consistency
- Volume Validation: Check trading volume reasonableness
- Date Consistency: Ensure chronological data ordering
- Cross-Source Validation: Compare data across multiple sources
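To make the first four checks concrete, here is a minimal standalone sketch that runs them over daily price rows. It is not the library's validator. The row layout (`date`, `close`, `volume` keys) and the function name `check_rows` are assumptions made for this example, and cross-source comparison is left out.

```python
from datetime import date


def check_rows(rows):
    """Return a list of human-readable issues found in daily price rows."""
    issues = []
    previous_date = None
    for i, row in enumerate(rows):
        # Completeness: every expected field must be present and not None
        for field in ("date", "close", "volume"):
            if row.get(field) is None:
                issues.append(f"row {i}: missing {field}")
        close, volume, day = row.get("close"), row.get("volume"), row.get("date")
        # Price validation: a closing price must be positive
        if close is not None and close <= 0:
            issues.append(f"row {i}: non-positive close {close}")
        # Volume validation: traded volume cannot be negative
        if volume is not None and volume < 0:
            issues.append(f"row {i}: negative volume {volume}")
        # Date consistency: rows must be in strictly increasing date order
        if day is not None:
            if previous_date is not None and day <= previous_date:
                issues.append(f"row {i}: date {day} is not after {previous_date}")
            previous_date = day
    return issues


rows = [
    {"date": date(2023, 1, 3), "close": 125.07, "volume": 1000},
    {"date": date(2023, 1, 2), "close": -1.0, "volume": None},
]
for issue in check_rows(rows):
    print(issue)
```

Collecting issues instead of raising on the first one lets a caller decide whether to drop the affected rows or reject the whole batch.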
Validation Framework¶
from openmlcrawler.connectors.finance import FinanceDataValidator

validator = FinanceDataValidator()

# Validate stock data
validation_result = validator.validate_stock_data(
    data=stock_data,
    checks=[
        "price_range_check",
        "volume_consistency",
        "date_continuity",
        "outlier_detection"
    ]
)

# Validate economic data
eco_validation = validator.validate_economic_data(
    data=economic_data,
    checks=[
        "seasonal_adjustment",
        "revision_consistency",
        "source_reliability"
    ]
)
Technical Analysis Integration¶
Built-in Technical Indicators¶
from openmlcrawler.connectors.finance import TechnicalAnalyzer

analyzer = TechnicalAnalyzer()

# Calculate technical indicators
indicators = analyzer.calculate_indicators(
    price_data=stock_data,
    indicators=[
        "SMA_20", "SMA_50", "EMA_12", "EMA_26",
        "RSI_14", "MACD", "BBANDS", "STOCH"
    ]
)

# Generate trading signals
signals = analyzer.generate_signals(
    indicators=indicators,
    strategies=["moving_average_crossover", "rsi_divergence"]
)
Custom Indicators¶
from openmlcrawler.connectors.finance import CustomIndicatorCalculator

calculator = CustomIndicatorCalculator()

# Calculate custom indicators
# Each function receives the price data and returns one indicator series;
# stock_data is the price data fetched in the earlier examples
custom_indicators = calculator.calculate_custom(
    price_data=stock_data,
    custom_functions=[
        lambda x: x['close'].rolling(10).std(),  # Custom volatility
        lambda x: (x['high'] - x['low']) / x['close']  # Custom range ratio
    ]
)
Risk Management¶
Portfolio Risk Analysis¶
from openmlcrawler.connectors.finance import PortfolioRiskAnalyzer

risk_analyzer = PortfolioRiskAnalyzer()

# Portfolio weights by symbol (here they sum to 1.0)
portfolio = {"AAPL": 0.3, "GOOGL": 0.3, "MSFT": 0.4}

# Calculate portfolio risk metrics
# historical_returns is assumed to hold return series for the symbols above
risk_metrics = risk_analyzer.calculate_risk(
    portfolio=portfolio,
    returns_data=historical_returns,
    metrics=[
        "volatility",
        "sharpe_ratio",
        "max_drawdown",
        "value_at_risk",
        "expected_shortfall"
    ]
)

# Stress testing
stress_results = risk_analyzer.stress_test(
    portfolio=portfolio,
    scenarios=[
        "market_crash_2008",
        "covid_2020",
        "tech_bubble_2000"
    ]
)
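Two of the metrics named above, volatility and maximum drawdown, can be illustrated with a few lines of standard-library Python. This is a sketch of the usual textbook definitions, not the `PortfolioRiskAnalyzer` implementation. The function names and the return figures are invented for the example.

```python
import statistics


def portfolio_returns(weights, asset_returns):
    """Weighted sum of per-asset return series of equal length."""
    periods = len(next(iter(asset_returns.values())))
    return [
        sum(weight * asset_returns[symbol][t] for symbol, weight in weights.items())
        for t in range(periods)
    ]


def max_drawdown(returns):
    """Largest peak-to-trough decline of the cumulative value, as a fraction."""
    value = peak = 1.0
    worst = 0.0
    for r in returns:
        value *= 1 + r
        peak = max(peak, value)
        worst = max(worst, (peak - value) / peak)
    return worst


weights = {"AAPL": 0.5, "MSFT": 0.5}
asset_returns = {                      # made-up periodic returns
    "AAPL": [0.02, -0.04, 0.01],
    "MSFT": [0.00, -0.02, 0.03],
}
combined = portfolio_returns(weights, asset_returns)
print("volatility:", statistics.stdev(combined))   # sample standard deviation
print("max drawdown:", max_drawdown(combined))
```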
Configuration Options¶
Global Configuration¶
finance_connectors:
  default_sources: ["alpha_vantage", "yahoo"]
  data_quality:
    enable_validation: true
    strict_mode: false
    outlier_threshold: 3.0
  caching:
    enable_cache: true
    cache_ttl_minutes: 15
    max_cache_size_gb: 50
  rate_limiting:
    requests_per_minute: 5
    burst_limit: 10
  risk_management:
    enable_risk_analysis: true
    confidence_level: 0.95
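The `requests_per_minute` and `burst_limit` settings suggest a token-bucket style limiter: a sustained refill rate plus a cap on how many requests may go out back to back. Whether OpenML Crawler implements it this way is an assumption. The `TokenBucket` class below is a hypothetical sketch of the technique, not library code.

```python
import time


class TokenBucket:
    """Token-bucket limiter: refills at a steady rate, capped at burst_limit."""

    def __init__(self, requests_per_minute, burst_limit, clock=time.monotonic):
        self.rate = requests_per_minute / 60.0   # tokens added per second
        self.capacity = burst_limit
        self.tokens = float(burst_limit)         # start with a full bucket
        self.clock = clock
        self.updated = clock()

    def try_acquire(self):
        """Take one token if available; return False when the caller must wait."""
        now = self.clock()
        elapsed = now - self.updated
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


bucket = TokenBucket(requests_per_minute=5, burst_limit=10)
allowed = sum(bucket.try_acquire() for _ in range(12))
print(f"{allowed} of 12 immediate requests allowed")
```

The injectable `clock` argument exists only so the limiter can be tested without real waiting.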
Source-Specific Settings¶
alpha_vantage:
  api_key: "${ALPHA_VANTAGE_API_KEY}"
  premium_account: false
  output_size: "compact"
  datatype: "json"
yahoo_finance:
  user_agent: "OpenMLCrawler/1.0"
  timeout_seconds: 30
  retry_attempts: 3
fred:
  api_key: "${FRED_API_KEY}"
  realtime_start: "1776-07-04"
  realtime_end: "9999-12-31"
coinmarketcap:
  api_key: "${COINMARKETCAP_API_KEY}"
  pro_api: true
  convert: "USD"
  auxiliary: "cmc_rank"
Best Practices¶
Performance Optimization¶
- Use Caching: Financial data changes frequently, so cache it with a short TTL (the global configuration above uses 15 minutes)
- Batch Requests: Combine multiple symbol requests
- Selective Data: Only request needed data fields
- Rate Limiting: Respect API rate limits to avoid throttling
- Connection Pooling: Reuse connections for multiple requests
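As an illustration of the caching advice above, the sketch below keeps a fetched value for a fixed time-to-live and refetches once it expires. The `TTLCache` class is hypothetical and independent of the library's own cache. The fetch function stands in for any connector call.

```python
import time


class TTLCache:
    """Cache values for ttl_seconds, then refetch on the next access."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._store = {}   # key -> (stored_at, value)

    def get_or_fetch(self, key, fetch):
        now = self.clock()
        entry = self._store.get(key)
        if entry is not None and now - entry[0] < self.ttl:
            return entry[1]            # still fresh: skip the API call
        value = fetch()
        self._store[key] = (now, value)
        return value


def fetch_quote():
    # Stand-in for a real connector call
    print("fetching AAPL from the API")
    return {"symbol": "AAPL", "price": 190.0}


cache = TTLCache(ttl_seconds=15 * 60)
cache.get_or_fetch("AAPL", fetch_quote)   # calls fetch_quote
cache.get_or_fetch("AAPL", fetch_quote)   # served from the cache
```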
Cost Management¶
- API Tier Selection: Choose appropriate API tiers for your needs
- Usage Monitoring: Track API usage and costs
- Data Sampling: Sample high-frequency data for analysis
- Caching Strategy: Implement intelligent caching policies
- Fallback Sources: Use multiple data sources for redundancy
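The fallback idea in the last item can be sketched as trying sources in priority order and returning the first successful result. The helper below is hypothetical. The two fetch functions are stand-ins for real connector calls, and the error they raise is simulated.

```python
def fetch_with_fallback(symbol, sources):
    """Try each (name, fetch) pair in order; return (name, data) from the first success."""
    errors = {}
    for name, fetch in sources:
        try:
            return name, fetch(symbol)
        except Exception as exc:  # remember why this source failed, then move on
            errors[name] = exc
    raise RuntimeError(f"All sources failed for {symbol}: {errors}")


def primary(symbol):
    # Simulated outage of the preferred source
    raise ConnectionError("rate limit exceeded")


def secondary(symbol):
    return {"symbol": symbol, "price": 190.0}


source, data = fetch_with_fallback(
    "AAPL", [("alpha_vantage", primary), ("yahoo", secondary)]
)
print(source, data)
```

Returning the source name alongside the data keeps it possible to track which provider actually served each request.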
Data Reliability¶
- Multiple Sources: Cross-validate data from multiple sources
- Data Quality Checks: Implement comprehensive validation
- Error Recovery: Handle API failures gracefully
- Monitoring: Monitor data collection health
- Version Control: Track data source versions and changes
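One common way to handle transient API failures gracefully is to retry with exponential backoff. The sketch below shows the pattern in general terms. The `retry` helper and its parameters are assumptions for this example and are not part of the connector API.

```python
import time


def retry(call, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Run call(); on failure wait base_delay * 2**attempt and try again."""
    for attempt in range(attempts):
        try:
            return call()
        except Exception:
            if attempt == attempts - 1:
                raise                      # out of attempts: surface the error
            sleep(base_delay * 2 ** attempt)


# Demonstration with a call that fails twice before succeeding
state = {"calls": 0}


def flaky_request():
    state["calls"] += 1
    if state["calls"] < 3:
        raise ConnectionError("temporary failure")
    return "ok"


waits = []
result = retry(flaky_request, attempts=3, base_delay=1.0, sleep=waits.append)
print(result, waits)   # waits are recorded instead of actually sleeping
```

A production version would usually retry only on errors known to be transient (timeouts, HTTP 429 or 5xx) rather than on every exception.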
Troubleshooting¶
Common Issues¶
API Rate Limiting¶
Invalid Symbols¶
Data Unavailable¶
Error: Historical data not available
Solution: Check date range availability and data source limitations
Authentication Failed¶
See Also¶
- Connectors Overview - Overview of all data connectors
- Data Processing - Processing financial data
- Quality & Privacy - Financial data quality controls
- API Reference - Financial connector API
- Tutorials - Financial data tutorials