
Weather Data Connectors

OpenML Crawler includes weather data connectors that retrieve real-time and historical weather information from several weather service providers. The connectors cover parameters such as temperature, humidity, precipitation, wind speed, and atmospheric conditions.

Supported Providers

Open-Meteo

Free weather API with global coverage and no API key required for basic usage.

Features:

  • Global weather coverage
  • Historical data (up to 10 years)
  • Hourly and daily forecasts
  • Multiple weather parameters
  • No API key required for basic usage

Usage:

from openmlcrawler.connectors.weather import OpenMeteoConnector

connector = OpenMeteoConnector()
data = connector.get_weather(
    latitude=40.7128,
    longitude=-74.0060,
    start_date="2023-01-01",
    end_date="2023-12-31"
)
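
For orientation, the request above corresponds roughly to a call against Open-Meteo's public historical archive endpoint. The sketch below only builds the request URL and makes no network call. The helper name `build_archive_url` and the choice of hourly variables are assumptions for illustration; whether `OpenMeteoConnector` builds its requests this way is not confirmed by this page.

```python
from urllib.parse import urlencode

# Public Open-Meteo historical archive endpoint (no API key needed).
ARCHIVE_URL = "https://archive-api.open-meteo.com/v1/archive"


def build_archive_url(latitude, longitude, start_date, end_date,
                      hourly=("temperature_2m", "precipitation")):
    """Return the archive request URL for one location and date range."""
    params = {
        "latitude": latitude,
        "longitude": longitude,
        "start_date": start_date,    # ISO 8601 date, e.g. "2023-01-01"
        "end_date": end_date,
        "hourly": ",".join(hourly),  # comma-separated variable names
    }
    return f"{ARCHIVE_URL}?{urlencode(params)}"


url = build_archive_url(40.7128, -74.0060, "2023-01-01", "2023-12-31")
print(url)
```

The resulting URL can be fetched with any HTTP client; the response is JSON with one array per requested variable.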

OpenWeather

Comprehensive weather API with detailed weather information and forecasts.

Features:

  • Current weather conditions
  • 5-day weather forecast
  • Historical weather data
  • Weather maps and alerts
  • Air pollution data
  • Geocoding services

Configuration:

connectors:
  weather:
    openweather:
      api_key: "your_openweather_api_key"
      units: "metric"  # imperial, metric, standard
      language: "en"

Usage:

from openmlcrawler.connectors.weather import OpenWeatherConnector

connector = OpenWeatherConnector(api_key="your_key")
current = connector.get_current_weather("New York")
forecast = connector.get_forecast("New York", days=5)

NOAA (National Oceanic and Atmospheric Administration)

Official US weather data from the National Weather Service.

Features:

  • Official US weather stations
  • Detailed meteorological data
  • Weather alerts and warnings
  • Radar and satellite imagery
  • Marine and aviation weather

Usage:

from openmlcrawler.connectors.weather import NOAAConnector

connector = NOAAConnector()
stations = connector.get_stations(state="NY")
data = connector.get_station_data(station_id="12345", start_date="2023-01-01")

Weather Underground

Community-driven weather network with personal weather stations.

Features:

  • Personal weather station network
  • Hyper-local weather data
  • Weather history and trends
  • Weather alerts
  • API for custom integrations

Usage:

from openmlcrawler.connectors.weather import WeatherUndergroundConnector

connector = WeatherUndergroundConnector(api_key="your_key")
stations = connector.get_pws_stations(lat=40.7128, lon=-74.0060, radius=10)
data = connector.get_pws_history(station_id="KNYCENTI123")

Data Parameters

Common Weather Parameters

Parameter      | Description          | Units
temperature    | Air temperature      | °C, °F
humidity       | Relative humidity    | %
pressure       | Atmospheric pressure | hPa, inHg
wind_speed     | Wind speed           | m/s, mph, knots
wind_direction | Wind direction       | degrees
precipitation  | Precipitation amount | mm, inches
visibility     | Visibility distance  | km, miles
uv_index       | UV radiation index   | 0-11+ scale
cloud_cover    | Cloud coverage       | %
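
Because providers report the same parameter in different units, it helps to normalize values before combining sources. The helpers below are a minimal sketch using standard conversion factors; the function names are illustrative and not part of OpenML Crawler.

```python
def celsius_to_fahrenheit(celsius):
    """Convert air temperature from °C to °F."""
    return celsius * 9.0 / 5.0 + 32.0


def hpa_to_inhg(hpa):
    """Convert pressure from hectopascals to inches of mercury."""
    return hpa * 0.0295299830714


def ms_to_mph(meters_per_second):
    """Convert wind speed from m/s to miles per hour."""
    return meters_per_second * 2.2369362921


def ms_to_knots(meters_per_second):
    """Convert wind speed from m/s to knots."""
    return meters_per_second * 1.9438444924


def mm_to_inches(millimeters):
    """Convert precipitation from millimeters to inches."""
    return millimeters / 25.4


def km_to_miles(kilometers):
    """Convert visibility from kilometers to statute miles."""
    return kilometers / 1.609344
```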

Advanced Parameters

  • Dew Point: Temperature at which air becomes saturated
  • Heat Index: Feels-like temperature accounting for humidity
  • Wind Chill: Feels-like temperature accounting for wind
  • Solar Radiation: Solar energy reaching the surface
  • Soil Temperature: Ground temperature at various depths
  • Lightning Data: Lightning strike frequency and intensity

Data Collection Strategies

Real-time Monitoring

from openmlcrawler.connectors.weather import WeatherMonitor

monitor = WeatherMonitor(
    locations=["New York", "London", "Tokyo"],
    interval_minutes=15,
    providers=["openweather", "openmeteo"]
)

# Start continuous monitoring
monitor.start_monitoring()

# Get latest data
latest_data = monitor.get_latest_data()

Historical Data Collection

from openmlcrawler.connectors.weather import HistoricalWeatherCollector

collector = HistoricalWeatherCollector()
data = collector.collect_historical_data(
    location="New York",
    start_date="2020-01-01",
    end_date="2023-12-31",
    parameters=["temperature", "humidity", "precipitation"]
)
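
Multi-year ranges like the one above are often easier to fetch in smaller windows, since many providers cap the span of a single request. The following is a minimal sketch of splitting a date range into consecutive chunks; `split_date_range` is a hypothetical helper, and it is not known whether `HistoricalWeatherCollector` chunks requests internally.

```python
from datetime import date, timedelta


def split_date_range(start, end, max_days=365):
    """Yield (chunk_start, chunk_end) pairs covering [start, end] inclusive."""
    if max_days < 1:
        raise ValueError("max_days must be at least 1")
    cursor = start
    while cursor <= end:
        # Each chunk spans at most max_days calendar days.
        chunk_end = min(cursor + timedelta(days=max_days - 1), end)
        yield cursor, chunk_end
        cursor = chunk_end + timedelta(days=1)


chunks = list(split_date_range(date(2020, 1, 1), date(2023, 12, 31)))
for chunk_start, chunk_end in chunks:
    print(chunk_start.isoformat(), chunk_end.isoformat())
```

Chunks are contiguous and non-overlapping, so the per-chunk results can be concatenated without deduplication.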

Batch Processing

from openmlcrawler.connectors.weather import BatchWeatherProcessor

processor = BatchWeatherProcessor()
results = processor.process_locations_batch(
    locations=["NYC", "LAX", "ORD", "MIA"],
    date_range=("2023-01-01", "2023-12-31"),
    output_format="parquet"
)
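
The general idea behind batch processing can be sketched with the standard library: submit one task per location to a thread pool and collect successes and failures separately. `fetch_location` below is a hypothetical stand-in for a real provider call, and nothing here reflects how `BatchWeatherProcessor` is actually implemented.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_location(location):
    """Hypothetical stand-in for a provider call; returns a fake record."""
    return {"location": location, "status": "ok"}


def process_batch(locations, fetch=fetch_location, max_workers=4):
    """Fetch all locations concurrently; return (results, errors) dicts."""
    results, errors = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {loc: pool.submit(fetch, loc) for loc in locations}
        for loc, future in futures.items():
            try:
                results[loc] = future.result()
            except Exception as exc:  # one failure should not sink the batch
                errors[loc] = exc
    return results, errors


results, errors = process_batch(["NYC", "LAX", "ORD", "MIA"])
print(sorted(results), errors)
```

Keeping `max_workers` small helps stay within provider rate limits.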

Data Quality and Validation

Quality Checks

  1. Range Validation: Check if values are within physically possible ranges
  2. Consistency Checks: Verify related parameters are consistent
  3. Temporal Continuity: Ensure data continuity over time
  4. Spatial Coherence: Validate data consistency across nearby locations
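
The first three checks can be sketched in a few lines of plain Python. The limits in `VALID_RANGES` are illustrative assumptions, not values taken from OpenML Crawler, and spatial coherence is omitted because it needs data from neighboring stations.

```python
from datetime import datetime, timedelta

# Illustrative physical limits; tune them for your region and sensors.
VALID_RANGES = {
    "temperature": (-90.0, 60.0),  # °C
    "humidity": (0.0, 100.0),      # %
    "pressure": (300.0, 1100.0),   # hPa
    "wind_speed": (0.0, 115.0),    # m/s
}


def check_ranges(record):
    """Range validation: return fields whose values are out of bounds."""
    bad = []
    for name, (low, high) in VALID_RANGES.items():
        value = record.get(name)
        if value is not None and not (low <= value <= high):
            bad.append(name)
    return bad


def check_consistency(record):
    """Consistency check: dew point should not exceed air temperature."""
    dew, temp = record.get("dew_point"), record.get("temperature")
    return dew is None or temp is None or dew <= temp


def find_gaps(timestamps, expected=timedelta(hours=1)):
    """Temporal continuity: return (previous, current) pairs spaced too far apart."""
    ordered = sorted(timestamps)
    return [(a, b) for a, b in zip(ordered, ordered[1:]) if b - a > expected]


print(check_ranges({"temperature": 75.0, "humidity": 40.0}))
```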

Error Handling

from openmlcrawler.connectors.weather import WeatherDataError

# `connector` is a connector instance created as in the provider sections above.
try:
    data = connector.get_weather_data(location="Invalid City")
except WeatherDataError as e:
    # error_type names the failure category reported by the connector
    if e.error_type == "LOCATION_NOT_FOUND":
        print(f"Location not found: {e.location}")
    elif e.error_type == "API_LIMIT_EXCEEDED":
        print(f"API limit exceeded. Retry after: {e.retry_after}")
    elif e.error_type == "DATA_UNAVAILABLE":
        print(f"Weather data unavailable for: {e.location}")
    else:
        raise  # do not silently swallow unrecognized error types

Integration with ML Pipelines

Feature Engineering

from openmlcrawler.connectors.weather import WeatherFeatureEngineer

engineer = WeatherFeatureEngineer()

# Create weather-based features
features = engineer.create_features(
    weather_data=data,
    feature_types=[
        "temperature_trends",
        "humidity_patterns",
        "precipitation_forecasts",
        "wind_conditions"
    ]
)
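
To make a feature type like "temperature_trends" more concrete, here is a minimal sketch of two common trend features, a trailing rolling mean and a first difference, in plain Python. The feature names and helper functions are assumptions for illustration; the features `WeatherFeatureEngineer` actually produces are not documented on this page.

```python
def rolling_mean(values, window):
    """Trailing mean; positions before a full window are None."""
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)
        else:
            out.append(sum(values[i + 1 - window:i + 1]) / window)
    return out


def first_difference(values):
    """Change from the previous reading; the first entry is None."""
    if not values:
        return []
    return [None] + [b - a for a, b in zip(values, values[1:])]


temps = [10.0, 12.0, 14.0, 13.0, 11.0]  # sample hourly temperatures in °C
features = {
    "temp_rolling_mean_3": rolling_mean(temps, 3),
    "temp_change": first_difference(temps),
}
print(features)
```

Both outputs have the same length as the input, so they can be attached as columns alongside the raw readings.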

Time Series Analysis

from openmlcrawler.connectors.weather import WeatherTimeSeriesAnalyzer

analyzer = WeatherTimeSeriesAnalyzer()
analysis = analyzer.analyze_patterns(
    data=data,
    analysis_types=[
        "seasonal_decomposition",
        "trend_analysis",
        "anomaly_detection",
        "forecasting"
    ]
)
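
As one concrete reading of "anomaly_detection", the sketch below flags readings whose z-score exceeds a threshold. This is a simple, generic technique chosen for illustration; the method `WeatherTimeSeriesAnalyzer` actually applies is not stated here.

```python
from statistics import mean, pstdev


def zscore_anomalies(values, threshold=3.0):
    """Return indices of values more than `threshold` std devs from the mean."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return []  # a constant series has no outliers
    return [i for i, v in enumerate(values) if abs(v - mu) / sigma > threshold]


readings = [20.1, 19.8, 20.3, 20.0, 35.0, 19.9, 20.2]  # °C, one spike
anomalies = zscore_anomalies(readings, threshold=2.0)
print(anomalies)
```

With very short series the largest possible z-score is bounded by the sample size, so a threshold below the usual 3.0 is used in this example. For seasonal data, computing z-scores on residuals after removing the seasonal component usually works better than using raw values.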

Configuration Options

Global Configuration

weather_connectors:
  default_provider: "openmeteo"
  fallback_providers: ["openweather", "noaa"]
  cache_enabled: true
  cache_ttl_seconds: 3600
  rate_limiting:
    requests_per_minute: 60
    burst_limit: 10
  data_quality:
    enable_validation: true
    strict_mode: false
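
The interplay of `default_provider` and `fallback_providers` can be pictured as an ordered chain: try the default first, then each fallback until one succeeds. The sketch below shows that pattern with stub functions standing in for real providers; all names are hypothetical, and the library's actual fallback logic may differ.

```python
def fetch_with_fallback(location, providers):
    """Try each (name, fetch) pair in order; return the first success."""
    failures = {}
    for name, fetch in providers:
        try:
            return name, fetch(location)
        except Exception as exc:
            failures[name] = exc  # remember why this provider failed
    raise RuntimeError(f"all providers failed for {location!r}: {failures}")


def openmeteo_stub(location):
    """Stub that simulates an outage of the default provider."""
    raise TimeoutError("simulated outage")


def openweather_stub(location):
    """Stub that simulates a healthy fallback provider."""
    return {"location": location, "temperature": 21.5}


provider, data = fetch_with_fallback(
    "New York",
    [("openmeteo", openmeteo_stub), ("openweather", openweather_stub)],
)
print(provider, data)
```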

Provider-Specific Settings

openweather:
  api_key: "${OPENWEATHER_API_KEY}"
  units: "metric"
  language: "en"
  enable_alerts: true

openmeteo:
  enable_hourly: true
  enable_daily: true
  historical_days: 3650
  forecast_days: 16

noaa:
  enable_alerts: true
  include_marine: false
  include_aviation: false
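
The `${OPENWEATHER_API_KEY}` placeholder keeps the secret out of the configuration file. If you load these settings yourself, placeholders of this form can be expanded from the environment as sketched below. Whether OpenML Crawler performs this substitution automatically is not stated here, and `expand_env` is a hypothetical helper.

```python
import os


def expand_env(value):
    """Expand ${VAR} references in strings, recursing into dicts and lists."""
    if isinstance(value, str):
        return os.path.expandvars(value)  # unset variables are left as-is
    if isinstance(value, dict):
        return {key: expand_env(item) for key, item in value.items()}
    if isinstance(value, list):
        return [expand_env(item) for item in value]
    return value


os.environ["OPENWEATHER_API_KEY"] = "demo-key"  # for illustration only
settings = expand_env({
    "openweather": {"api_key": "${OPENWEATHER_API_KEY}", "units": "metric"},
})
print(settings["openweather"]["api_key"])
```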

Best Practices

Performance Optimization

  1. Use Caching: Cache frequently accessed weather data
  2. Batch Requests: Combine multiple location requests
  3. Selective Parameters: Only request needed weather parameters
  4. Appropriate Intervals: Balance data freshness with API limits
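
The caching advice can be illustrated with a small time-to-live cache, similar in spirit to the `cache_ttl_seconds` setting. This is a minimal sketch, not the library's cache; the class name and the injectable `clock` parameter are assumptions made so the example can be run without waiting.

```python
import time


class TTLCache:
    """In-memory cache whose entries expire after a fixed number of seconds."""

    def __init__(self, ttl_seconds=3600, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._store = {}

    def get(self, key):
        """Return the cached value, or None if missing or expired."""
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self.clock() >= expires_at:
            del self._store[key]  # drop the stale entry
            return None
        return value

    def set(self, key, value):
        """Store a value with an expiry of now + ttl."""
        self._store[key] = (self.clock() + self.ttl, value)


now = [0.0]  # fake clock so the example does not have to sleep
cache = TTLCache(ttl_seconds=3600, clock=lambda: now[0])
cache.set(("New York", "current"), {"temperature": 21.5})
fresh = cache.get(("New York", "current"))
now[0] = 3601.0  # advance past the TTL
stale = cache.get(("New York", "current"))
print(fresh, stale)
```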

Cost Management

  1. Provider Selection: Choose free tiers for development
  2. Usage Monitoring: Track API usage and costs
  3. Fallback Strategy: Use multiple providers for redundancy
  4. Data Archiving: Archive historical data to reduce API calls

Data Reliability

  1. Multiple Sources: Use multiple weather providers for validation
  2. Data Validation: Implement comprehensive data quality checks
  3. Error Recovery: Handle API failures gracefully
  4. Monitoring: Monitor data collection health and quality

Troubleshooting

Common Issues

API Key Issues

Error: Invalid API key
Solution: Verify API key is correct and has proper permissions

Rate Limiting

Error: API rate limit exceeded
Solution: Implement exponential backoff and reduce request frequency
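
A minimal sketch of exponential backoff with jitter follows. `RateLimitError` is a hypothetical stand-in for whatever exception your provider call raises on a rate limit, and the `sleep` parameter is injectable so the example runs instantly.

```python
import random
import time


class RateLimitError(Exception):
    """Hypothetical stand-in for a provider's rate-limit error."""


def call_with_backoff(func, max_attempts=5, base_delay=1.0,
                      max_delay=60.0, sleep=time.sleep):
    """Call func, doubling the wait after each rate-limit failure."""
    for attempt in range(max_attempts):
        try:
            return func()
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise  # out of attempts, surface the error
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(delay + random.uniform(0, delay / 2))  # add jitter


calls = {"n": 0}


def flaky():
    """Fails twice with a rate-limit error, then succeeds."""
    calls["n"] += 1
    if calls["n"] < 3:
        raise RateLimitError("slow down")
    return "ok"


delays = []
result = call_with_backoff(flaky, sleep=delays.append)
print(result, len(delays))
```

The jitter spreads retries from multiple clients so they do not all hit the API at the same instant.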

Location Not Found

Error: Location not found
Solution: Use proper location format (city name, coordinates, or station ID)

Data Unavailable

Error: Weather data unavailable
Solution: Check if location has weather stations or use alternative providers

See Also