Weather Data Connectors¶
OpenML Crawler provides weather data connectors for retrieving real-time and historical weather information from multiple weather service providers. The connectors expose common weather parameters, including temperature, humidity, precipitation, wind speed, and atmospheric conditions.
Supported Providers¶
Open-Meteo¶
Free weather API with global coverage and no API key required for basic usage.
Features:
- Global weather coverage
- Historical data (up to 10 years)
- Hourly and daily forecasts
- Multiple weather parameters
- No API key required for basic usage
Usage:
from openmlcrawler.connectors.weather import OpenMeteoConnector

connector = OpenMeteoConnector()
data = connector.get_weather(
    latitude=40.7128,
    longitude=-74.0060,
    start_date="2023-01-01",
    end_date="2023-12-31"
)
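For orientation, a call like the one above roughly corresponds to a query against Open-Meteo's public historical archive endpoint. The sketch below only builds that URL and sends nothing. Whether `OpenMeteoConnector` uses this endpoint or these variable names internally is an assumption, and `build_archive_url` is a hypothetical helper, not part of OpenML Crawler.

```python
from urllib.parse import urlencode

# Open-Meteo's public historical-weather endpoint (no API key for basic usage).
ARCHIVE_URL = "https://archive-api.open-meteo.com/v1/archive"


def build_archive_url(latitude, longitude, start_date, end_date,
                      hourly=("temperature_2m", "precipitation")):
    """Build (but do not send) an Open-Meteo archive request URL."""
    params = {
        "latitude": latitude,
        "longitude": longitude,
        "start_date": start_date,      # ISO 8601 date, e.g. "2023-01-01"
        "end_date": end_date,
        "hourly": ",".join(hourly),    # comma-separated variable names
    }
    return f"{ARCHIVE_URL}?{urlencode(params)}"


url = build_archive_url(40.7128, -74.0060, "2023-01-01", "2023-12-31")
```

Inspecting the URL this way can help when debugging unexpected responses, since the same query can be pasted into a browser.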
OpenWeather¶
Comprehensive weather API with detailed weather information and forecasts.
Features:
- Current weather conditions
- 5-day weather forecast
- Historical weather data
- Weather maps and alerts
- Air pollution data
- Geocoding services
Configuration:
connectors:
  weather:
    openweather:
      api_key: "your_openweather_api_key"
      units: "metric"  # imperial, metric, standard
      language: "en"
Usage:
from openmlcrawler.connectors.weather import OpenWeatherConnector
connector = OpenWeatherConnector(api_key="your_key")
current = connector.get_current_weather("New York")
forecast = connector.get_forecast("New York", days=5)
NOAA (National Oceanic and Atmospheric Administration)¶
Official US weather data from the National Weather Service.
Features:
- Official US weather stations
- Detailed meteorological data
- Weather alerts and warnings
- Radar and satellite imagery
- Marine and aviation weather
Usage:
from openmlcrawler.connectors.weather import NOAAConnector
connector = NOAAConnector()
stations = connector.get_stations(state="NY")
data = connector.get_station_data(station_id="12345", start_date="2023-01-01")
Weather Underground¶
Community-driven weather network with personal weather stations.
Features:
- Personal weather station network
- Hyper-local weather data
- Weather history and trends
- Weather alerts
- API for custom integrations
Usage:
from openmlcrawler.connectors.weather import WeatherUndergroundConnector
connector = WeatherUndergroundConnector(api_key="your_key")
stations = connector.get_pws_stations(lat=40.7128, lon=-74.0060, radius=10)
data = connector.get_pws_history(station_id="KNYCENTI123")
Data Parameters¶
Common Weather Parameters¶
| Parameter | Description | Units |
|---|---|---|
| temperature | Air temperature | °C, °F |
| humidity | Relative humidity | % |
| pressure | Atmospheric pressure | hPa, inHg |
| wind_speed | Wind speed | m/s, mph, knots |
| wind_direction | Wind direction | degrees |
| precipitation | Precipitation amount | mm, inches |
| visibility | Visibility distance | km, miles |
| uv_index | UV radiation index | 0-11+ scale |
| cloud_cover | Cloud coverage | % |
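Because providers report these parameters in different units, it can help to normalize them before merging data. The following is a minimal sketch of the conversions between the units listed in the table; the function names are hypothetical and not part of the OpenML Crawler API.

```python
# Conversion factors; the length-based ones are exact by definition.
MS_TO_MPH = 3600 / 1609.344      # 1 mile = 1609.344 m
MS_TO_KNOTS = 3600 / 1852        # 1 nautical mile = 1852 m
HPA_TO_INHG = 0.0295299830714    # conventional inch of mercury, rounded
MM_PER_INCH = 25.4


def celsius_to_fahrenheit(celsius):
    return celsius * 9 / 5 + 32


def ms_to_mph(speed_ms):
    return speed_ms * MS_TO_MPH


def ms_to_knots(speed_ms):
    return speed_ms * MS_TO_KNOTS


def hpa_to_inhg(pressure_hpa):
    return pressure_hpa * HPA_TO_INHG


def mm_to_inches(depth_mm):
    return depth_mm / MM_PER_INCH
```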
Advanced Parameters¶
- Dew Point: Temperature at which air becomes saturated
- Heat Index: Feels-like temperature accounting for humidity
- Wind Chill: Feels-like temperature accounting for wind
- Solar Radiation: Solar energy reaching the surface
- Soil Temperature: Ground temperature at various depths
- Lightning Data: Lightning strike frequency and intensity
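Some of these derived values can be computed from the common parameters when a provider does not return them. The sketch below uses the Magnus approximation for dew point and the standard metric wind chill formula; the function names are hypothetical, and whether OpenML Crawler derives these values itself or passes through provider values is an assumption.

```python
import math


def dew_point_c(temp_c, rel_humidity_pct):
    """Dew point in °C via the Magnus approximation."""
    a, b = 17.62, 243.12  # Magnus coefficients over water (b in °C)
    gamma = math.log(rel_humidity_pct / 100.0) + a * temp_c / (b + temp_c)
    return b * gamma / (a - gamma)


def wind_chill_c(temp_c, wind_kmh):
    """Wind chill in °C; the formula applies at or below 10 °C with wind above 4.8 km/h."""
    if temp_c > 10.0 or wind_kmh <= 4.8:
        return temp_c  # outside the formula's range: report air temperature
    v = wind_kmh ** 0.16
    return 13.12 + 0.6215 * temp_c - 11.37 * v + 0.3965 * temp_c * v
```

For example, air at 20 °C and 50% relative humidity has a dew point of roughly 9.3 °C, and -10 °C with a 20 km/h wind feels like roughly -18 °C.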
Data Collection Strategies¶
Real-time Monitoring¶
from openmlcrawler.connectors.weather import WeatherMonitor

monitor = WeatherMonitor(
    locations=["New York", "London", "Tokyo"],
    interval_minutes=15,
    providers=["openweather", "openmeteo"]
)

# Start continuous monitoring
monitor.start_monitoring()

# Get latest data
latest_data = monitor.get_latest_data()
Historical Data Collection¶
from openmlcrawler.connectors.weather import HistoricalWeatherCollector

collector = HistoricalWeatherCollector()
data = collector.collect_historical_data(
    location="New York",
    start_date="2020-01-01",
    end_date="2023-12-31",
    parameters=["temperature", "humidity", "precipitation"]
)
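Multi-year ranges like the one above are often easier to fetch in smaller windows, so that a failed request only needs a partial retry. The following is a minimal sketch of splitting a date range into contiguous chunks; `split_date_range` is a hypothetical helper, and whether `HistoricalWeatherCollector` chunks requests internally is not stated here.

```python
from datetime import date, timedelta


def split_date_range(start_date, end_date, max_days=365):
    """Split an inclusive ISO date range into contiguous chunks of at most max_days."""
    start = date.fromisoformat(start_date)
    end = date.fromisoformat(end_date)
    if start > end:
        raise ValueError("start_date must not be after end_date")
    chunks = []
    while start <= end:
        chunk_end = min(start + timedelta(days=max_days - 1), end)
        chunks.append((start.isoformat(), chunk_end.isoformat()))
        start = chunk_end + timedelta(days=1)  # next chunk begins the following day
    return chunks


chunks = split_date_range("2020-01-01", "2023-12-31")
```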
Batch Processing¶
from openmlcrawler.connectors.weather import BatchWeatherProcessor

processor = BatchWeatherProcessor()
results = processor.process_locations_batch(
    locations=["NYC", "LAX", "ORD", "MIA"],
    date_range=("2023-01-01", "2023-12-31"),
    output_format="parquet"
)
Data Quality and Validation¶
Quality Checks¶
- Range Validation: Check if values are within physically possible ranges
- Consistency Checks: Verify related parameters are consistent
- Temporal Continuity: Ensure data continuity over time
- Spatial Coherence: Validate data consistency across nearby locations
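The first three checks above can be sketched in a few lines. This is an illustration only: the bounds in `VALID_RANGES` are assumed plausible limits rather than values taken from OpenML Crawler, and all function names are hypothetical.

```python
from datetime import datetime, timedelta

# Illustrative physical bounds (assumptions, not library defaults).
VALID_RANGES = {
    "temperature": (-90.0, 60.0),    # °C
    "humidity": (0.0, 100.0),        # %
    "pressure": (850.0, 1100.0),     # hPa, sea-level
    "wind_speed": (0.0, 120.0),      # m/s
}


def check_ranges(record):
    """Range validation: report values outside physically plausible bounds."""
    issues = []
    for name, (low, high) in VALID_RANGES.items():
        value = record.get(name)
        if value is not None and not (low <= value <= high):
            issues.append(f"{name}={value} outside [{low}, {high}]")
    return issues


def check_consistency(record):
    """Consistency check: dew point cannot exceed air temperature."""
    temp, dew = record.get("temperature"), record.get("dew_point")
    if temp is not None and dew is not None and dew > temp:
        return [f"dew_point={dew} exceeds temperature={temp}"]
    return []


def find_gaps(timestamps, expected_step):
    """Temporal continuity: return (previous, current) pairs spaced wider than expected."""
    return [(a, b) for a, b in zip(timestamps, timestamps[1:]) if b - a > expected_step]


t0 = datetime(2023, 1, 1)
gaps = find_gaps([t0, t0 + timedelta(hours=1), t0 + timedelta(hours=3)],
                 expected_step=timedelta(hours=1))
```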
Error Handling¶
from openmlcrawler.connectors.weather import WeatherDataError

try:
    data = connector.get_weather_data(location="Invalid City")
except WeatherDataError as e:
    if e.error_type == "LOCATION_NOT_FOUND":
        print(f"Location not found: {e.location}")
    elif e.error_type == "API_LIMIT_EXCEEDED":
        print(f"API limit exceeded. Retry after: {e.retry_after}")
    elif e.error_type == "DATA_UNAVAILABLE":
        print(f"Weather data unavailable for: {e.location}")
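Rate-limit errors are usually worth retrying, while the other error types are not. The sketch below shows one way to wrap a call so that it waits and retries on `API_LIMIT_EXCEEDED`. It defines a local stand-in exception with the same attribute names so the example runs on its own; treating `retry_after` as a number of seconds is an assumption, and `call_with_retry` is a hypothetical helper.

```python
import time


class WeatherDataError(Exception):
    """Local stand-in mirroring the attributes used above (not the library class)."""

    def __init__(self, error_type, retry_after=None):
        super().__init__(error_type)
        self.error_type = error_type
        self.retry_after = retry_after  # assumed to be seconds


def call_with_retry(fetch, max_attempts=3, default_wait=1.0, sleep=time.sleep):
    """Call fetch(), retrying only on rate-limit errors, up to max_attempts."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fetch()
        except WeatherDataError as e:
            retryable = e.error_type == "API_LIMIT_EXCEEDED"
            if not retryable or attempt == max_attempts:
                raise  # non-retryable error, or out of attempts
            sleep(e.retry_after if e.retry_after is not None else default_wait)
```

Passing `sleep` as a parameter keeps the helper testable without real delays.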
Integration with ML Pipelines¶
Feature Engineering¶
from openmlcrawler.connectors.weather import WeatherFeatureEngineer

engineer = WeatherFeatureEngineer()

# Create weather-based features
features = engineer.create_features(
    weather_data=data,
    feature_types=[
        "temperature_trends",
        "humidity_patterns",
        "precipitation_forecasts",
        "wind_conditions"
    ]
)
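To give a sense of what a feature such as `temperature_trends` might contain, the sketch below computes a rolling mean and a day-over-day change from a plain list of daily temperatures. What `WeatherFeatureEngineer` actually produces for each feature type is not documented here, so these two hypothetical functions are only an assumed interpretation.

```python
def rolling_mean(values, window):
    """Trailing mean over `window` points; None until enough history exists."""
    result = []
    for i in range(len(values)):
        if i + 1 < window:
            result.append(None)
        else:
            chunk = values[i + 1 - window:i + 1]
            result.append(sum(chunk) / window)
    return result


def day_over_day_change(values):
    """Difference from the previous point; None for the first point."""
    if not values:
        return []
    return [None] + [b - a for a, b in zip(values, values[1:])]
```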
Time Series Analysis¶
from openmlcrawler.connectors.weather import WeatherTimeSeriesAnalyzer

analyzer = WeatherTimeSeriesAnalyzer()
analysis = analyzer.analyze_patterns(
    data=data,
    analysis_types=[
        "seasonal_decomposition",
        "trend_analysis",
        "anomaly_detection",
        "forecasting"
    ]
)
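As an illustration of the simplest of these analysis types, the sketch below flags anomalies with a z-score threshold. This is a generic technique chosen for the example; the method `WeatherTimeSeriesAnalyzer` uses for `anomaly_detection` is not specified here, and `zscore_anomalies` is a hypothetical function.

```python
from statistics import mean, pstdev


def zscore_anomalies(values, threshold=3.0):
    """Return indices of points more than `threshold` standard deviations from the mean."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return []  # constant series: nothing stands out
    return [i for i, v in enumerate(values) if abs(v - mu) / sigma > threshold]


temps = [10, 11, 10, 12, 11, 30, 10, 11]
anomalies = zscore_anomalies(temps, threshold=2.0)
```

A z-score test ignores seasonality, so for long weather series it is usually applied to residuals after removing the seasonal component.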
Configuration Options¶
Global Configuration¶
weather_connectors:
  default_provider: "openmeteo"
  fallback_providers: ["openweather", "noaa"]
  cache_enabled: true
  cache_ttl_seconds: 3600
  rate_limiting:
    requests_per_minute: 60
    burst_limit: 10
  data_quality:
    enable_validation: true
    strict_mode: false
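One common way to interpret a pair of settings like `requests_per_minute` and `burst_limit` is a token bucket: tokens refill at the steady rate, and the bucket size caps how many requests can go out back to back. The sketch below illustrates that idea; how OpenML Crawler actually enforces these settings is an assumption, and `TokenBucket` is a hypothetical class.

```python
import time


class TokenBucket:
    """Allow bursts up to `burst_limit`, refilling at `requests_per_minute`."""

    def __init__(self, requests_per_minute=60, burst_limit=10, clock=time.monotonic):
        self.rate = requests_per_minute / 60.0   # tokens added per second
        self.capacity = float(burst_limit)
        self.tokens = float(burst_limit)         # start with a full bucket
        self.clock = clock
        self.last = clock()

    def try_acquire(self):
        """Take one token if available; return False when the caller should wait."""
        now = self.clock()
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

With the values above, ten requests can be sent immediately, after which one further request becomes available each second.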
Provider-Specific Settings¶
openweather:
  api_key: "${OPENWEATHER_API_KEY}"
  units: "metric"
  language: "en"
  enable_alerts: true
openmeteo:
  enable_hourly: true
  enable_daily: true
  historical_days: 3650
  forecast_days: 16
noaa:
  enable_alerts: true
  include_marine: false
  include_aviation: false
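The `${OPENWEATHER_API_KEY}` placeholder suggests the key is read from an environment variable rather than stored in the file. If you load such a config yourself, the substitution can be sketched with the standard library as below; whether OpenML Crawler performs this expansion automatically is an assumption, and `expand_env` is a hypothetical helper.

```python
import os
from string import Template


def expand_env(value, env=None):
    """Replace ${NAME} placeholders with environment values; KeyError if one is missing."""
    env = os.environ if env is None else env
    return Template(value).substitute(env)


api_key = expand_env("${OPENWEATHER_API_KEY}", {"OPENWEATHER_API_KEY": "abc123"})
```

Failing loudly on a missing variable is deliberate here: an empty API key tends to surface later as a confusing authentication error.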
Best Practices¶
Performance Optimization¶
- Use Caching: Cache frequently accessed weather data
- Batch Requests: Combine multiple location requests
- Selective Parameters: Only request needed weather parameters
- Appropriate Intervals: Balance data freshness with API limits
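The caching advice above can be illustrated with a small time-to-live cache keyed by request. This is a minimal sketch under the assumption of a one-hour TTL, matching the `cache_ttl_seconds: 3600` value used elsewhere on this page; `TTLCache` is a hypothetical class, not the crawler's own cache.

```python
import time


class TTLCache:
    """In-memory cache whose entries expire after `ttl_seconds`."""

    def __init__(self, ttl_seconds=3600, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._store = {}  # key -> (expires_at, value)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._store[key]  # drop the stale entry
            return None
        return value

    def set(self, key, value):
        self._store[key] = (self._clock() + self._ttl, value)
```

A key such as `("openmeteo", 40.7128, -74.0060, "2023-01-01")` keeps responses for different providers and locations separate.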
Cost Management¶
- Provider Selection: Choose free tiers for development
- Usage Monitoring: Track API usage and costs
- Fallback Strategy: Use multiple providers for redundancy
- Data Archiving: Archive historical data to reduce API calls
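The fallback strategy mentioned above amounts to trying providers in a fixed order and returning the first successful result. The sketch below shows that pattern with plain callables; `fetch_with_fallback` is a hypothetical helper, and the provider functions stand in for real connector calls.

```python
def fetch_with_fallback(providers, location):
    """Try each (name, fetch) pair in order; return (name, data) from the first success."""
    errors = {}
    for name, fetch in providers:
        try:
            return name, fetch(location)
        except Exception as exc:  # record the failure and move to the next provider
            errors[name] = exc
    raise RuntimeError(f"all providers failed: {errors}")
```

Ordering the list with free providers first keeps paid API calls as a last resort.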
Data Reliability¶
- Multiple Sources: Use multiple weather providers for validation
- Data Validation: Implement comprehensive data quality checks
- Error Recovery: Handle API failures gracefully
- Monitoring: Monitor data collection health and quality
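Cross-checking several providers can be as simple as comparing each reading against the median. The sketch below flags providers whose value deviates from the consensus by more than a tolerance; the function name, the sample readings, and the 2 °C tolerance are illustrative assumptions.

```python
from statistics import median


def flag_outlier_sources(readings, tolerance):
    """Return the median reading and the providers deviating from it by more than `tolerance`."""
    consensus = median(readings.values())
    outliers = sorted(
        name for name, value in readings.items()
        if abs(value - consensus) > tolerance
    )
    return consensus, outliers


# Made-up sample temperatures in °C, for illustration only.
consensus, outliers = flag_outlier_sources(
    {"openmeteo": 20.1, "openweather": 20.4, "noaa": 27.0},
    tolerance=2.0,
)
```

The median is used rather than the mean so that a single faulty source does not pull the consensus toward itself.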
Troubleshooting¶
Common Issues¶
API Key Issues¶
Rate Limiting¶
Location Not Found¶
Error: Location not found
Solution: Use proper location format (city name, coordinates, or station ID)
Data Unavailable¶
Error: Weather data unavailable
Solution: Check if location has weather stations or use alternative providers
See Also¶
- Connectors Overview - Overview of all data connectors
- Data Processing - Processing weather data
- Quality & Privacy - Weather data quality controls
- API Reference - Weather connector API
- Tutorials - Weather data tutorials