Risk Factor Identification using Truck Fleet Data

  • This usecase is to analyse various parameters of truck fleet
  • Each truck has been equipped to log location and event data
  • These events are streamed back to a datacenter where the data will be processed
  • The company wants to use this data to understand the risk

Data

  • Collected geo-location and truck data has been provided
  • Size of the truck data is small which can be stored in RDBMS - which will be imported into Sqoop
  • Geo-location data will be stored in HDFS

Architecture

Steps

  • Load the captured sensor data into Hadoop (HDFS)
  • Load truck data from RDBMS to HDFS/Hive
  • Run Hive, Pig scripts that compute truck mileage and driver risk factor
  • Access the refined sensor data with Microsoft Excel
  • Visualize the sensor data using Excel Power View / Pivot Table /Graphs

Loading data into Hive

Analysing Data

The business objective is to better understand the risk the company is under from fatigue of drivers, over-used trucks, and the impact of various trucking events on risk

Check the code for the project here: TruckRisk