A data pipeline is a sequence of steps that collect, process, and move data between systems for storage, analytics, machine learning, or other uses. For example, data pipelines are often used to send data from applications to storage systems like data warehouses or data lakes. Data pipelines are also frequently used to pull data from storage and transform it into a format that is useful for analytics.
Data pipelines generally consist of three overall steps: extraction, transformation, and loading. These steps can be executed in different orders, such as ETL (extract, transform, load) or ELT (extract, load, transform). In either case, pipelines are used to extract and transform data to achieve business goals. The series of transformations required to execute an effective pipeline can be very complex. For that reason, specialized software is often used to aid and automate the data pipelining process.
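As a minimal sketch of the extract, transform, and load steps, the following Python example reads records from a source, aggregates them, and writes the result to a destination. The CSV data, field names, and SQLite destination are all illustrative assumptions, not part of any particular pipeline tool.

```python
import csv
import io
import sqlite3

# Extract: read raw records from a source. Here an in-memory CSV string
# stands in for an application export; the fields are hypothetical.
raw_csv = "user_id,amount\n1,19.99\n2,5.50\n1,3.25\n"
rows = list(csv.DictReader(io.StringIO(raw_csv)))

# Transform: cast types and aggregate total spend per user.
totals = {}
for row in rows:
    user = int(row["user_id"])
    totals[user] = totals.get(user, 0.0) + float(row["amount"])

# Load: write the transformed result into a warehouse-like store
# (an in-memory SQLite table for this sketch).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user_totals (user_id INTEGER, total REAL)")
conn.executemany("INSERT INTO user_totals VALUES (?, ?)", totals.items())
conn.commit()
```

An ELT pipeline would reverse the last two steps: the raw rows would be loaded into the destination first, and the aggregation would run there, typically as SQL inside the warehouse itself.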