Apache Spark

Apache Spark™ is a fast and general engine for large-scale data processing.

Overview

Master/slave architecture
- Central coordinator + distributed worker nodes
- Driver + executors ==> Spark application
Driver Program (SparkContext) - Cluster Manager - Worker Node (Executor: Task)
Executor: a process running on a worker node
Task: a unit of work that is sent to one executor

Memory Total	VCores Total	Active Nodes
683.59G	100	5

Scheduler Type	Scheduling Resource Type	Minimum Allocation	Maximum Allocation
Fair Scheduler	[MEMORY,CPU]	<memory:4096, vCores:1>	<memory:140000, vCores:20>

Node Labels	Node Address	Mem Avail	VCores Avail
coordinator	dev-01	<memory:4096, vCores:1>	<memory:140000, vCores:20>

Setting limits on the resources a Spark application can use

Note: Running a small number of large-capacity executors usually gives better performance than running many small ones.
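
To make the limits concrete, here is a minimal Python sketch of the sizing arithmetic using the cluster figures quoted above. It rests on several assumptions: memory and vcores are spread evenly across the 5 active nodes, each executor container needs its heap plus an overhead of max(384 MB, 10% of the heap), and YARN's rounding of requests to allocation increments is ignored. The function names are hypothetical. It shows only how many executors of a given size fit on a node, not which size performs better.

```python
# Rough executor-sizing arithmetic for the cluster figures quoted above.
# Assumption: resources are spread evenly across the active nodes.
TOTAL_MEMORY_MB = int(683.59 * 1024)   # "Memory Total" of 683.59G
TOTAL_VCORES = 100                     # "VCores Total"
ACTIVE_NODES = 5                       # "Active Nodes"
MAX_ALLOC_MB = 140000                  # YARN maximum allocation (memory)
MAX_ALLOC_VCORES = 20                  # YARN maximum allocation (vCores)


def container_memory_mb(executor_memory_mb):
    """Executor heap plus overhead (assumed: max(384 MB, 10% of the heap))."""
    return executor_memory_mb + max(384, int(executor_memory_mb * 0.10))


def executors_per_node(executor_memory_mb, executor_cores):
    """How many executors of this size fit on one node, by memory and by vcores."""
    node_mb = TOTAL_MEMORY_MB // ACTIVE_NODES
    node_vcores = TOTAL_VCORES // ACTIVE_NODES
    need_mb = container_memory_mb(executor_memory_mb)
    if need_mb > MAX_ALLOC_MB or executor_cores > MAX_ALLOC_VCORES:
        raise ValueError("request exceeds the YARN maximum allocation")
    # The scarcer of the two resources decides how many executors fit.
    return min(node_mb // need_mb, node_vcores // executor_cores)


if __name__ == "__main__":
    for mem_mb, cores in [(4096, 1), (32768, 5)]:
        per_node = executors_per_node(mem_mb, cores)
        print(f"{mem_mb} MB / {cores} cores: {per_node} per node, "
              f"{per_node * ACTIVE_NODES} in the cluster")
```

Under these assumptions, small 4 GB / 1-core executors are limited by vcores, while larger 32 GB / 5-core executors are limited by memory. The chosen values would then be passed to Spark through settings such as spark.executor.memory, spark.executor.cores and spark.executor.instances.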

Modifying configuration at runtime
It is possible to modify minimum shares, limits, weights, preemption timeouts and queue scheduling policies at runtime by editing the allocation file. The scheduler will reload this file 10-15 seconds after it sees that it was modified.
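
As an illustration of what such an allocation file might contain, the Python sketch below builds a minimal one with the standard library. The element names (allocations, queue, minResources, maxResources, weight, schedulingPolicy, minSharePreemptionTimeout) follow the Fair Scheduler documentation referenced at the end of this page. The queue name spark_jobs and every value are hypothetical, chosen only to match the allocation limits shown earlier.

```python
import xml.etree.ElementTree as ET


def build_allocations(queue_name, min_res, max_res, weight, policy, min_share_timeout_s):
    """Return Fair Scheduler allocation XML for a single queue, as a string."""
    root = ET.Element("allocations")
    queue = ET.SubElement(root, "queue", name=queue_name)
    # Resources use the "X mb, Y vcores" form from the Fair Scheduler docs.
    ET.SubElement(queue, "minResources").text = min_res
    ET.SubElement(queue, "maxResources").text = max_res
    ET.SubElement(queue, "weight").text = str(weight)
    ET.SubElement(queue, "schedulingPolicy").text = policy
    # Seconds the queue may stay below its minimum share before preempting.
    ET.SubElement(queue, "minSharePreemptionTimeout").text = str(min_share_timeout_s)
    return ET.tostring(root, encoding="unicode")


# Hypothetical queue; the values are placeholders, not recommendations.
xml_text = build_allocations(
    queue_name="spark_jobs",
    min_res="4096 mb, 1 vcores",
    max_res="140000 mb, 20 vcores",
    weight=2.0,
    policy="fair",
    min_share_timeout_s=60,
)
print(xml_text)
```

Saving the generated text over the file named by yarn.scheduler.fair.allocation.file would, per the paragraph above, be picked up by the scheduler without a restart.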
Moving applications between queues

$ yarn application -movetoqueue appID -queue targetQueueName

Reference>
https://hadoop.apache.org/docs/r2.7.3/hadoop-yarn/hadoop-yarn-site/FairScheduler.html