				RDMA Controller
				----------------

Contents
--------

1. Overview
  1-1. What is RDMA controller?
  1-2. Why RDMA controller needed?
  1-3. How is RDMA controller implemented?
2. Usage Examples

1. Overview

1-1. What is RDMA controller?
-----------------------------

RDMA controller allows the user to limit the RDMA/IB specific resources that a
given set of processes can use. These processes are grouped using the RDMA
controller.

RDMA controller defines two resources which can be limited for the processes
of a cgroup.

1-2. Why RDMA controller needed?
--------------------------------

Currently user space applications can easily take away all the rdma verb
specific resources such as AH, CQ, QP, MR etc. As a result, other applications
in other cgroups or kernel space ULPs may not even get a chance to allocate any
rdma resources. This can lead to service unavailability.

Therefore an RDMA controller is needed through which the resource consumption
of processes can be limited. Through this controller different rdma
resources can be accounted.

1-3. How is RDMA controller implemented?
----------------------------------------

RDMA cgroup allows limit configuration of resources. Rdma cgroup maintains
resource accounting per cgroup, per device using a resource pool structure.
Each such resource pool is limited to 64 resources by rdma cgroup, which can
be extended later if required.

This resource pool object is linked to the cgroup css. Typically there
are 0 to 4 resource pool instances per cgroup, per device in most use cases,
but nothing prevents having more. At present hundreds of RDMA devices per
single cgroup may not be handled optimally; however, there is no
known use case or requirement for such a configuration either.

Since RDMA resources can be allocated from any process and can be freed by any
of the child processes which share the address space, rdma resources are
always owned by the creator cgroup css. This allows process migration from one
cgroup to another without the major complexity of transferring resource
ownership, because such ownership is not really present due to the shared
nature of rdma resources. Linking resources around the css also ensures that
cgroups can be deleted after processes have migrated. This allows process
migration with active resources as well, even though that is not a primary
use case.

Whenever RDMA resource charging occurs, the owner rdma cgroup is returned to
the caller. The same rdma cgroup should be passed while uncharging the
resource. This also allows a process migrated with active RDMA resources to
charge new resources to its new owner cgroup. It also allows the resources of
a process that has migrated to a new cgroup to be uncharged from the
previously charged cgroup, even though that is not a primary use case.

Resource pool object is created in the following situations.
(a) User sets the limit and no previous resource pool exists for the device
of interest for the cgroup.
(b) No resource limits were configured, but the IB/RDMA stack tries to
charge the resource. The pool is created so that resources charged while
applications run without limits are uncharged correctly once limits are
enforced later on; otherwise the usage count would drop below zero.

Resource pool is destroyed if all the resource limits are set to max and
it is the last resource getting deallocated.

User should set all the limits to the max value if they intend to
remove/unconfigure the resource pool for a particular device.
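The "all limits set to max" condition can be checked from user space before
expecting a pool to go away. The following is only a sketch: the helper name
all_limits_max is hypothetical, and it assumes one line in the rdma.max format
shown in the usage examples ("<device> key=value ..."). It says nothing about
whether resources are still allocated, which is the second condition for
destruction.

```shell
# Hypothetical helper, not part of any kernel tooling.
# Succeeds (exit status 0) only if every key=value pair on one
# rdma.max-style line has the value "max".
all_limits_max() {
    line=$1
    set -- $line          # split on whitespace; $1 becomes the device name
    [ "$#" -gt 1 ] || return 1   # need a device name plus at least one pair
    shift                 # drop the device name, keep the key=value pairs
    for pair in "$@"; do
        [ "${pair#*=}" = "max" ] || return 1
    done
    return 0
}

# Example: the first line qualifies, the second still carries a limit.
all_limits_max "mlx4_0 hca_handle=max hca_object=max" && echo "no limits left"
all_limits_max "ocrdma1 hca_handle=3 hca_object=max" || echo "limit still set"
```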

IB stack honors limits enforced by the rdma controller. When an application
queries the maximum resource limits of an IB device, the IB stack returns the
minimum of what is configured by the user for a given cgroup and what is
supported by the IB device.
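The minimum rule can be illustrated with a small shell function. This is a
sketch of the arithmetic only, not the kernel code path: effective_limit is a
hypothetical name, and the device capability of 65536 is a made-up value used
for illustration.

```shell
# Hypothetical helper: effective_limit <cgroup_limit> <device_limit>
# cgroup_limit is a number or "max" (no limit configured for the cgroup);
# device_limit is the number the device itself supports.
effective_limit() {
    cg=$1
    dev=$2
    if [ "$cg" = "max" ] || [ "$cg" -gt "$dev" ]; then
        echo "$dev"       # device capability is the tighter bound
    else
        echo "$cg"        # cgroup limit is the tighter bound
    fi
}

effective_limit 2000 65536   # cgroup limit below the assumed device maximum
effective_limit max 65536    # no cgroup limit: device maximum applies
```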

Following resources can be accounted by rdma controller.
  hca_handle	Maximum number of HCA Handles
  hca_object	Maximum number of HCA Objects
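A script that generates limit settings may want to reject unknown resource
names before writing anything. The sketch below assumes that the two names
listed above are the only valid keys and that a value is either a non-negative
integer or "max"; valid_limit_pair is a hypothetical helper name.

```shell
# Hypothetical helper: valid_limit_pair <key=value>
# Accepts only the two resource names documented above, with a value that
# is either "max" or a string of decimal digits.
valid_limit_pair() {
    key=${1%%=*}          # text before the first "="
    val=${1#*=}           # text after the first "="
    case $key in
        hca_handle|hca_object) ;;
        *) return 1 ;;
    esac
    case $val in
        max) return 0 ;;
        ''|*[!0-9]*) return 1 ;;   # empty or contains a non-digit
        *) return 0 ;;
    esac
}

valid_limit_pair "hca_object=2000" && echo "accepted"
valid_limit_pair "hca_qp=5" || echo "rejected"
```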

2. Usage Examples
-----------------

(a) Configure resource limit:
echo mlx4_0 hca_handle=2 hca_object=2000 > /sys/fs/cgroup/rdma/1/rdma.max
echo ocrdma1 hca_handle=3 > /sys/fs/cgroup/rdma/2/rdma.max

(b) Query resource limit:
cat /sys/fs/cgroup/rdma/2/rdma.max
#Output:
mlx4_0 hca_handle=2 hca_object=2000
ocrdma1 hca_handle=3 hca_object=max
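To pick a single value out of that output, the lines can be scanned for a
device and a resource name. The function below is a sketch: get_limit is a
hypothetical helper that reads rdma.max-style lines on standard input, so it
can be tried on saved output as well as on a live cgroup file.

```shell
# Hypothetical helper: get_limit <device> <resource>
# Reads rdma.max-style lines from stdin and prints the value of the
# requested resource for the requested device. Returns 1 if not found.
get_limit() {
    want_dev=$1
    want_res=$2
    while read -r dev rest; do
        [ "$dev" = "$want_dev" ] || continue
        for pair in $rest; do
            if [ "${pair%%=*}" = "$want_res" ]; then
                echo "${pair#*=}"
                return 0
            fi
        done
    done
    return 1
}

# Example against a live cgroup, using the path from the examples above:
# get_limit ocrdma1 hca_object < /sys/fs/cgroup/rdma/2/rdma.max
```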

(c) Query current usage:
cat /sys/fs/cgroup/rdma/2/rdma.current
#Output:
mlx4_0 hca_handle=1 hca_object=20
ocrdma1 hca_handle=1 hca_object=23
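Combining a value from rdma.max with the matching value from rdma.current gives
the number of further allocations the cgroup limit still permits. This is a
sketch with a hypothetical helper name, headroom. When the limit is "max" the
cgroup imposes no bound, although the device's own maximum still applies.

```shell
# Hypothetical helper: headroom <limit> <usage>
# Prints how many more resources may be charged before the cgroup limit is
# reached, or "unlimited" when the limit is "max".
headroom() {
    if [ "$1" = "max" ]; then
        echo "unlimited"
    else
        echo $(( $1 - $2 ))
    fi
}

headroom 2 1      # mlx4_0 hca_handle: limit 2, usage 1
headroom max 23   # ocrdma1 hca_object: no cgroup limit, usage 23
```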

(d) Delete resource limit:
echo mlx4_0 hca_handle=max hca_object=max > /sys/fs/cgroup/rdma/1/rdma.max
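
To remove the limits for every device listed in a cgroup, one "all max" line
per device can be generated from the current contents of rdma.max. The sketch
below only produces the lines; reset_lines is a hypothetical helper, and the
commented usage assumes each line is written to rdma.max separately, one
device per write, as in example (d).

```shell
# Hypothetical helper: reads rdma.max-style lines on stdin and prints, for
# each device found, the line that sets both resources back to "max".
reset_lines() {
    while read -r dev rest; do
        [ -n "$dev" ] || continue     # skip empty lines
        echo "$dev hca_handle=max hca_object=max"
    done
}

# Possible usage on a live cgroup (path taken from the examples above):
# cg=/sys/fs/cgroup/rdma/1
# lines=$(reset_lines < "$cg/rdma.max")
# printf '%s\n' "$lines" | while read -r line; do
#     echo "$line" > "$cg/rdma.max"
# done
```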