Runtime Power Management Framework for I/O Devices

(C) 2009-2011 Rafael J. Wysocki <rjw@sisk.pl>, Novell Inc.
(C) 2010 Alan Stern <stern@rowland.harvard.edu>
(C) 2014 Intel Corp., Rafael J. Wysocki <rafael.j.wysocki@intel.com>

1. Introduction

Support for runtime power management (runtime PM) of I/O devices is provided
at the power management core (PM core) level by means of:

* The power management workqueue pm_wq in which bus types and device drivers can
  put their PM-related work items.  It is strongly recommended that pm_wq be
  used for queuing all work items related to runtime PM, because this allows
  them to be synchronized with system-wide power transitions (suspend to RAM,
  hibernation and resume from system sleep states).  pm_wq is declared in
  include/linux/pm_runtime.h and defined in kernel/power/main.c.

* A number of runtime PM fields in the 'power' member of 'struct device' (which
  is of the type 'struct dev_pm_info', defined in include/linux/pm.h) that can
  be used for synchronizing runtime PM operations with one another.

* Three device runtime PM callbacks in 'struct dev_pm_ops' (defined in
  include/linux/pm.h).

* A set of helper functions defined in drivers/base/power/runtime.c that can be
  used for carrying out runtime PM operations in such a way that the
  synchronization between them is taken care of by the PM core.  Bus types and
  device drivers are encouraged to use these functions.

The runtime PM callbacks present in 'struct dev_pm_ops', the device runtime PM
fields of 'struct dev_pm_info' and the core helper functions provided for
runtime PM are described below.

2. Device Runtime PM Callbacks

There are three device runtime PM callbacks defined in 'struct dev_pm_ops':

  struct dev_pm_ops {
      ...
      int (*runtime_suspend)(struct device *dev);
      int (*runtime_resume)(struct device *dev);
      int (*runtime_idle)(struct device *dev);
      ...
  };

The ->runtime_suspend(), ->runtime_resume() and ->runtime_idle() callbacks
are executed by the PM core for the device's subsystem, which may be one of
the following:

  1. PM domain of the device, if the device's PM domain object, dev->pm_domain,
     is present.

  2. Device type of the device, if both dev->type and dev->type->pm are present.

  3. Device class of the device, if both dev->class and dev->class->pm are
     present.

  4. Bus type of the device, if both dev->bus and dev->bus->pm are present.

If the subsystem chosen by applying the above rules doesn't provide the relevant
callback, the PM core will invoke the corresponding driver callback stored in
dev->driver->pm directly (if present).

The PM core always checks which callback to use in the order given above, so the
priority order of callbacks from high to low is: PM domain, device type, class
and bus type.  Moreover, a high-priority one will always take precedence over
a low-priority one.  The PM domain, bus type, device type and class callbacks
are referred to as subsystem-level callbacks in what follows.
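
The lookup order above can be illustrated with a small userspace model.  This
is only a sketch, not the PM core's code: the 'model_*' structures, the
'dev_class' member name, pick_runtime_suspend() and the two sample callbacks
are hypothetical stand-ins that keep just enough state to show the order.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; not the kernel's structures. */
struct model_device;
typedef int (*rpm_cb_t)(struct model_device *dev);

struct model_pm_ops { rpm_cb_t runtime_suspend; };
struct model_subsys { const struct model_pm_ops *pm; };  /* type, class, bus, driver */
struct model_pm_domain { struct model_pm_ops ops; };

struct model_device {
    const struct model_pm_domain *pm_domain;
    const struct model_subsys *type;
    const struct model_subsys *dev_class;
    const struct model_subsys *bus;
    const struct model_subsys *driver;
};

/* Sample callbacks, used only to tell the results apart. */
int bus_suspend(struct model_device *dev) { (void)dev; return 0; }
int drv_suspend(struct model_device *dev) { (void)dev; return 0; }

/* Return the ->runtime_suspend() callback the model would invoke, or NULL. */
rpm_cb_t pick_runtime_suspend(const struct model_device *dev)
{
    const struct model_pm_ops *ops = NULL;
    rpm_cb_t cb;

    assert(dev != NULL);

    /* The first subsystem that is present wins, in priority order. */
    if (dev->pm_domain)
        ops = &dev->pm_domain->ops;
    else if (dev->type && dev->type->pm)
        ops = dev->type->pm;
    else if (dev->dev_class && dev->dev_class->pm)
        ops = dev->dev_class->pm;
    else if (dev->bus && dev->bus->pm)
        ops = dev->bus->pm;

    cb = ops ? ops->runtime_suspend : NULL;

    /* The chosen subsystem lacks the callback: use the driver's one. */
    if (!cb && dev->driver && dev->driver->pm)
        cb = dev->driver->pm->runtime_suspend;

    return cb;
}
```

Note that in this model a device type with a 'pm' object but no
->runtime_suspend() leads straight to the driver callback; the bus type is not
consulted, because the subsystem has already been chosen.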

By default, the callbacks are always invoked in process context with interrupts
enabled.  However, the pm_runtime_irq_safe() helper function can be used to tell
the PM core that it is safe to run the ->runtime_suspend(), ->runtime_resume()
and ->runtime_idle() callbacks for the given device in atomic context with
interrupts disabled.  This implies that the callback routines in question must
not block or sleep, but it also means that the synchronous helper functions
listed at the end of Section 4 may be used for that device within an interrupt
handler or generally in an atomic context.

The subsystem-level suspend callback, if present, is _entirely_ _responsible_
for handling the suspend of the device as appropriate, which may, but need not
include executing the device driver's own ->runtime_suspend() callback (from the
PM core's point of view it is not necessary to implement a ->runtime_suspend()
callback in a device driver as long as the subsystem-level suspend callback
knows what to do to handle the device).

  * Once the subsystem-level suspend callback (or the driver suspend callback,
    if invoked directly) has completed successfully for the given device, the PM
    core regards the device as suspended, which need not mean that it has been
    put into a low power state.  It is supposed to mean, however, that the
    device will not process data and will not communicate with the CPU(s) and
    RAM until the appropriate resume callback is executed for it.  The runtime
    PM status of a device after successful execution of the suspend callback is
    'suspended'.

  * If the suspend callback returns -EBUSY or -EAGAIN, the device's runtime PM
    status remains 'active', which means that the device _must_ be fully
    operational afterwards.

  * If the suspend callback returns an error code different from -EBUSY and
    -EAGAIN, the PM core regards this as a fatal error and will refuse to run
    the helper functions described in Section 4 for the device until its status
    is directly set to either 'active' or 'suspended' (the PM core provides
    special helper functions for this purpose).

In particular, if the driver requires remote wakeup capability (i.e. a hardware
mechanism allowing the device to request a change of its power state, such as
PCI PME) for proper functioning and device_can_wakeup() returns 'false' for the
device, then ->runtime_suspend() should return -EBUSY.  On the other hand, if
device_can_wakeup() returns 'true' for the device and the device is put into a
low-power state during the execution of the suspend callback, it is expected
that remote wakeup will be enabled for the device.  Generally, remote wakeup
should be enabled for all input devices put into low-power states at run time.

The subsystem-level resume callback, if present, is _entirely_ _responsible_ for
handling the resume of the device as appropriate, which may, but need not
include executing the device driver's own ->runtime_resume() callback (from the
PM core's point of view it is not necessary to implement a ->runtime_resume()
callback in a device driver as long as the subsystem-level resume callback knows
what to do to handle the device).

  * Once the subsystem-level resume callback (or the driver resume callback, if
    invoked directly) has completed successfully, the PM core regards the device
    as fully operational, which means that the device _must_ be able to complete
    I/O operations as needed.  The runtime PM status of the device is then
    'active'.

  * If the resume callback returns an error code, the PM core regards this as a
    fatal error and will refuse to run the helper functions described in Section
    4 for the device, until its status is directly set to either 'active' or
    'suspended' (by means of special helper functions provided by the PM core
    for this purpose).

The idle callback (a subsystem-level one, if present, or the driver one) is
executed by the PM core whenever the device appears to be idle, which is
indicated to the PM core by two counters, the device's usage counter and the
counter of 'active' children of the device.

  * If any of these counters is decreased using a helper function provided by
    the PM core and it turns out to be equal to zero, the other counter is
    checked.  If that counter also is equal to zero, the PM core executes the
    idle callback with the device as its argument.

The action performed by the idle callback is totally dependent on the subsystem
(or driver) in question, but the expected and recommended action is to check
if the device can be suspended (i.e. if all of the conditions necessary for
suspending the device are satisfied) and to queue up a suspend request for the
device in that case.  If there is no idle callback, or if the callback returns
0, then the PM core will attempt to carry out a runtime suspend of the device,
also respecting devices configured for autosuspend.  In essence this means a
call to pm_runtime_autosuspend() (do note that drivers need to update the
device last busy mark, pm_runtime_mark_last_busy(), to control the delay under
this circumstance).  To prevent this (for example, if the callback routine has
started a delayed suspend), the routine must return a non-zero value.  Negative
error return codes are ignored by the PM core.
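
The idle notification flow described above can be sketched as follows.  This is
a userspace model under simplifying assumptions, not the PM core's code:
'struct model_idle_dev', model_idle_notify() and the two sample callbacks are
hypothetical, and the autosuspend attempt is reduced to a counter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model; only the two counters and the callback are kept. */
struct model_idle_dev {
    int usage_count;
    int child_count;
    int (*runtime_idle)(struct model_idle_dev *dev);  /* may be NULL */
    int autosuspend_attempts;  /* stands in for pm_runtime_autosuspend() */
};

/* Sample idle callbacks: one lets the suspend go ahead, one prevents it. */
int idle_allow(struct model_idle_dev *dev) { (void)dev; return 0; }
int idle_veto(struct model_idle_dev *dev) { (void)dev; return 1; }

/* Returns true if the notification was followed by an autosuspend attempt. */
bool model_idle_notify(struct model_idle_dev *dev)
{
    int ret = 0;

    assert(dev != NULL);

    /* The device only appears idle if both counters are zero. */
    if (dev->usage_count != 0 || dev->child_count != 0)
        return false;

    if (dev->runtime_idle)
        ret = dev->runtime_idle(dev);

    /* A non-zero return value prevents the suspend attempt. */
    if (ret != 0)
        return false;

    /* No callback, or the callback returned 0. */
    dev->autosuspend_attempts++;
    return true;
}
```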

The helper functions provided by the PM core, described in Section 4, guarantee
that the following constraints are met with respect to runtime PM callbacks for
one device:

(1) The callbacks are mutually exclusive (e.g. it is forbidden to execute
    ->runtime_suspend() in parallel with ->runtime_resume() or with another
    instance of ->runtime_suspend() for the same device) with the exception that
    ->runtime_suspend() or ->runtime_resume() can be executed in parallel with
    ->runtime_idle() (although ->runtime_idle() will not be started while any
    of the other callbacks is being executed for the same device).

(2) ->runtime_idle() and ->runtime_suspend() can only be executed for 'active'
    devices (i.e. the PM core will only execute ->runtime_idle() or
    ->runtime_suspend() for the devices the runtime PM status of which is
    'active').

(3) ->runtime_idle() and ->runtime_suspend() can only be executed for a device
    the usage counter of which is equal to zero _and_ either the counter of
    'active' children of which is equal to zero, or the 'power.ignore_children'
    flag of which is set.

(4) ->runtime_resume() can only be executed for 'suspended' devices (i.e. the
    PM core will only execute ->runtime_resume() for the devices the runtime
    PM status of which is 'suspended').
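
Constraints (2) through (4) amount to a simple predicate on the device state.
The sketch below is a hypothetical userspace model ('struct rule_dev' and
model_may_run() are illustration names, not kernel symbols); constraint (1),
mutual exclusion, is a matter of locking and is not modelled here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum rule_status { RULE_ACTIVE, RULE_SUSPENDED };
enum rule_cb { CB_IDLE, CB_SUSPEND, CB_RESUME };

/* Hypothetical model of the state the constraints refer to. */
struct rule_dev {
    enum rule_status status;
    int usage_count;
    int child_count;
    bool ignore_children;
};

/* True if constraints (2)-(4) allow the given callback to be executed. */
bool model_may_run(const struct rule_dev *dev, enum rule_cb cb)
{
    assert(dev != NULL);

    /* (4): resume only for 'suspended' devices. */
    if (cb == CB_RESUME)
        return dev->status == RULE_SUSPENDED;

    /* (2): idle and suspend only for 'active' devices. */
    if (dev->status != RULE_ACTIVE)
        return false;

    /* (3): usage counter zero, and no active children unless ignored. */
    return dev->usage_count == 0 &&
           (dev->child_count == 0 || dev->ignore_children);
}
```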

Additionally, the helper functions provided by the PM core obey the following
rules:

  * If ->runtime_suspend() is about to be executed or there's a pending request
    to execute it, ->runtime_idle() will not be executed for the same device.

  * A request to execute or to schedule the execution of ->runtime_suspend()
    will cancel any pending requests to execute ->runtime_idle() for the same
    device.

  * If ->runtime_resume() is about to be executed or there's a pending request
    to execute it, the other callbacks will not be executed for the same device.

  * A request to execute ->runtime_resume() will cancel any pending or
    scheduled requests to execute the other callbacks for the same device,
    except for scheduled autosuspends.
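
One way to read the four rules above is as operations on a single pending
request plus a single timer.  The following userspace sketch encodes that
reading; it is an interpretation for illustration, and 'struct model_queue' and
the model_request_*() functions are hypothetical.  Callbacks that are "about to
be executed" are not modelled, only pending and scheduled requests.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum model_req { REQ_NONE, REQ_IDLE, REQ_SUSPEND, REQ_RESUME };
enum model_timer { TIMER_NONE, TIMER_SUSPEND, TIMER_AUTOSUSPEND };

/* Hypothetical model: one queued request and one scheduled timer. */
struct model_queue {
    enum model_req pending;
    enum model_timer timer;
};

/* Idle is refused while a suspend or resume request is pending. */
bool model_request_idle(struct model_queue *q)
{
    assert(q != NULL);
    if (q->pending == REQ_SUSPEND || q->pending == REQ_RESUME)
        return false;
    q->pending = REQ_IDLE;
    return true;
}

/* Suspend is refused while a resume is pending; it cancels a pending idle. */
bool model_request_suspend(struct model_queue *q)
{
    assert(q != NULL);
    if (q->pending == REQ_RESUME)
        return false;
    q->pending = REQ_SUSPEND;  /* replaces a pending idle request, if any */
    return true;
}

/* Resume cancels everything else except a scheduled autosuspend. */
void model_request_resume(struct model_queue *q)
{
    assert(q != NULL);
    q->pending = REQ_RESUME;
    if (q->timer == TIMER_SUSPEND)
        q->timer = TIMER_NONE;
}
```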

3. Runtime PM Device Fields

The following device runtime PM fields are present in 'struct dev_pm_info', as
defined in include/linux/pm.h:

  struct timer_list suspend_timer;
    - timer used for scheduling (delayed) suspend and autosuspend requests

  unsigned long timer_expires;
    - timer expiration time, in jiffies (if this is different from zero, the
      timer is running and will expire at that time, otherwise the timer is not
      running)

  struct work_struct work;
    - work structure used for queuing up requests (i.e. work items in pm_wq)

  wait_queue_head_t wait_queue;
    - wait queue used if any of the helper functions needs to wait for another
      one to complete

  spinlock_t lock;
    - lock used for synchronization

  atomic_t usage_count;
    - the usage counter of the device

  atomic_t child_count;
    - the count of 'active' children of the device

  unsigned int ignore_children;
    - if set, the value of child_count is ignored (but still updated)

  unsigned int disable_depth;
    - used for disabling the helper functions (they work normally if this is
      equal to zero); the initial value of it is 1 (i.e. runtime PM is
      initially disabled for all devices)

  int runtime_error;
    - if set, there was a fatal error (one of the callbacks returned an error
      code as described in Section 2), so the helper functions will not work
      until this flag is cleared; this is the error code returned by the
      failing callback

  unsigned int idle_notification;
    - if set, ->runtime_idle() is being executed

  unsigned int request_pending;
    - if set, there's a pending request (i.e. a work item queued up into pm_wq)

  enum rpm_request request;
    - type of request that's pending (valid if request_pending is set)

  unsigned int deferred_resume;
    - set if ->runtime_resume() is about to be run while ->runtime_suspend() is
      being executed for that device and it is not practical to wait for the
      suspend to complete; means "start a resume as soon as you've suspended"

  enum rpm_status runtime_status;
    - the runtime PM status of the device; this field's initial value is
      RPM_SUSPENDED, which means that each device is initially regarded by the
      PM core as 'suspended', regardless of its real hardware status

  unsigned int runtime_auto;
    - if set, indicates that the user space has allowed the device driver to
      power manage the device at run time via the /sys/devices/.../power/control
      interface; it may only be modified with the help of the pm_runtime_allow()
      and pm_runtime_forbid() helper functions

  unsigned int no_callbacks;
    - indicates that the device does not use the runtime PM callbacks (see
      Section 8); it may be modified only by the pm_runtime_no_callbacks()
      helper function

  unsigned int irq_safe;
    - indicates that the ->runtime_suspend() and ->runtime_resume() callbacks
      will be invoked with the spinlock held and interrupts disabled

  unsigned int use_autosuspend;
    - indicates that the device's driver supports delayed autosuspend (see
      Section 9); it may be modified only by the
      pm_runtime{_dont}_use_autosuspend() helper functions

  unsigned int timer_autosuspends;
    - indicates that the PM core should attempt to carry out an autosuspend
      when the timer expires rather than a normal suspend

  int autosuspend_delay;
    - the delay time (in milliseconds) to be used for autosuspend

  unsigned long last_busy;
    - the time (in jiffies) when the pm_runtime_mark_last_busy() helper
      function was last called for this device; used in calculating inactivity
      periods for autosuspend

All of the above fields are members of the 'power' member of 'struct device'.

4. Runtime PM Device Helper Functions

The following runtime PM helper functions are defined in
drivers/base/power/runtime.c and include/linux/pm_runtime.h:

  void pm_runtime_init(struct device *dev);
    - initialize the device runtime PM fields in 'struct dev_pm_info'

  void pm_runtime_remove(struct device *dev);
    - make sure that the runtime PM of the device will be disabled after
      removing the device from device hierarchy

  int pm_runtime_idle(struct device *dev);
    - execute the subsystem-level idle callback for the device; returns an
      error code on failure, where -EINPROGRESS means that ->runtime_idle() is
      already being executed; if there is no callback or the callback returns 0
      then run pm_runtime_autosuspend(dev) and return its result

  int pm_runtime_suspend(struct device *dev);
    - execute the subsystem-level suspend callback for the device; returns 0 on
      success, 1 if the device's runtime PM status was already 'suspended', or
      an error code on failure, where -EAGAIN or -EBUSY means it is safe to
      attempt to suspend the device again in future and -EACCES means that
      'power.disable_depth' is different from 0

  int pm_runtime_autosuspend(struct device *dev);
    - same as pm_runtime_suspend() except that the autosuspend delay is taken
      into account; if pm_runtime_autosuspend_expiration() says the delay has
      not yet expired then an autosuspend is scheduled for the appropriate time
      and 0 is returned

  int pm_runtime_resume(struct device *dev);
    - execute the subsystem-level resume callback for the device; returns 0 on
      success, 1 if the device's runtime PM status was already 'active', or
      an error code on failure, where -EAGAIN means it may be safe to attempt
      to resume the device again in future, but 'power.runtime_error' should be
      checked additionally, and -EACCES means that 'power.disable_depth' is
      different from 0

  int pm_request_idle(struct device *dev);
    - submit a request to execute the subsystem-level idle callback for the
      device (the request is represented by a work item in pm_wq); returns 0 on
      success or an error code if the request has not been queued up

  int pm_request_autosuspend(struct device *dev);
    - schedule the execution of the subsystem-level suspend callback for the
      device when the autosuspend delay has expired; if the delay has already
      expired then the work item is queued up immediately

  int pm_schedule_suspend(struct device *dev, unsigned int delay);
    - schedule the execution of the subsystem-level suspend callback for the
      device in future, where 'delay' is the time to wait before queuing up a
      suspend work item in pm_wq, in milliseconds (if 'delay' is zero, the work
      item is queued up immediately); returns 0 on success, 1 if the device's
      runtime PM status was already 'suspended', or an error code if the
      request hasn't been scheduled (or queued up if 'delay' is 0); if the
      execution of ->runtime_suspend() is already scheduled and not yet
      expired, the new value of 'delay' will be used as the time to wait

  int pm_request_resume(struct device *dev);
    - submit a request to execute the subsystem-level resume callback for the
      device (the request is represented by a work item in pm_wq); returns 0 on
      success, 1 if the device's runtime PM status was already 'active', or
      an error code if the request hasn't been queued up

  void pm_runtime_get_noresume(struct device *dev);
    - increment the device's usage counter

  int pm_runtime_get(struct device *dev);
    - increment the device's usage counter, run pm_request_resume(dev) and
      return its result

  int pm_runtime_get_sync(struct device *dev);
    - increment the device's usage counter, run pm_runtime_resume(dev) and
      return its result

  int pm_runtime_get_if_in_use(struct device *dev);
    - return -EINVAL if 'power.disable_depth' is nonzero; otherwise, if the
      runtime PM status is RPM_ACTIVE and the runtime PM usage counter is
      nonzero, increment the counter and return 1; otherwise return 0 without
      changing the counter

  void pm_runtime_put_noidle(struct device *dev);
    - decrement the device's usage counter

  int pm_runtime_put(struct device *dev);
    - decrement the device's usage counter; if the result is 0 then run
      pm_request_idle(dev) and return its result

  int pm_runtime_put_autosuspend(struct device *dev);
    - decrement the device's usage counter; if the result is 0 then run
      pm_request_autosuspend(dev) and return its result

  int pm_runtime_put_sync(struct device *dev);
    - decrement the device's usage counter; if the result is 0 then run
      pm_runtime_idle(dev) and return its result

  int pm_runtime_put_sync_suspend(struct device *dev);
    - decrement the device's usage counter; if the result is 0 then run
      pm_runtime_suspend(dev) and return its result

  int pm_runtime_put_sync_autosuspend(struct device *dev);
    - decrement the device's usage counter; if the result is 0 then run
      pm_runtime_autosuspend(dev) and return its result

  void pm_runtime_enable(struct device *dev);
    - decrement the device's 'power.disable_depth' field; if that field is equal
      to zero, the runtime PM helper functions can execute subsystem-level
      callbacks described in Section 2 for the device

  int pm_runtime_disable(struct device *dev);
    - increment the device's 'power.disable_depth' field (if the value of that
      field was previously zero, this prevents subsystem-level runtime PM
      callbacks from being run for the device), make sure that all of the
      pending runtime PM operations on the device are either completed or
      canceled; returns 1 if there was a resume request pending and it was
      necessary to execute the subsystem-level resume callback for the device
      to satisfy that request, otherwise 0 is returned

  int pm_runtime_barrier(struct device *dev);
    - check if there's a resume request pending for the device and resume it
      (synchronously) in that case, cancel any other pending runtime PM requests
      regarding it and wait for all runtime PM operations on it in progress to
      complete; returns 1 if there was a resume request pending and it was
      necessary to execute the subsystem-level resume callback for the device to
      satisfy that request, otherwise 0 is returned

  void pm_suspend_ignore_children(struct device *dev, bool enable);
    - set/unset the power.ignore_children flag of the device

  int pm_runtime_set_active(struct device *dev);
    - clear the device's 'power.runtime_error' flag, set the device's runtime
      PM status to 'active' and update its parent's counter of 'active'
      children as appropriate (it is only valid to use this function if
      'power.runtime_error' is set or 'power.disable_depth' is greater than
      zero); it will fail and return an error code if the device has a parent
      which is not active and the 'power.ignore_children' flag of which is unset

  void pm_runtime_set_suspended(struct device *dev);
    - clear the device's 'power.runtime_error' flag, set the device's runtime
      PM status to 'suspended' and update its parent's counter of 'active'
      children as appropriate (it is only valid to use this function if
      'power.runtime_error' is set or 'power.disable_depth' is greater than
      zero)

  bool pm_runtime_active(struct device *dev);
    - return true if the device's runtime PM status is 'active' or its
      'power.disable_depth' field is not equal to zero, or false otherwise

  bool pm_runtime_suspended(struct device *dev);
    - return true if the device's runtime PM status is 'suspended' and its
      'power.disable_depth' field is equal to zero, or false otherwise

  bool pm_runtime_status_suspended(struct device *dev);
    - return true if the device's runtime PM status is 'suspended'

  void pm_runtime_allow(struct device *dev);
    - set the power.runtime_auto flag for the device and decrease its usage
      counter (used by the /sys/devices/.../power/control interface to
      effectively allow the device to be power managed at run time)

  void pm_runtime_forbid(struct device *dev);
    - unset the power.runtime_auto flag for the device and increase its usage
      counter (used by the /sys/devices/.../power/control interface to
      effectively prevent the device from being power managed at run time)

  void pm_runtime_no_callbacks(struct device *dev);
    - set the power.no_callbacks flag for the device and remove the runtime
      PM attributes from /sys/devices/.../power (or prevent them from being
      added when the device is registered)

  void pm_runtime_irq_safe(struct device *dev);
    - set the power.irq_safe flag for the device, causing the runtime-PM
      callbacks to be invoked with interrupts off

  bool pm_runtime_is_irq_safe(struct device *dev);
    - return true if the power.irq_safe flag was set for the device, causing
      the runtime-PM callbacks to be invoked with interrupts off

  void pm_runtime_mark_last_busy(struct device *dev);
    - set the power.last_busy field to the current time

  void pm_runtime_use_autosuspend(struct device *dev);
    - set the power.use_autosuspend flag, enabling autosuspend delays; call
      pm_runtime_get_sync if the flag was previously cleared and
      power.autosuspend_delay is negative

  void pm_runtime_dont_use_autosuspend(struct device *dev);
    - clear the power.use_autosuspend flag, disabling autosuspend delays;
      decrement the device's usage counter if the flag was previously set and
      power.autosuspend_delay is negative; call pm_runtime_idle

  void pm_runtime_set_autosuspend_delay(struct device *dev, int delay);
    - set the power.autosuspend_delay value to 'delay' (expressed in
      milliseconds); if 'delay' is negative then runtime suspends are
      prevented; if power.use_autosuspend is set, pm_runtime_get_sync may be
      called or the device's usage counter may be decremented and
      pm_runtime_idle called depending on whether power.autosuspend_delay is
      changed to or from a negative value; if power.use_autosuspend is clear,
      pm_runtime_idle is called

  unsigned long pm_runtime_autosuspend_expiration(struct device *dev);
    - calculate the time when the current autosuspend delay period will expire,
      based on power.last_busy and power.autosuspend_delay; if the delay time
      is 1000 ms or larger then the expiration time is rounded up to the
      nearest second; returns 0 if the delay period has already expired or
      power.use_autosuspend isn't set, otherwise returns the expiration time
      in jiffies
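
The expiration rule in the last entry above can be worked through with a small
userspace model.  This is a sketch under stated assumptions, not the kernel's
implementation: all times are kept in milliseconds rather than jiffies, counter
wraparound is ignored, and returning 0 for a negative delay is a choice of the
model.  'struct model_autosuspend' and model_autosuspend_expiration() are
hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model; times are in milliseconds, not jiffies. */
struct model_autosuspend {
    bool use_autosuspend;
    int autosuspend_delay_ms;    /* negative: runtime suspends prevented */
    unsigned long last_busy_ms;  /* time of the last "mark last busy" */
};

/*
 * Returns the modelled expiration time in ms, or 0 if autosuspend is not in
 * use, the delay is negative (model assumption), or the period has expired.
 */
unsigned long model_autosuspend_expiration(const struct model_autosuspend *a,
                                           unsigned long now_ms)
{
    unsigned long expires;

    assert(a != NULL);

    if (!a->use_autosuspend || a->autosuspend_delay_ms < 0)
        return 0;

    expires = a->last_busy_ms + (unsigned long)a->autosuspend_delay_ms;

    /* Delays of 1000 ms or more are rounded up to a whole second. */
    if (a->autosuspend_delay_ms >= 1000)
        expires = (expires + 999UL) / 1000UL * 1000UL;

    return expires > now_ms ? expires : 0;
}
```

For example, with last_busy at 10250 ms and a delay of 2000 ms the raw
expiration time of 12250 ms is rounded up to 13000 ms, whereas a delay of
500 ms is used as is.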

It is safe to execute the following helper functions from interrupt context:

pm_request_idle()
pm_request_autosuspend()
pm_schedule_suspend()
pm_request_resume()
pm_runtime_get_noresume()
pm_runtime_get()
pm_runtime_put_noidle()
pm_runtime_put()
pm_runtime_put_autosuspend()
pm_runtime_enable()
pm_suspend_ignore_children()
pm_runtime_set_active()
pm_runtime_set_suspended()
pm_runtime_suspended()
pm_runtime_mark_last_busy()
pm_runtime_autosuspend_expiration()

If pm_runtime_irq_safe() has been called for a device then the following helper
functions may also be used in interrupt context:

pm_runtime_idle()
pm_runtime_suspend()
pm_runtime_autosuspend()
pm_runtime_resume()
pm_runtime_get_sync()
pm_runtime_put_sync()
pm_runtime_put_sync_suspend()
pm_runtime_put_sync_autosuspend()

5. Runtime PM Initialization, Device Probing and Removal

Initially, the runtime PM is disabled for all devices, which means that the
majority of the runtime PM helper functions described in Section 4 will return
-EACCES until pm_runtime_enable() is called for the device.
|  | 540 |  | 
|  | 541 | In addition to that, the initial runtime PM status of all devices is | 
|  | 542 | 'suspended', but it need not reflect the actual physical state of the device. | 
|  | 543 | Thus, if the device is initially active (i.e. it is able to process I/O), its | 
|  | 544 | runtime PM status must be changed to 'active', with the help of | 
|  | 545 | pm_runtime_set_active(), before pm_runtime_enable() is called for the device. | 
|  | 546 |  | 
|  | 547 | However, if the device has a parent and the parent's runtime PM is enabled, | 
|  | 548 | calling pm_runtime_set_active() for the device will affect the parent, unless | 
|  | 549 | the parent's 'power.ignore_children' flag is set.  Namely, in that case the | 
|  | 550 | parent won't be able to suspend at run time, using the PM core's helper | 
|  | 551 | functions, as long as the child's status is 'active', even if the child's | 
|  | 552 | runtime PM is still disabled (i.e. pm_runtime_enable() hasn't been called for | 
|  | 553 | the child yet or pm_runtime_disable() has been called for it).  For this reason, | 
|  | 554 | once pm_runtime_set_active() has been called for the device, pm_runtime_enable() | 
|  | 555 | should be called for it too as soon as reasonably possible or its runtime PM | 
|  | 556 | status should be changed back to 'suspended' with the help of | 
|  | 557 | pm_runtime_set_suspended(). | 
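
The ordering described above (set the status first, then enable runtime PM promptly) might look as follows in a probe routine. This is a user-space sketch: struct device and the two helpers are simplified stubs standing in for the kernel ones, foo_probe() and its hw_powered_up parameter are hypothetical, and the stub's refusal to change the status once runtime PM is enabled is a modeling assumption.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical user-space stand-in for the runtime PM state of a device. */
struct device {
	int rpm_enabled;	/* nonzero once pm_runtime_enable() has run */
	int rpm_active;		/* nonzero while the status is 'active' */
};

/* Stub: in this model the status may only be changed while disabled. */
static int pm_runtime_set_active(struct device *dev)
{
	if (dev->rpm_enabled)
		return -EAGAIN;
	dev->rpm_active = 1;
	return 0;
}

/* Stub: mark runtime PM as enabled. */
static void pm_runtime_enable(struct device *dev)
{
	dev->rpm_enabled = 1;
}

/* Hypothetical probe routine; hw_powered_up is an illustrative parameter. */
int foo_probe(struct device *dev, int hw_powered_up)
{
	int ret;

	/* Probe is assumed to run before runtime PM has been enabled. */
	assert(!dev->rpm_enabled);

	if (hw_powered_up) {
		/* Make the status match the hardware before enabling. */
		ret = pm_runtime_set_active(dev);
		if (ret)
			return ret;
	}

	/*
	 * Enable right away so that an 'active' but still disabled child
	 * does not block its parent for longer than necessary.
	 */
	pm_runtime_enable(dev);
	return 0;
}
```

If the hardware starts out powered down, the sketch simply skips the status change and leaves the default 'suspended' status in place.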
|  | 558 |  | 
|  | 559 | If the default initial runtime PM status of the device (i.e. 'suspended') | 
|  | 560 | reflects the actual state of the device, its bus type's or its driver's | 
|  | 561 | ->probe() callback will likely need to wake it up using one of the PM core's | 
|  | 562 | helper functions described in Section 4.  In that case, pm_runtime_resume() | 
|  | 563 | should be used.  Of course, for this purpose the device's runtime PM has to be | 
|  | 564 | enabled earlier by calling pm_runtime_enable(). | 
|  | 565 |  | 
|  | 566 | Note, if the device may execute pm_runtime calls during the probe (such as | 
|  | 567 | if it registers with a subsystem that may call back in), then a | 
|  | 568 | pm_runtime_get_sync() call paired with a pm_runtime_put() call is | 
|  | 569 | appropriate to ensure that the device is not put back to sleep during the | 
|  | 570 | probe.  This can happen with systems such as the network device layer. | 
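
A minimal sketch of that pairing, written as a user-space model: struct device and the pm_runtime_*() functions below are simplified stubs with the kernel helpers' names, and foo_probe() and foo_register_with_subsystem() are hypothetical. The error path relies on the fact that pm_runtime_get_sync() increments the usage counter even when it fails.

```c
#include <assert.h>

/* Hypothetical user-space stand-in for the runtime PM state of a device. */
struct device {
	int usage_count;	/* models power.usage_count */
	int suspended;		/* nonzero while the device is suspended */
};

/* Stub: take a usage reference and resume the device synchronously. */
static int pm_runtime_get_sync(struct device *dev)
{
	dev->usage_count++;
	dev->suspended = 0;
	return 0;
}

/* Stub: drop a reference; the real helper may also queue an idle request. */
static int pm_runtime_put(struct device *dev)
{
	assert(dev->usage_count > 0);
	dev->usage_count--;
	return 0;
}

/* Stub: drop a reference without any idle notification. */
static void pm_runtime_put_noidle(struct device *dev)
{
	assert(dev->usage_count > 0);
	dev->usage_count--;
}

/* Hypothetical registration step during which the subsystem may call back. */
static int foo_register_with_subsystem(struct device *dev)
{
	/* The reference held by probe keeps the device resumed here. */
	return (dev->usage_count > 0 && !dev->suspended) ? 0 : -1;
}

int foo_probe(struct device *dev)
{
	int ret;

	ret = pm_runtime_get_sync(dev);
	if (ret < 0) {
		/* The usage counter was incremented even though resume failed. */
		pm_runtime_put_noidle(dev);
		return ret;
	}

	ret = foo_register_with_subsystem(dev);

	/* Balance the reference taken above, whatever registration returned. */
	pm_runtime_put(dev);
	return ret;
}
```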
|  | 571 |  | 
|  | 572 | It may be desirable to suspend the device once ->probe() has finished. | 
|  | 573 | Therefore the driver core uses the asynchronous pm_request_idle() to submit a | 
|  | 574 | request to execute the subsystem-level idle callback for the device at that | 
|  | 575 | time.  A driver that makes use of the runtime autosuspend feature may want to | 
|  | 576 | update the last busy mark before returning from ->probe(). | 
|  | 577 |  | 
|  | 578 | Moreover, the driver core prevents runtime PM callbacks from racing with the bus | 
|  | 579 | notifier callback in __device_release_driver(), which is necessary, because the | 
|  | 580 | notifier is used by some subsystems to carry out operations affecting the | 
|  | 581 | runtime PM functionality.  It does so by calling pm_runtime_get_sync() before | 
|  | 582 | driver_sysfs_remove() and the BUS_NOTIFY_UNBIND_DRIVER notifications.  This | 
|  | 583 | resumes the device if it's in the suspended state and prevents it from | 
|  | 584 | being suspended again while those routines are being executed. | 
|  | 585 |  | 
|  | 586 | To allow bus types and drivers to put devices into the suspended state by | 
|  | 587 | calling pm_runtime_suspend() from their ->remove() routines, the driver core | 
|  | 588 | executes pm_runtime_put_sync() after running the BUS_NOTIFY_UNBIND_DRIVER | 
|  | 589 | notifications in __device_release_driver().  This requires bus types and | 
|  | 590 | drivers to make their ->remove() callbacks avoid races with runtime PM directly, | 
|  | 591 | but it also allows more flexibility in the handling of devices during the | 
|  | 592 | removal of their drivers. | 
|  | 593 |  | 
|  | 594 | In their ->remove() callbacks, drivers should undo the runtime PM changes made | 
|  | 595 | in ->probe().  Usually this means calling pm_runtime_disable(), | 
|  | 596 | pm_runtime_dont_use_autosuspend(), etc. | 
|  | 597 |  | 
|  | 598 | User space can effectively prevent the driver of the device from power managing | 
|  | 599 | it at run time by changing the value of its /sys/devices/.../power/control | 
|  | 600 | attribute to "on", which causes pm_runtime_forbid() to be called.  In principle, | 
|  | 601 | this mechanism may also be used by the driver to effectively turn off the | 
|  | 602 | runtime power management of the device until the user space turns it on. | 
|  | 603 | Namely, during the initialization the driver can make sure that the runtime PM | 
|  | 604 | status of the device is 'active' and call pm_runtime_forbid().  It should be | 
|  | 605 | noted, however, that if the user space has already intentionally changed the | 
|  | 606 | value of /sys/devices/.../power/control to "auto" to allow the driver to power | 
|  | 607 | manage the device at run time, the driver may confuse it by using | 
|  | 608 | pm_runtime_forbid() this way. | 
|  | 609 |  | 
|  | 610 | 6. Runtime PM and System Sleep | 
|  | 611 |  | 
|  | 612 | Runtime PM and system sleep (i.e., system suspend and hibernation, also known | 
|  | 613 | as suspend-to-RAM and suspend-to-disk) interact with each other in a couple of | 
|  | 614 | ways.  If a device is active when a system sleep starts, everything is | 
|  | 615 | straightforward.  But what should happen if the device is already suspended? | 
|  | 616 |  | 
|  | 617 | The device may have different wake-up settings for runtime PM and system sleep. | 
|  | 618 | For example, remote wake-up may be enabled for runtime suspend but disallowed | 
|  | 619 | for system sleep (device_may_wakeup(dev) returns 'false').  When this happens, | 
|  | 620 | the subsystem-level system suspend callback is responsible for changing the | 
|  | 621 | device's wake-up setting (it may leave that to the device driver's system | 
|  | 622 | suspend routine).  It may be necessary to resume the device and suspend it again | 
|  | 623 | in order to do so.  The same is true if the driver uses different power levels | 
|  | 624 | or other settings for runtime suspend and system sleep. | 
|  | 625 |  | 
|  | 626 | During system resume, the simplest approach is to bring all devices back to full | 
|  | 627 | power, even if they had been suspended before the system suspend began.  There | 
|  | 628 | are several reasons for this, including: | 
|  | 629 |  | 
|  | 630 | * The device might need to switch power levels, wake-up settings, etc. | 
|  | 631 |  | 
|  | 632 | * Remote wake-up events might have been lost by the firmware. | 
|  | 633 |  | 
|  | 634 | * The device's children may need the device to be at full power in order | 
|  | 635 | to resume themselves. | 
|  | 636 |  | 
|  | 637 | * The driver's idea of the device state may not agree with the device's | 
|  | 638 | physical state.  This can happen during resume from hibernation. | 
|  | 639 |  | 
|  | 640 | * The device might need to be reset. | 
|  | 641 |  | 
|  | 642 | * Even though the device was suspended, if its usage counter was > 0 then most | 
|  | 643 | likely it would need a runtime resume in the near future anyway. | 
|  | 644 |  | 
|  | 645 | If the device had been suspended before the system suspend began and it's | 
|  | 646 | brought back to full power during resume, then its runtime PM status will have | 
|  | 647 | to be updated to reflect the actual post-system sleep status.  The way to do | 
|  | 648 | this is: | 
|  | 649 |  | 
|  | 650 | pm_runtime_disable(dev); | 
|  | 651 | pm_runtime_set_active(dev); | 
|  | 652 | pm_runtime_enable(dev); | 
|  | 653 |  | 
|  | 654 | The PM core always increments the runtime usage counter before calling the | 
|  | 655 | ->suspend() callback and decrements it after calling the ->resume() callback. | 
|  | 656 | Hence disabling runtime PM temporarily like this will not cause any runtime | 
|  | 657 | suspend attempts to be permanently lost.  If the usage count goes to zero | 
|  | 658 | following the return of the ->resume() callback, the ->runtime_idle() callback | 
|  | 659 | will be invoked as usual. | 
|  | 660 |  | 
|  | 661 | On some systems, however, system sleep is not entered through a global firmware | 
|  | 662 | or hardware operation.  Instead, all hardware components are put into low-power | 
|  | 663 | states directly by the kernel in a coordinated way.  Then, the system sleep | 
|  | 664 | state effectively follows from the states the hardware components end up in | 
|  | 665 | and the system is woken up from that state by a hardware interrupt or a similar | 
|  | 666 | mechanism entirely under the kernel's control.  As a result, the kernel never | 
|  | 667 | gives control away and the states of all devices during resume are precisely | 
|  | 668 | known to it.  If that is the case and none of the situations listed above takes | 
|  | 669 | place (in particular, if the system is not waking up from hibernation), it may | 
|  | 670 | be more efficient to leave the devices that had been suspended before the system | 
|  | 671 | suspend began in the suspended state. | 
|  | 672 |  | 
|  | 673 | To this end, the PM core provides a mechanism allowing some coordination between | 
|  | 674 | different levels of device hierarchy.  Namely, if a system suspend .prepare() | 
|  | 675 | callback returns a positive number for a device, that indicates to the PM core | 
|  | 676 | that the device appears to be runtime-suspended and its state is fine, so it | 
|  | 677 | may be left in runtime suspend provided that all of its descendants are also | 
|  | 678 | left in runtime suspend.  If that happens, the PM core will not execute any | 
|  | 679 | system suspend and resume callbacks for any of those devices, except for the | 
|  | 680 | complete callback, which is then entirely responsible for handling the device | 
|  | 681 | as appropriate.  This only applies to system suspend transitions that are not | 
|  | 682 | related to hibernation (see Documentation/driver-api/pm/devices.rst for more | 
|  | 683 | information). | 
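
As an illustration of the mechanism above, a driver's .prepare() callback could report that the device is already runtime-suspended. The sketch below is a user-space model: struct device, the two-state enum and the pm_runtime_suspended() stub are simplified assumptions, and foo_prepare() is a hypothetical callback.

```c
#include <assert.h>
#include <stdbool.h>

/* Reduced model of the runtime PM status; transitional states are omitted. */
enum rpm_status { RPM_ACTIVE, RPM_SUSPENDED };

/* Hypothetical user-space stand-in for the runtime PM state of a device. */
struct device {
	enum rpm_status runtime_status;
	int disable_depth;	/* nonzero while runtime PM is disabled */
};

/* Stub: true only if the status is 'suspended' and runtime PM is enabled. */
static bool pm_runtime_suspended(struct device *dev)
{
	return dev->runtime_status == RPM_SUSPENDED && dev->disable_depth == 0;
}

/* Hypothetical system suspend .prepare() callback. */
int foo_prepare(struct device *dev)
{
	assert(dev);

	/*
	 * A positive return value tells the PM core that the device appears
	 * to be runtime-suspended and may be left in that state.
	 */
	return pm_runtime_suspended(dev) ? 1 : 0;
}
```

A driver taking this route also has to provide a complete callback that can cope with the device having been left suspended, since that is the only callback the PM core still runs for it.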
|  | 684 |  | 
|  | 685 | The PM core does its best to reduce the probability of race conditions between | 
|  | 686 | the runtime PM and system suspend/resume (and hibernation) callbacks by carrying | 
|  | 687 | out the following operations: | 
|  | 688 |  | 
|  | 689 | * During system suspend pm_runtime_get_noresume() is called for every device | 
|  | 690 | right before executing the subsystem-level .prepare() callback for it and | 
|  | 691 | pm_runtime_barrier() is called for every device right before executing the | 
|  | 692 | subsystem-level .suspend() callback for it.  In addition to that the PM core | 
|  | 693 | calls __pm_runtime_disable() with 'false' as the second argument for every | 
|  | 694 | device right before executing the subsystem-level .suspend_late() callback | 
|  | 695 | for it. | 
|  | 696 |  | 
|  | 697 | * During system resume pm_runtime_enable() and pm_runtime_put() are called for | 
|  | 698 | every device right after executing the subsystem-level .resume_early() | 
|  | 699 | callback and right after executing the subsystem-level .complete() callback | 
|  | 700 | for it, respectively. | 
|  | 701 |  | 
|  | 702 | 7. Generic subsystem callbacks | 
|  | 703 |  | 
|  | 704 | Subsystems may wish to conserve code space by using the set of generic power | 
|  | 705 | management callbacks provided by the PM core, defined in | 
|  | 706 | drivers/base/power/generic_ops.c: | 
|  | 707 |  | 
|  | 708 | int pm_generic_runtime_suspend(struct device *dev); | 
|  | 709 | - invoke the ->runtime_suspend() callback provided by the driver of this | 
|  | 710 | device and return its result, or return 0 if not defined | 
|  | 711 |  | 
|  | 712 | int pm_generic_runtime_resume(struct device *dev); | 
|  | 713 | - invoke the ->runtime_resume() callback provided by the driver of this | 
|  | 714 | device and return its result, or return 0 if not defined | 
|  | 715 |  | 
|  | 716 | int pm_generic_suspend(struct device *dev); | 
|  | 717 | - if the device has not been suspended at run time, invoke the ->suspend() | 
|  | 718 | callback provided by its driver and return its result, or return 0 if not | 
|  | 719 | defined | 
|  | 720 |  | 
|  | 721 | int pm_generic_suspend_noirq(struct device *dev); | 
|  | 722 | - if pm_runtime_suspended(dev) returns "false", invoke the ->suspend_noirq() | 
|  | 723 | callback provided by the device's driver and return its result, or return | 
|  | 724 | 0 if not defined | 
|  | 725 |  | 
|  | 726 | int pm_generic_resume(struct device *dev); | 
|  | 727 | - invoke the ->resume() callback provided by the driver of this device and, | 
|  | 728 | if successful, change the device's runtime PM status to 'active' | 
|  | 729 |  | 
|  | 730 | int pm_generic_resume_noirq(struct device *dev); | 
|  | 731 | - invoke the ->resume_noirq() callback provided by the driver of this device | 
|  | 732 |  | 
|  | 733 | int pm_generic_freeze(struct device *dev); | 
|  | 734 | - if the device has not been suspended at run time, invoke the ->freeze() | 
|  | 735 | callback provided by its driver and return its result, or return 0 if not | 
|  | 736 | defined | 
|  | 737 |  | 
|  | 738 | int pm_generic_freeze_noirq(struct device *dev); | 
|  | 739 | - if pm_runtime_suspended(dev) returns "false", invoke the ->freeze_noirq() | 
|  | 740 | callback provided by the device's driver and return its result, or return | 
|  | 741 | 0 if not defined | 
|  | 742 |  | 
|  | 743 | int pm_generic_thaw(struct device *dev); | 
|  | 744 | - if the device has not been suspended at run time, invoke the ->thaw() | 
|  | 745 | callback provided by its driver and return its result, or return 0 if not | 
|  | 746 | defined | 
|  | 747 |  | 
|  | 748 | int pm_generic_thaw_noirq(struct device *dev); | 
|  | 749 | - if pm_runtime_suspended(dev) returns "false", invoke the ->thaw_noirq() | 
|  | 750 | callback provided by the device's driver and return its result, or return | 
|  | 751 | 0 if not defined | 
|  | 752 |  | 
|  | 753 | int pm_generic_poweroff(struct device *dev); | 
|  | 754 | - if the device has not been suspended at run time, invoke the ->poweroff() | 
|  | 755 | callback provided by its driver and return its result, or return 0 if not | 
|  | 756 | defined | 
|  | 757 |  | 
|  | 758 | int pm_generic_poweroff_noirq(struct device *dev); | 
|  | 759 | - if pm_runtime_suspended(dev) returns "false", run the ->poweroff_noirq() | 
|  | 760 | callback provided by the device's driver and return its result, or return | 
|  | 761 | 0 if not defined | 
|  | 762 |  | 
|  | 763 | int pm_generic_restore(struct device *dev); | 
|  | 764 | - invoke the ->restore() callback provided by the driver of this device and, | 
|  | 765 | if successful, change the device's runtime PM status to 'active' | 
|  | 766 |  | 
|  | 767 | int pm_generic_restore_noirq(struct device *dev); | 
|  | 768 | - invoke the ->restore_noirq() callback provided by the device's driver | 
|  | 769 |  | 
|  | 770 | These functions are the defaults used by the PM core if a subsystem doesn't | 
|  | 771 | provide its own callbacks for ->runtime_idle(), ->runtime_suspend(), | 
|  | 772 | ->runtime_resume(), ->suspend(), ->suspend_noirq(), ->resume(), | 
|  | 773 | ->resume_noirq(), ->freeze(), ->freeze_noirq(), ->thaw(), ->thaw_noirq(), | 
|  | 774 | ->poweroff(), ->poweroff_noirq(), ->restore(), ->restore_noirq() in the | 
|  | 775 | subsystem-level dev_pm_ops structure. | 
|  | 776 |  | 
|  | 777 | Device drivers that wish to use the same function as a system suspend, freeze, | 
|  | 778 | poweroff and runtime suspend callback, and similarly for system resume, thaw, | 
|  | 779 | restore, and runtime resume, can achieve this with the help of the | 
|  | 780 | UNIVERSAL_DEV_PM_OPS macro defined in include/linux/pm.h (possibly setting its | 
|  | 781 | last argument to NULL). | 
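
To show what sharing one function across these callbacks amounts to, here is a user-space sketch. The struct dev_pm_ops below is reduced to the fields named in the text, and the macro body is a simplified model rather than a copy of the definition in include/linux/pm.h; foo_suspend(), foo_resume() and foo_pm_ops are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

struct device;	/* opaque in this sketch */

/* Reduced stand-in for struct dev_pm_ops: only the fields named in the text. */
struct dev_pm_ops {
	int (*suspend)(struct device *dev);
	int (*resume)(struct device *dev);
	int (*freeze)(struct device *dev);
	int (*thaw)(struct device *dev);
	int (*poweroff)(struct device *dev);
	int (*restore)(struct device *dev);
	int (*runtime_suspend)(struct device *dev);
	int (*runtime_resume)(struct device *dev);
	int (*runtime_idle)(struct device *dev);
};

/* Simplified model of the macro: one suspend and one resume function. */
#define UNIVERSAL_DEV_PM_OPS(name, suspend_fn, resume_fn, idle_fn) \
const struct dev_pm_ops name = { \
	.suspend = suspend_fn, .resume = resume_fn, \
	.freeze = suspend_fn, .thaw = resume_fn, \
	.poweroff = suspend_fn, .restore = resume_fn, \
	.runtime_suspend = suspend_fn, .runtime_resume = resume_fn, \
	.runtime_idle = idle_fn, \
}

/* Hypothetical driver callbacks shared by system sleep and runtime PM. */
static int foo_suspend(struct device *dev)
{
	(void)dev;
	return 0;
}

static int foo_resume(struct device *dev)
{
	(void)dev;
	return 0;
}

/* The last argument is NULL: this driver has no ->runtime_idle() callback. */
UNIVERSAL_DEV_PM_OPS(foo_pm_ops, foo_suspend, foo_resume, NULL);
```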
|  | 782 |  | 
|  | 783 | 8. "No-Callback" Devices | 
|  | 784 |  | 
|  | 785 | Some "devices" are only logical sub-devices of their parent and cannot be | 
|  | 786 | power-managed on their own.  (The prototype example is a USB interface.  Entire | 
|  | 787 | USB devices can go into low-power mode or send wake-up requests, but neither is | 
|  | 788 | possible for individual interfaces.)  The drivers for these devices have no | 
|  | 789 | need of runtime PM callbacks; if the callbacks did exist, ->runtime_suspend() | 
|  | 790 | and ->runtime_resume() would always return 0 without doing anything else and | 
|  | 791 | ->runtime_idle() would always call pm_runtime_suspend(). | 
|  | 792 |  | 
|  | 793 | Subsystems can tell the PM core about these devices by calling | 
|  | 794 | pm_runtime_no_callbacks().  This should be done after the device structure is | 
|  | 795 | initialized and before it is registered (although after device registration is | 
|  | 796 | also okay).  The routine will set the device's power.no_callbacks flag and | 
|  | 797 | prevent the non-debugging runtime PM sysfs attributes from being created. | 
|  | 798 |  | 
|  | 799 | When power.no_callbacks is set, the PM core will not invoke the | 
|  | 800 | ->runtime_idle(), ->runtime_suspend(), or ->runtime_resume() callbacks. | 
|  | 801 | Instead it will assume that suspends and resumes always succeed and that idle | 
|  | 802 | devices should be suspended. | 
|  | 803 |  | 
|  | 804 | As a consequence, the PM core will never directly inform the device's subsystem | 
|  | 805 | or driver about runtime power changes.  Instead, the driver for the device's | 
|  | 806 | parent must take responsibility for telling the device's driver when the | 
|  | 807 | parent's power state changes. | 
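
The effect of power.no_callbacks can be pictured with a toy user-space model. Nothing below is kernel code: struct device is reduced to three fields, pm_runtime_no_callbacks() is a stub with the real helper's name, and model_core_suspend(), foo_init_subdevice() and foo_busy_suspend() are hypothetical.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical user-space stand-in for the runtime PM state of a device. */
struct device {
	int no_callbacks;	/* models power.no_callbacks */
	int suspended;		/* nonzero once the device counts as suspended */
	int (*runtime_suspend)(struct device *dev);
};

/* Stub for the real helper, which sets power.no_callbacks. */
static void pm_runtime_no_callbacks(struct device *dev)
{
	dev->no_callbacks = 1;
}

/* Hypothetical subsystem code preparing a logical sub-device. */
void foo_init_subdevice(struct device *dev)
{
	/* Called after initialization, ideally before registration. */
	pm_runtime_no_callbacks(dev);
}

/* Toy model of how the core decides whether to call ->runtime_suspend(). */
int model_core_suspend(struct device *dev)
{
	int ret = 0;

	/* With no_callbacks set, the suspend is assumed to succeed. */
	if (!dev->no_callbacks && dev->runtime_suspend)
		ret = dev->runtime_suspend(dev);
	if (!ret)
		dev->suspended = 1;
	return ret;
}

/* Hypothetical callback that would refuse the suspend if it were invoked. */
int foo_busy_suspend(struct device *dev)
{
	(void)dev;
	return -EBUSY;
}
```

Once the flag is set, the model never reaches foo_busy_suspend(), which matches the statement that the device's own driver is no longer informed by the PM core.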
|  | 808 |  | 
|  | 809 | 9. Autosuspend, or automatically-delayed suspends | 
|  | 810 |  | 
|  | 811 | Changing a device's power state isn't free; it requires both time and energy. | 
|  | 812 | A device should be put in a low-power state only when there's some reason to | 
|  | 813 | think it will remain in that state for a substantial time.  A common heuristic | 
|  | 814 | says that a device which hasn't been used for a while is liable to remain | 
|  | 815 | unused; following this advice, drivers should not allow devices to be suspended | 
|  | 816 | at runtime until they have been inactive for some minimum period.  Even when | 
|  | 817 | the heuristic ends up being non-optimal, it will still prevent devices from | 
|  | 818 | "bouncing" too rapidly between low-power and full-power states. | 
|  | 819 |  | 
|  | 820 | The term "autosuspend" is an historical remnant.  It doesn't mean that the | 
|  | 821 | device is automatically suspended (the subsystem or driver still has to call | 
|  | 822 | the appropriate PM routines); rather it means that runtime suspends will | 
|  | 823 | automatically be delayed until the desired period of inactivity has elapsed. | 
|  | 824 |  | 
|  | 825 | Inactivity is determined based on the power.last_busy field.  Drivers should | 
|  | 826 | call pm_runtime_mark_last_busy() to update this field after carrying out I/O, | 
|  | 827 | typically just before calling pm_runtime_put_autosuspend().  The desired length | 
|  | 828 | of the inactivity period is a matter of policy.  Subsystems can set this length | 
|  | 829 | initially by calling pm_runtime_set_autosuspend_delay(), but after device | 
|  | 830 | registration the length should be controlled by user space, using the | 
|  | 831 | /sys/devices/.../power/autosuspend_delay_ms attribute. | 
|  | 832 |  | 
|  | 833 | In order to use autosuspend, subsystems or drivers must call | 
|  | 834 | pm_runtime_use_autosuspend() (preferably before registering the device), and | 
|  | 835 | thereafter they should use the various *_autosuspend() helper functions instead | 
|  | 836 | of the non-autosuspend counterparts: | 
|  | 837 |  | 
|  | 838 | Instead of: pm_runtime_suspend    use: pm_runtime_autosuspend; | 
|  | 839 | Instead of: pm_schedule_suspend   use: pm_request_autosuspend; | 
|  | 840 | Instead of: pm_runtime_put        use: pm_runtime_put_autosuspend; | 
|  | 841 | Instead of: pm_runtime_put_sync   use: pm_runtime_put_sync_autosuspend. | 
|  | 842 |  | 
|  | 843 | Drivers may also continue to use the non-autosuspend helper functions; they | 
|  | 844 | will behave normally, which means sometimes taking the autosuspend delay into | 
|  | 845 | account (see pm_runtime_idle). | 
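
Putting the pieces together, enabling autosuspend and then using the *_autosuspend() variants might look like the user-space sketch below. The pm_runtime_*() functions are stubs with the kernel helpers' names, struct device and the fake_now clock are simplified assumptions, the 2000 ms delay is an arbitrary example value, and foo_setup_autosuspend() and foo_do_io() are hypothetical.

```c
#include <assert.h>

/* Hypothetical user-space stand-in for the runtime PM state of a device. */
struct device {
	int use_autosuspend;		/* models power.use_autosuspend */
	int autosuspend_delay_ms;	/* models power.autosuspend_delay */
	unsigned long last_busy;	/* models power.last_busy */
	int usage_count;		/* models power.usage_count */
};

static unsigned long fake_now;	/* hypothetical clock for this model */

static void pm_runtime_set_autosuspend_delay(struct device *dev, int delay)
{
	dev->autosuspend_delay_ms = delay;
}

static void pm_runtime_use_autosuspend(struct device *dev)
{
	dev->use_autosuspend = 1;
}

static int pm_runtime_get_sync(struct device *dev)
{
	dev->usage_count++;
	return 0;
}

static void pm_runtime_put_noidle(struct device *dev)
{
	assert(dev->usage_count > 0);
	dev->usage_count--;
}

static void pm_runtime_mark_last_busy(struct device *dev)
{
	dev->last_busy = fake_now;
}

static int pm_runtime_put_autosuspend(struct device *dev)
{
	assert(dev->usage_count > 0);
	dev->usage_count--;
	return 0;
}

/* Set the initial policy; user space may change the delay later on. */
void foo_setup_autosuspend(struct device *dev)
{
	pm_runtime_set_autosuspend_delay(dev, 2000);
	pm_runtime_use_autosuspend(dev);
}

int foo_do_io(struct device *dev)
{
	int ret = pm_runtime_get_sync(dev);

	if (ret < 0) {
		pm_runtime_put_noidle(dev);
		return ret;
	}
	/* ... carry out the I/O ... */
	pm_runtime_mark_last_busy(dev);		/* restart the inactivity period */
	return pm_runtime_put_autosuspend(dev);
}
```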
|  | 846 |  | 
|  | 847 | Under some circumstances a driver or subsystem may want to prevent a device | 
|  | 848 | from autosuspending immediately, even though the usage counter is zero and the | 
|  | 849 | autosuspend delay time has expired.  If the ->runtime_suspend() callback | 
|  | 850 | returns -EAGAIN or -EBUSY, and if the next autosuspend delay expiration time is | 
|  | 851 | in the future (as it normally would be if the callback invoked | 
|  | 852 | pm_runtime_mark_last_busy()), the PM core will automatically reschedule the | 
|  | 853 | autosuspend.  The ->runtime_suspend() callback can't do this rescheduling | 
|  | 854 | itself because no suspend requests of any kind are accepted while the device is | 
|  | 855 | suspending (i.e., while the callback is running). | 
|  | 856 |  | 
|  | 857 | The implementation is well suited for asynchronous use in interrupt contexts. | 
|  | 858 | However, such use inevitably involves races, because the PM core can't | 
|  | 859 | synchronize ->runtime_suspend() callbacks with the arrival of I/O requests. | 
|  | 860 | This synchronization must be handled by the driver, using its private lock. | 
|  | 861 | Here is a schematic pseudo-code example: | 
|  | 862 |  | 
	void foo_read_or_write(struct foo_priv *foo, void *data)
	{
		lock(&foo->private_lock);
		add_request_to_io_queue(foo, data);
		/* The first pending request asks for an asynchronous resume. */
		if (foo->num_pending_requests++ == 0)
			pm_runtime_get(&foo->dev);
		if (!foo->is_suspended)
			foo_process_next_request(foo);
		unlock(&foo->private_lock);
	}

	void foo_io_completion(struct foo_priv *foo, void *req)
	{
		lock(&foo->private_lock);
		if (--foo->num_pending_requests == 0) {
			/* Last request done: start the inactivity period. */
			pm_runtime_mark_last_busy(&foo->dev);
			pm_runtime_put_autosuspend(&foo->dev);
		} else {
			foo_process_next_request(foo);
		}
		unlock(&foo->private_lock);
		/* Send req result back to the user ... */
	}

	int foo_runtime_suspend(struct device *dev)
	{
		struct foo_priv *foo = container_of(dev, ...);
		int ret = 0;

		lock(&foo->private_lock);
		if (foo->num_pending_requests > 0) {
			/* A request arrived after the autosuspend was queued. */
			ret = -EBUSY;
		} else {
			/* ... suspend the device ... */
			foo->is_suspended = 1;
		}
		unlock(&foo->private_lock);
		return ret;
	}

	int foo_runtime_resume(struct device *dev)
	{
		struct foo_priv *foo = container_of(dev, ...);

		lock(&foo->private_lock);
		/* ... resume the device ... */
		foo->is_suspended = 0;
		pm_runtime_mark_last_busy(&foo->dev);
		/* Restart any I/O that was queued while suspended. */
		if (foo->num_pending_requests > 0)
			foo_process_next_request(foo);
		unlock(&foo->private_lock);
		return 0;
	}
|  | 916 |  | 
|  | 917 | The important point is that after foo_io_completion() asks for an autosuspend, | 
|  | 918 | the foo_runtime_suspend() callback may race with foo_read_or_write(). | 
|  | 919 | Therefore foo_runtime_suspend() has to check whether there are any pending I/O | 
|  | 920 | requests (while holding the private lock) before allowing the suspend to | 
|  | 921 | proceed. | 
|  | 922 |  | 
|  | 923 | In addition, the power.autosuspend_delay field can be changed by user space at | 
|  | 924 | any time.  If a driver cares about this, it can call | 
|  | 925 | pm_runtime_autosuspend_expiration() from within the ->runtime_suspend() | 
|  | 926 | callback while holding its private lock.  If the function returns a nonzero | 
|  | 927 | value then the delay has not yet expired and the callback should return | 
|  | 928 | -EAGAIN. |
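
A user-space sketch of a ->runtime_suspend() callback with that extra check is given below. The pm_runtime_autosuspend_expiration() stub simply returns a stored value, foo_lock() and foo_unlock() stand in for the driver's private lock, container_of() is defined locally, and struct foo_priv follows the pseudo-code example; all of these are assumptions made so that the example is self-contained.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Local user-space definition; the kernel provides its own container_of(). */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical device model: 0 means the autosuspend delay has expired. */
struct device {
	unsigned long autosuspend_expires;
};

struct foo_priv {
	struct device dev;
	int locked;			/* stands in for the private lock */
	int num_pending_requests;
	int is_suspended;
};

/* Stub: the real helper derives this from power.last_busy and the delay. */
static unsigned long pm_runtime_autosuspend_expiration(struct device *dev)
{
	return dev->autosuspend_expires;
}

static void foo_lock(struct foo_priv *foo)
{
	assert(!foo->locked);
	foo->locked = 1;
}

static void foo_unlock(struct foo_priv *foo)
{
	assert(foo->locked);
	foo->locked = 0;
}

int foo_runtime_suspend(struct device *dev)
{
	struct foo_priv *foo = container_of(dev, struct foo_priv, dev);
	int ret = 0;

	foo_lock(foo);
	if (foo->num_pending_requests > 0) {
		ret = -EBUSY;
	} else if (pm_runtime_autosuspend_expiration(dev) != 0) {
		/* The delay was changed and has not expired yet. */
		ret = -EAGAIN;
	} else {
		/* ... suspend the device ... */
		foo->is_suspended = 1;
	}
	foo_unlock(foo);
	return ret;
}
```

Both checks are made under the same lock so that neither a new I/O request nor a delay change can slip in between the test and the actual suspend.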