summaryrefslogtreecommitdiff
path: root/drivers/pci
AgeCommit message (Collapse)Author
2020-06-25PCI/PTM: Inherit Switch Downstream Port PTM settings from Upstream PortBjorn Helgaas
[ Upstream commit 7b38fd9760f51cc83d80eed2cfbde8b5ead9e93a ] Except for Endpoints, we enable PTM at enumeration-time. Previously we did not account for the fact that Switch Downstream Ports are not permitted to have a PTM capability; their PTM behavior is controlled by the Upstream Port (PCIe r5.0, sec 7.9.16). Since Downstream Ports don't have a PTM capability, we did not mark them as "ptm_enabled", which meant that pci_enable_ptm() on an Endpoint failed because there was no PTM path to it. Mark Downstream Ports as "ptm_enabled" if their Upstream Port has PTM enabled. Fixes: eec097d43100 ("PCI: Add pci_enable_ptm() for drivers to enable PTM on endpoints") Reported-by: Aditya Paluri <Venkata.AdityaPaluri@synopsys.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-06-25PCI: Fix pci_register_host_bridge() device_register() error handlingRob Herring
[ Upstream commit 1b54ae8327a4d630111c8d88ba7906483ec6010b ] If device_register() has an error, we should bail out of pci_register_host_bridge() rather than continuing on. Fixes: 37d6a0a6f470 ("PCI: Add pci_register_host_bridge() interface") Link: https://lore.kernel.org/r/20200513223859.11295-1-robh@kernel.org Signed-off-by: Rob Herring <robh@kernel.org> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Reviewed-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-06-25PCI/ASPM: Allow ASPM on links to PCIe-to-PCI/PCI-X BridgesKai-Heng Feng
[ Upstream commit 66ff14e59e8a30690755b08bc3042359703fb07a ] 7d715a6c1ae5 ("PCI: add PCI Express ASPM support") added the ability for Linux to enable ASPM, but for some undocumented reason, it didn't enable ASPM on links where the downstream component is a PCIe-to-PCI/PCI-X Bridge. Remove this exclusion so we can enable ASPM on these links. The Dell OptiPlex 7080 mentioned in the bugzilla has a TI XIO2001 PCIe-to-PCI Bridge. Enabling ASPM on the link leading to it allows the Intel SoC to enter deeper Package C-states, which is a significant power savings. [bhelgaas: commit log] Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=207571 Link: https://lore.kernel.org/r/20200505173423.26968-1-kai.heng.feng@canonical.com Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-06-25PCI: rcar: Fix incorrect programming of OB windowsAndrew Murray
[ Upstream commit 2b9f217433e31d125fb697ca7974d3de3ecc3e92 ] The outbound windows (PCIEPAUR(x), PCIEPALR(x)) describe a mapping between a CPU address (which is determined by the window number 'x') and a programmed PCI address - Thus allowing the controller to translate CPU accesses into PCI accesses. However the existing code incorrectly writes the CPU address - lets fix this by writing the PCI address instead. For memory transactions, existing DT users describe a 1:1 identity mapping and thus this change should have no effect. However the same isn't true for I/O. Link: https://lore.kernel.org/r/20191004132941.6660-1-andrew.murray@arm.com Fixes: c25da4778803 ("PCI: rcar: Add Renesas R-Car PCIe driver") Tested-by: Marek Vasut <marek.vasut+renesas@gmail.com> Signed-off-by: Andrew Murray <andrew.murray@arm.com> Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Reviewed-by: Marek Vasut <marek.vasut+renesas@gmail.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-06-25PCI: aardvark: Don't blindly enable ASPM L0s and don't write to read-only ↵Pali Rohár
register [ Upstream commit 90c6cb4a355e7befcb557d217d1d8b8bd5875a05 ] Trying to change Link Status register does not have any effect as this is a read-only register. Trying to overwrite bits for Negotiated Link Width does not make sense. In future proper change of link width can be done via Lane Count Select bits in PCIe Control 0 register. Trying to unconditionally enable ASPM L0s via ASPM Control bits in Link Control register is wrong. There should be at least some detection if endpoint supports L0s as isn't mandatory. Moreover ASPM Control bits in Link Control register are controlled by pcie/aspm.c code which sets it according to system ASPM settings, immediately after aardvark driver probes. So setting these bits by aardvark driver has no long running effect. Remove code which touches ASPM L0s bits from this driver and let kernel's ASPM implementation to set ASPM state properly. Some users are reporting issues that this code is problematic for some Intel wifi cards and removing it fixes them, see e.g.: https://bugzilla.kernel.org/show_bug.cgi?id=196339 If problems with Intel wifi cards occur even after this commit, then pcie/aspm.c code could be modified / hooked to not enable ASPM L0s state for affected problematic cards. Link: https://lore.kernel.org/r/20200430080625.26070-3-pali@kernel.org Tested-by: Tomasz Maciej Nowak <tmn505@gmail.com> Signed-off-by: Pali Rohár <pali@kernel.org> Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Acked-by: Rob Herring <robh@kernel.org> Acked-by: Thomas Petazzoni <thomas.petazzoni@bootlin.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-06-20PCI: Program MPS for RCiEP devicesAshok Raj
commit aa0ce96d72dd2e1b0dfd0fb868f82876e7790878 upstream. Root Complex Integrated Endpoints (RCiEPs) do not have an upstream bridge, so pci_configure_mps() previously ignored them, which may result in reduced performance. Instead, program the Max_Payload_Size of RCiEPs to the maximum supported value (unless it is limited for the PCIE_BUS_PEER2PEER case). This also affects the subsequent programming of Max_Read_Request_Size because Linux programs MRRS based on the MPS value. Fixes: 9dae3a97297f ("PCI: Move MPS configuration check to pci_configure_device()") Link: https://lore.kernel.org/r/1585343775-4019-1-git-send-email-ashok.raj@intel.com Tested-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Ashok Raj <ashok.raj@intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-06-20PCI: Unify ACS quirk desired vs provided checkingBjorn Helgaas
[ Upstream commit 7cf2cba43f15c74bac46dc5f0326805d25ef514d ] Most of the ACS quirks have a similar pattern of: acs_flags &= ~( <controls provided by this device> ); return acs_flags ? 0 : 1; Pull this out into a helper function to simplify the quirks slightly. The helper function is also a convenient place for comments about what the list of ACS controls means. No functional change intended. Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Logan Gunthorpe <logang@deltatee.com> Reviewed-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-06-20PCI: Add ACS quirk for Intel Root Complex Integrated EndpointsAshok Raj
[ Upstream commit 3247bd10a4502a3075ce8e1c3c7d31ef76f193ce ] All Intel platforms guarantee that all root complex implementations must send transactions up to IOMMU for address translations. Hence for Intel RCiEP devices, we can assume some ACS-type isolation even without an ACS capability. From the Intel VT-d spec, r3.1, sec 3.16 ("Root-Complex Peer to Peer Considerations"): When DMA remapping is enabled, peer-to-peer requests through the Root-Complex must be handled as follows: - The input address in the request is translated (through first-level, second-level or nested translation) to a host physical address (HPA). The address decoding for peer addresses must be done only on the translated HPA. Hardware implementations are free to further limit peer-to-peer accesses to specific host physical address regions (or to completely disallow peer-forwarding of translated requests). - Since address translation changes the contents (address field) of the PCI Express Transaction Layer Packet (TLP), for PCI Express peer-to-peer requests with ECRC, the Root-Complex hardware must use the new ECRC (re-computed with the translated address) if it decides to forward the TLP as a peer request. - Root-ports, and multi-function root-complex integrated endpoints, may support additional peer-to-peer control features by supporting PCI Express Access Control Services (ACS) capability. Refer to ACS capability in PCI Express specifications for details. Since Linux didn't give special treatment to allow this exception, certain RCiEP MFD devices were grouped in a single IOMMU group. This doesn't permit a single device to be assigned to a guest for instance. In one vendor system: Device 14.x were grouped in a single IOMMU group. /sys/kernel/iommu_groups/5/devices/0000:00:14.0 /sys/kernel/iommu_groups/5/devices/0000:00:14.2 /sys/kernel/iommu_groups/5/devices/0000:00:14.3 After this patch: /sys/kernel/iommu_groups/5/devices/0000:00:14.0 /sys/kernel/iommu_groups/5/devices/0000:00:14.2 /sys/kernel/iommu_groups/6/devices/0000:00:14.3 <<< new group 14.0 and 14.2 are integrated devices, but legacy end points, whereas 14.3 was a PCIe-compliant RCiEP. 00:14.3 Network controller: Intel Corporation Device 9df0 (rev 30) Capabilities: [40] Express (v2) Root Complex Integrated Endpoint, MSI 00 This permits assigning this device to a guest VM. [bhelgaas: drop "Fixes" tag since this doesn't fix a bug in that commit] Link: https://lore.kernel.org/r/1590699462-7131-1-git-send-email-ashok.raj@intel.com Tested-by: Darrel Goeddel <dgoeddel@forcepoint.com> Signed-off-by: Ashok Raj <ashok.raj@intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Alex Williamson <alex.williamson@redhat.com> Cc: stable@vger.kernel.org Cc: Lu Baolu <baolu.lu@linux.intel.com> Cc: Mark Scott <mscott@forcepoint.com>, Cc: Romil Sharma <rsharma@forcepoint.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-06-20PCI: Generalize multi-function power dependency device linksAbhishek Sahu
[ Upstream commit a17beb1a0882a544523dcb5d0da4801272dfd43a ] Although not allowed by the PCI specs, some multi-function devices have power dependencies between the functions. For example, function 1 may not work unless function 0 is in the D0 power state. The existing quirk_gpu_hda() adds a device link to express this dependency for GPU and HDA devices, but it really is not specific to those device types. Generalize it and rename it to pci_create_device_link() so we can create dependencies between any "consumer" and "producer" functions of a multi-function device, where the consumer is only functional if the producer is in D0. This reorganization should not affect any functionality. Link: https://lore.kernel.org/lkml/20190606092225.17960-2-abhsahu@nvidia.com Signed-off-by: Abhishek Sahu <abhsahu@nvidia.com> [bhelgaas: commit log, reword diagnostic] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-06-20vga_switcheroo: Use device link for HDA controllerLukas Wunner
[ Upstream commit 07f4f97d7b4bf325d9f558c5b58230387e4e57e0 ] Back in 2013, runtime PM for GPUs with integrated HDA controller was introduced with commits 0d69704ae348 ("gpu/vga_switcheroo: add driver control power feature. (v3)") and 246efa4a072f ("snd/hda: add runtime suspend/resume on optimus support (v4)"). Briefly, the idea was that the HDA controller is forced on and off in unison with the GPU. The original code is mostly still in place even though it was never a 100% perfect solution: E.g. on access to the HDA controller, the GPU is powered up via vga_switcheroo_runtime_resume_hdmi_audio() but there are no provisions to keep it resumed until access to the HDA controller has ceased: The GPU autosuspends after 5 seconds, rendering the HDA controller inaccessible. Additionally, a kludge is required when hda_intel.c probes: It has to check whether the GPU is powered down (check_hdmi_disabled()) and defer probing if so. However in the meantime (in v4.10) the driver core has gained a feature called device links which promises to solve such issues in a clean way: It allows us to declare a dependency from the HDA controller (consumer) to the GPU (supplier). The PM core then automagically ensures that the GPU is runtime resumed as long as the HDA controller's ->probe hook is executed and whenever the HDA controller is accessed. By default, the HDA controller has a dependency on its parent, a PCIe Root Port. Adding a device link creates another dependency on its sibling: PCIe Root Port ^ ^ | | | | HDA ===> GPU The device link is not only used for runtime PM, it also guarantees that on system sleep, the HDA controller suspends before the GPU and resumes after the GPU, and on system shutdown the HDA controller's ->shutdown hook is executed before the one of the GPU. It is a complete solution. Using this functionality is as simple as calling device_link_add(), which results in a dmesg entry like this: pci 0000:01:00.1: Linked as a consumer to 0000:01:00.0 The code for the GPU-governed audio power management can thus be removed (except where it's still needed for legacy manual power control). The device link is added in a PCI quirk rather than in hda_intel.c. It is therefore legal for the GPU to runtime suspend to D3cold even if the HDA controller is not bound to a driver or if CONFIG_SND_HDA_INTEL is not enabled, for accesses to the HDA controller will cause the GPU to wake up regardless if they're occurring outside of hda_intel.c (think config space readout via sysfs). Contrary to the previous implementation, the HDA controller's power state is now self-governed, rather than GPU-governed, whereas the GPU's power state is no longer fully self-governed. (The HDA controller needs to runtime suspend before the GPU can.) It is thus crucial that runtime PM is always activated on the HDA controller even if CONFIG_SND_HDA_POWER_SAVE_DEFAULT is set to 0 (which is the default), lest the GPU stays awake. This is achieved by setting the auto_runtime_pm flag on every codec and the AZX_DCAPS_PM_RUNTIME flag on the HDA controller. A side effect is that power consumption might be reduced if the GPU is in use but the HDA controller is not, because the HDA controller is now allowed to go to D3hot. Before, it was forced to stay in D0 as long as the GPU was in use. (There is no reduction in power consumption on my Nvidia GK107, but there might be on other chips.) The code paths for legacy manual power control are adjusted such that runtime PM is disabled during power off, thereby preventing the PM core from resuming the HDA controller. Note that the device link is not only added on vga_switcheroo capable systems, but for *any* GPU with integrated HDA controller. The idea is that the HDA controller streams audio via connectors located on the GPU, so the GPU needs to be on for the HDA controller to do anything useful. This commit implicitly fixes an unbalanced runtime PM ref upon unbind of hda_intel.c: On ->probe, a runtime PM ref was previously released under the condition "azx_has_pm_runtime(chip) || hda->use_vga_switcheroo", but on ->remove a runtime PM ref was only acquired under the first of those conditions. Thus, binding and unbinding the driver twice on a vga_switcheroo capable system caused the runtime PM refcount to drop below zero. The issue is resolved because the AZX_DCAPS_PM_RUNTIME flag is now always set if use_vga_switcheroo is true. For more information on device links please refer to: https://www.kernel.org/doc/html/latest/driver-api/device_link.html Documentation/driver-api/device_link.rst Cc: Dave Airlie <airlied@redhat.com> Cc: Ben Skeggs <bskeggs@redhat.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Takashi Iwai <tiwai@suse.de> Reviewed-by: Peter Wu <peter@lekensteyn.nl> Tested-by: Kai Heng Feng <kai.heng.feng@canonical.com> # AMD PowerXpress Tested-by: Mike Lothian <mike@fireburn.co.uk> # AMD PowerXpress Tested-by: Denis Lisov <dennis.lissov@gmail.com> # Nvidia Optimus Tested-by: Peter Wu <peter@lekensteyn.nl> # Nvidia Optimus Tested-by: Lukas Wunner <lukas@wunner.de> # MacBook Pro Signed-off-by: Lukas Wunner <lukas@wunner.de> Link: https://patchwork.freedesktop.org/patch/msgid/51bd38360ff502a8c42b1ebf4405ee1d3f27118d.1520068884.git.lukas@wunner.de Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-06-20PCI: Make ACS quirk implementations more uniformBjorn Helgaas
[ Upstream commit c8de8ed2dcaac82e5d76d467dc0b02e0ee79809b ] The ACS quirks differ in needless ways, which makes them look more different than they really are. Reorder the ACS flags in order of definitions in the spec: PCI_ACS_SV Source Validation PCI_ACS_TB Translation Blocking PCI_ACS_RR P2P Request Redirect PCI_ACS_CR P2P Completion Redirect PCI_ACS_UF Upstream Forwarding PCI_ACS_EC P2P Egress Control PCI_ACS_DT Direct Translated P2P (PCIe r5.0, sec 7.7.8.2) and use similar code structure in all. No functional change intended. Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Logan Gunthorpe <logang@deltatee.com> Reviewed-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-06-20PCI: Add ACS quirk for Ampere root portsFeng Kan
[ Upstream commit 4ef76ad0462cf25ce948541c8724eaa8a8365e1d ] The Ampere Computing PCIe root port does not support ACS at this point. However, the hardware provides isolation and source validation through the SMMU. The stream ID generated by the PCIe ports contain both the bus/device/function number as well as the port ID in its 3 most significant bits. Turn on ACS but disable all the peer-to-peer features. APM is being rebranded to Ampere. The Vendor and Device IDs change, but the functionality stays the same. Signed-off-by: Feng Kan <fkan@apm.com> Signed-off-by: Bjorn Helgaas <helgaas@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-06-20PCI: Add ACS quirk for iProc PAXBAbhinav Ratna
[ Upstream commit 46b2c32df7a462d0e64b68c513e5c4c1b2a399a7 ] iProc PAXB Root Ports don't advertise an ACS capability, but they do not allow peer-to-peer transactions between Root Ports. Add an ACS quirk so each Root Port can be in a separate IOMMU group. [bhelgaas: commit log, comment, use common implementation style] Link: https://lore.kernel.org/r/1566275985-25670-1-git-send-email-srinath.mannam@broadcom.com Signed-off-by: Abhinav Ratna <abhinav.ratna@broadcom.com> Signed-off-by: Srinath Mannam <srinath.mannam@broadcom.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: Scott Branden <scott.branden@broadcom.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-06-20PCI: Avoid FLR for AMD Starship USB 3.0Kevin Buettner
[ Upstream commit 5727043c73fdfe04597971b5f3f4850d879c1f4f ] The AMD Starship USB 3.0 host controller advertises Function Level Reset support, but it apparently doesn't work. Add a quirk to prevent use of FLR on this device. Without this quirk, when attempting to assign (pass through) an AMD Starship USB 3.0 host controller to a guest OS, the system becomes increasingly unresponsive over the course of several minutes, eventually requiring a hard reset. Shortly after attempting to start the guest, I see these messages: vfio-pci 0000:05:00.3: not ready 1023ms after FLR; waiting vfio-pci 0000:05:00.3: not ready 2047ms after FLR; waiting vfio-pci 0000:05:00.3: not ready 4095ms after FLR; waiting vfio-pci 0000:05:00.3: not ready 8191ms after FLR; waiting And then eventually: vfio-pci 0000:05:00.3: not ready 65535ms after FLR; giving up INFO: NMI handler (perf_event_nmi_handler) took too long to run: 0.000 msecs perf: interrupt took too long (642744 > 2500), lowering kernel.perf_event_max_sample_rate to 1000 INFO: NMI handler (perf_event_nmi_handler) took too long to run: 82.270 msecs INFO: NMI handler (perf_event_nmi_handler) took too long to run: 680.608 msecs INFO: NMI handler (perf_event_nmi_handler) took too long to run: 100.952 msecs ... watchdog: BUG: soft lockup - CPU#3 stuck for 22s! [qemu-system-x86:7487] Tested on a Micro-Star International Co., Ltd. MS-7C59/Creator TRX40 motherboard with an AMD Ryzen Threadripper 3970X. Link: https://lore.kernel.org/r/20200524003529.598434ff@f31-4.lan Signed-off-by: Kevin Buettner <kevinb@redhat.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-06-20PCI: Avoid FLR for AMD Matisse HD Audio & USB 3.0Marcos Scriven
[ Upstream commit 0d14f06cd6657ba3446a5eb780672da487b068e7 ] The AMD Matisse HD Audio & USB 3.0 devices advertise Function Level Reset support, but hang when an FLR is triggered. To reproduce the problem, attach the device to a VM, then detach and try to attach again. Rename the existing quirk_intel_no_flr(), which was not Intel-specific, to quirk_no_flr(), and apply it to prevent the use of FLR on these AMD devices. Link: https://lore.kernel.org/r/CAAri2DpkcuQZYbT6XsALhx2e6vRqPHwtbjHYeiH7MNp4zmt1RA@mail.gmail.com Signed-off-by: Marcos Scriven <marcos@scriven.org> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-06-20PCI: Disable MSI for Freescale Layerscape PCIe RC modeHou Zhiqiang
[ Upstream commit 06dc4ee54e306eff61cbdac3593b42b09f618103 ] The Freescale PCIe controller advertises the MSI/MSI-X capability in both RC and Endpoint mode, but in RC mode it doesn't support MSI/MSI-X by itself; it can only transfer MSI/MSI-X from downstream devices. Add a quirk to prevent use of MSI/MSI-X in RC mode. Signed-off-by: Hou Zhiqiang <Zhiqiang.Hou@nxp.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: Minghuan Lian <minghuan.Lian@nxp.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-06-20PCI: Don't disable decoding when mmio_always_on is setJiaxun Yang
[ Upstream commit b6caa1d8c80cb71b6162cb1f1ec13aa655026c9f ] Don't disable MEM/IO decoding when a device have both non_compliant_bars and mmio_always_on. That would allow us quirk devices with junk in BARs but can't disable their decoding. Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com> Acked-by: Bjorn Helgaas <helgaas@kernel.org> Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-05-02PCI/ASPM: Allow re-enabling Clock PMHeiner Kallweit
[ Upstream commit 35efea32b26f9aacc99bf07e0d2cdfba2028b099 ] Previously Clock PM could not be re-enabled after being disabled by pci_disable_link_state() because clkpm_capable was reset. Change this by adding a clkpm_disable field similar to aspm_disable. Link: https://lore.kernel.org/r/4e8a66db-7d53-4a66-c26c-f0037ffaa705@gmail.com Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-04-24PCI: endpoint: Fix for concurrent memory allocation in OB address regionKishon Vijay Abraham I
commit 04e046ca57ebed3943422dee10eec9e73aec081e upstream. pci-epc-mem uses a bitmap to manage the Endpoint outbound (OB) address region. This address region will be shared by multiple endpoint functions (in the case of multi function endpoint) and it has to be protected from concurrent access to avoid updating an inconsistent state. Use a mutex to protect bitmap updates to prevent the memory allocation API from returning incorrect addresses. Signed-off-by: Kishon Vijay Abraham I <kishon@ti.com> Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Cc: stable@vger.kernel.org # v4.14+ Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-04-24PCI/ASPM: Clear the correct bits when enabling L1 substatesYicong Yang
commit 58a3862a10a317a81097ab0c78aecebabb1704f5 upstream. In pcie_config_aspm_l1ss(), we cleared the wrong bits when enabling ASPM L1 Substates. Instead of the L1.x enable bits (PCI_L1SS_CTL1_L1SS_MASK, 0xf), we cleared the Link Activation Interrupt Enable bit (PCI_L1SS_CAP_L1_PM_SS, 0x10). Clear the L1.x enable bits before writing the new L1.x configuration. [bhelgaas: changelog] Fixes: aeda9adebab8 ("PCI/ASPM: Configure L1 substate settings") Link: https://lore.kernel.org/r/1584093227-1292-1-git-send-email-yangyicong@hisilicon.com Signed-off-by: Yicong Yang <yangyicong@hisilicon.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> CC: stable@vger.kernel.org # v4.11+ Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-04-24PCI/switchtec: Fix init_completion race condition with poll_wait()Logan Gunthorpe
[ Upstream commit efbdc769601f4d50018bf7ca50fc9f7c67392ece ] The call to init_completion() in mrpc_queue_cmd() can theoretically race with the call to poll_wait() in switchtec_dev_poll(). poll() write() switchtec_dev_poll() switchtec_dev_write() poll_wait(&s->comp.wait); mrpc_queue_cmd() init_completion(&s->comp) init_waitqueue_head(&s->comp.wait) To my knowledge, no one has hit this bug. Fix this by using reinit_completion() instead of init_completion() in mrpc_queue_cmd(). Fixes: 080b47def5e5 ("MicroSemi Switchtec management interface driver") Reported-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Logan Gunthorpe <logang@deltatee.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Bjorn Helgaas <bhelgaas@google.com> Link: https://lkml.kernel.org/r/20200313183608.2646-1-logang@deltatee.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-02-28PCI/IOV: Fix memory leak in pci_iov_add_virtfn()Navid Emamdoost
[ Upstream commit 8c386cc817878588195dde38e919aa6ba9409d58 ] In the implementation of pci_iov_add_virtfn() the allocated virtfn is leaked if pci_setup_device() fails. The error handling is not calling pci_stop_and_remove_bus_device(). Change the goto label to failed2. Fixes: 156c55325d30 ("PCI: Check for pci_setup_device() failure in pci_iov_add_virtfn()") Link: https://lore.kernel.org/r/20191125195255.23740-1-navid.emamdoost@gmail.com Signed-off-by: Navid Emamdoost <navid.emamdoost@gmail.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-02-14PCI: Don't disable bridge BARs when assigning bus resourcesLogan Gunthorpe
commit 9db8dc6d0785225c42a37be7b44d1b07b31b8957 upstream. Some PCI bridges implement BARs in addition to bridge windows. For example, here's a PLX switch: 04:00.0 PCI bridge: PLX Technology, Inc. PEX 8724 24-Lane, 6-Port PCI Express Gen 3 (8 GT/s) Switch, 19 x 19mm FCBGA (rev ca) (prog-if 00 [Normal decode]) Flags: bus master, fast devsel, latency 0, IRQ 30, NUMA node 0 Memory at 90a00000 (32-bit, non-prefetchable) [size=256K] Bus: primary=04, secondary=05, subordinate=0a, sec-latency=0 I/O behind bridge: 00002000-00003fff Memory behind bridge: 90000000-909fffff Prefetchable memory behind bridge: 0000380000800000-0000380000bfffff Previously, when the kernel assigned resource addresses (with the pci=realloc command line parameter, for example) it could clear the struct resource corresponding to the BAR. When this happened, lspci would report this BAR as "ignored": Region 0: Memory at <ignored> (32-bit, non-prefetchable) [size=256K] This is because the kernel reports a zero start address and zero flags in the corresponding sysfs resource file and in /proc/bus/pci/devices. Investigation with 'lspci -x', however, shows the BIOS-assigned address will still be programmed in the device's BAR registers. It's clearly a bug that the kernel lost track of the BAR value, but in most cases, this still won't result in a visible issue because nothing uses the memory, so nothing is affected. However, when an IOMMU is in use, it will not reserve this space in the IOVA because the kernel no longer thinks the range is valid. (See dmar_init_reserved_ranges() for the Intel implementation of this.) Without the proper reserved range, a DMA mapping may allocate an IOVA that matches a bridge BAR, which results in DMA accesses going to the BAR instead of the intended RAM. The problem was in pci_assign_unassigned_root_bus_resources(). When any resource from a bridge device fails to get assigned, the code set the resource's flags to zero. This makes sense for bridge windows, as they will be re-enabled later, but for regular BARs, it makes the kernel permanently lose track of the fact that they decode address space. Change pci_assign_unassigned_root_bus_resources() and pci_assign_unassigned_bridge_resources() so they only clear "res->flags" for bridge *windows*, not bridge BARs. Fixes: da7822e5ad71 ("PCI: update bridge resources to get more big ranges when allocating space (again)") Link: https://lore.kernel.org/r/20200108213208.4612-1-logang@deltatee.com [bhelgaas: commit log, check for pci_is_bridge()] Reported-by: Kit Chow <kchow@gigaio.com> Signed-off-by: Logan Gunthorpe <logang@deltatee.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-02-14PCI/switchtec: Fix vep_vector_number ioread widthLogan Gunthorpe
commit 9375646b4cf03aee81bc6c305aa18cc80b682796 upstream. vep_vector_number is actually a 16 bit register which should be read with ioread16() instead of ioread32(). Fixes: 080b47def5e5 ("MicroSemi Switchtec management interface driver") Link: https://lore.kernel.org/r/20200106190337.2428-3-logang@deltatee.com Reported-by: Doug Meyer <dmeyer@gigaio.com> Signed-off-by: Logan Gunthorpe <logang@deltatee.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-02-14PCI: keystone: Fix link training retries initiationYurii Monakov
[ Upstream commit 6df19872d881641e6394f93ef2938cffcbdae5bb ] ks_pcie_stop_link() function does not clear LTSSM_EN_VAL bit so link training was not triggered more than once after startup. In configurations where link can be unstable during early boot, for example, under low temperature, it will never be established. Fixes: 0c4ffcfe1fbc ("PCI: keystone: Add TI Keystone PCIe driver") Signed-off-by: Yurii Monakov <monakov.y@gmail.com> Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Acked-by: Andrew Murray <andrew.murray@arm.com> Cc: stable@vger.kernel.org Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-02-05PCI: Add DMA alias quirk for Intel VCA NTBSlawomir Pawlowski
[ Upstream commit 56b4cd4b7da9ee95778eb5c8abea49f641ebfd91 ] Intel Visual Compute Accelerator (VCA) is a family of PCIe add-in devices exposing computational units via Non Transparent Bridges (NTB, PEX 87xx). Similarly to MIC x200, we need to add DMA aliases to allow buffer access when IOMMU is enabled. Add aliases to allow computational unit access to host memory. These aliases mark the whole VCA device as one IOMMU group. All possible slot numbers (0x20) are used, since we are unable to tell what slot is used on other side. This quirk is intended for both host and computational unit sides. The VCA devices have up to five functions: four for DMA channels and one additional. Link: https://lore.kernel.org/r/5683A335CC8BE1438C3C30C49DCC38DF637CED8E@IRSMSX102.ger.corp.intel.com Signed-off-by: Slawomir Pawlowski <slawomir.pawlowski@intel.com> Signed-off-by: Przemek Kitszel <przemyslawx.kitszel@intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-01-27PCI: endpoint: functions: Use memcpy_fromio()/memcpy_toio()Wen Yang
[ Upstream commit 726dabfde6aa35a4f1508e235ae37edbbf9fbc65 ] Functions copying from/to IO addresses should use the memcpy_fromio()/memcpy_toio() API rather than plain memcpy(). Fix the issue detected through the sparse tool. Fixes: 349e7a85b25f ("PCI: endpoint: functions: Add an EP function to test PCI") Suggested-by: Kishon Vijay Abraham I <kishon@ti.com> Signed-off-by: Wen Yang <wen.yang99@zte.com.cn> [lorenzo.pieralisi@arm.com: updated log] Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Acked-by: Kishon Vijay Abraham I <kishon@ti.com> CC: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> CC: Bjorn Helgaas <bhelgaas@google.com> CC: Gustavo Pimentel <gustavo.pimentel@synopsys.com> CC: Niklas Cassel <niklas.cassel@axis.com> CC: Greg Kroah-Hartman <gregkh@linuxfoundation.org> CC: Cyrille Pitchen <cyrille.pitchen@free-electrons.com> CC: linux-pci@vger.kernel.org CC: linux-kernel@vger.kernel.org Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-01-27switchtec: Remove immediate status check after submitting MRPC commandKelvin Cao
[ Upstream commit 526180408b815aa7b96fd48bd23cdd33ef04e38e ] After submitting a Firmware Download MRPC command, Switchtec firmware will delay Management EP BAR MemRd TLP responses by more than 10ms. This is a firmware limitation. Delayed MemRd completions are a problem for systems with a low Completion Timeout (CTO). The current driver checks the MRPC status immediately after submitting an MRPC command, which results in a delayed MemRd completion that may cause a Completion Timeout. Remove the immediate status check and rely on the check after receiving an interrupt or timing out. This is only a software workaround to the READ issue and a proper fix of this should be done in firmware. Fixes: 080b47def5e5 ("MicroSemi Switchtec management interface driver") Signed-off-by: Kelvin Cao <kelvin.cao@microchip.com> Signed-off-by: Wesley Sheng <wesley.sheng@microchip.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Logan Gunthorpe <logang@deltatee.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-01-27PCI: iproc: Remove PAXC slot check to allow VF supportJitendra Bhivare
[ Upstream commit 4da6b4480766e5bc9c4d7bc14bf1d0939a1a5fa7 ] Fix previous incorrect logic that limits PAXC slot number to zero only. In order for SRIOV/VF to work, we need to allow the slot number to be greater than zero. Fixes: 46560388c476c ("PCI: iproc: Allow multiple devices except on PAXC") Signed-off-by: Jitendra Bhivare <jitendra.bhivare@broadcom.com> Signed-off-by: Ray Jui <ray.jui@broadcom.com> Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Reviewed-by: Andy Gospodarek <gospo@broadcom.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-01-17PCI/PTM: Remove spurious "d" from granularity messageBjorn Helgaas
commit 127a7709495db52a41012deaebbb7afc231dad91 upstream. The granularity message has an extra "d": pci 0000:02:00.0: PTM enabled, 4dns granularity Remove the "d" so the message is simply "PTM enabled, 4ns granularity". Fixes: 8b2ec318eece ("PCI: Add PTM clock granularity information") Link: https://lore.kernel.org/r/20191106222420.10216-2-helgaas@kernel.org Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Andrew Murray <andrew.murray@arm.com> Cc: Jonathan Yong <jonathan.yong@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-01-12PCI/switchtec: Read all 64 bits of part_event_bitmapLogan Gunthorpe
commit 6acdf7e19b37cb3a9258603d0eab315079c19c5e upstream. The part_event_bitmap register is 64 bits wide, so read it with ioread64() instead of the 32-bit ioread32(). Fixes: 52eabba5bcdb ("switchtec: Add IOCTLs to the Switchtec driver") Link: https://lore.kernel.org/r/20190910195833.3891-1-logang@deltatee.com Reported-by: Doug Meyer <dmeyer@gigaio.com> Signed-off-by: Logan Gunthorpe <logang@deltatee.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Cc: stable@vger.kernel.org # v4.12+ Cc: Kelvin Cao <Kelvin.Cao@microchip.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-12-21PCI: Apply Cavium ACS quirk to ThunderX2 and ThunderX3George Cherian
commit f338bb9f0179cb959977b74e8331b312264d720b upstream. Enhance the ACS quirk for Cavium Processors. Add the root port vendor IDs for ThunderX2 and ThunderX3 series of processors. [bhelgaas: add Fixes: and stable tag] Fixes: f2ddaf8dfd4a ("PCI: Apply Cavium ThunderX ACS quirk to more Root Ports") Link: https://lore.kernel.org/r/20191111024243.GA11408@dc5-eodlnx05.marvell.com Signed-off-by: George Cherian <george.cherian@marvell.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Robert Richter <rrichter@marvell.com> Cc: stable@vger.kernel.org # v4.12+ Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-12-21PCI/MSI: Fix incorrect MSI-X masking on resumeJian-Hong Pan
commit e045fa29e89383c717e308609edd19d2fd29e1be upstream. When a driver enables MSI-X, msix_program_entries() reads the MSI-X Vector Control register for each vector and saves it in desc->masked. Each register is 32 bits and bit 0 is the actual Mask bit. When we restored these registers during resume, we previously set the Mask bit if *any* bit in desc->masked was set instead of when the Mask bit itself was set: pci_restore_state pci_restore_msi_state __pci_restore_msix_state for_each_pci_msi_entry msix_mask_irq(entry, entry->masked) <-- entire u32 word __pci_msix_desc_mask_irq(desc, flag) mask_bits = desc->masked & ~PCI_MSIX_ENTRY_CTRL_MASKBIT if (flag) <-- testing entire u32, not just bit 0 mask_bits |= PCI_MSIX_ENTRY_CTRL_MASKBIT writel(mask_bits, desc_addr + PCI_MSIX_ENTRY_VECTOR_CTRL) This means that after resume, MSI-X vectors were masked when they shouldn't be, which leads to timeouts like this: nvme nvme0: I/O 978 QID 3 timeout, completion polled On resume, set the Mask bit only when the saved Mask bit from suspend was set. This should remove the need for 19ea025e1d28 ("nvme: Add quirk for Kingston NVME SSD running FW E8FK11.T"). [bhelgaas: commit log, move fix to __pci_msix_desc_mask_irq()] Link: https://bugzilla.kernel.org/show_bug.cgi?id=204887 Link: https://lore.kernel.org/r/20191008034238.2503-1-jian-hong@endlessm.com Fixes: f2440d9acbe8 ("PCI MSI: Refactor interrupt masking code") Signed-off-by: Jian-Hong Pan <jian-hong@endlessm.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-12-21PCI: Fix Intel ACS quirk UPDCR register addressSteffen Liebergeld
commit d8558ac8c93d429d65d7490b512a3a67e559d0d4 upstream. According to documentation [0] the correct offset for the Upstream Peer Decode Configuration Register (UPDCR) is 0x1014. It was previously defined as 0x1114. d99321b63b1f ("PCI: Enable quirks for PCIe ACS on Intel PCH root ports") intended to enforce isolation between PCI devices allowing them to be put into separate IOMMU groups. Due to the wrong register offset the intended isolation was not fully enforced. This is fixed with this patch. Please note that I did not test this patch because I have no hardware that implements this register. [0] https://www.intel.com/content/dam/www/public/us/en/documents/datasheets/4th-gen-core-family-mobile-i-o-datasheet.pdf (page 325) Fixes: d99321b63b1f ("PCI: Enable quirks for PCIe ACS on Intel PCH root ports") Link: https://lore.kernel.org/r/7a3505df-79ba-8a28-464c-88b83eefffa6@kernkonzept.com Signed-off-by: Steffen Liebergeld <steffen.liebergeld@kernkonzept.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Andrew Murray <andrew.murray@arm.com> Acked-by: Ashok Raj <ashok.raj@intel.com> Cc: stable@vger.kernel.org # v3.15+ Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-12-21PCI/PM: Always return devices to D0 when thawingDexuan Cui
commit f2c33ccacb2d4bbeae2a255a7ca0cbfd03017b7c upstream. pci_pm_thaw_noirq() is supposed to return the device to D0 and restore its configuration registers, but previously it only did that for devices whose drivers implemented the new power management ops. Hibernation, e.g., via "echo disk > /sys/power/state", involves freezing devices, creating a hibernation image, thawing devices, writing the image, and powering off. The fact that thawing did not return devices with legacy power management to D0 caused errors, e.g., in this path: pci_pm_thaw_noirq if (pci_has_legacy_pm_support(pci_dev)) # true for Mellanox VF driver return pci_legacy_resume_early(dev) # ... legacy PM skips the rest pci_set_power_state(pci_dev, PCI_D0) pci_restore_state(pci_dev) pci_pm_thaw if (pci_has_legacy_pm_support(pci_dev)) pci_legacy_resume drv->resume mlx4_resume ... pci_enable_msix_range ... if (dev->current_state != PCI_D0) # <--- return -EINVAL; which caused these warnings: mlx4_core a6d1:00:02.0: INTx is not supported in multi-function mode, aborting PM: dpm_run_callback(): pci_pm_thaw+0x0/0xd7 returns -95 PM: Device a6d1:00:02.0 failed to thaw: error -95 Return devices to D0 and restore config registers for all devices, not just those whose drivers support new power management. [bhelgaas: also call pci_restore_state() before pci_legacy_resume_early(), update comment, add stable tag, commit log] Link: https://lore.kernel.org/r/KU1P153MB016637CAEAD346F0AA8E3801BFAD0@KU1P153MB0166.APCP153.PROD.OUTLOOK.COM Signed-off-by: Dexuan Cui <decui@microsoft.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: stable@vger.kernel.org # v4.13+ Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-12-05PCI/MSI: Return -ENOSPC from pci_alloc_irq_vectors_affinity()Ming Lei
[ Upstream commit 77f88abd4a6f73a1a68dbdc0e3f21575fd508fc3 ] The API of pci_alloc_irq_vectors_affinity() says it returns -ENOSPC if fewer than @min_vecs interrupt vectors are available for @dev. However, if a device supports MSI-X but not MSI and a caller requests @min_vecs that can't be satisfied by MSI-X, we previously returned -EINVAL (from the failed attempt to enable MSI), not -ENOSPC. When -ENOSPC is returned, callers may reduce the number IRQs they request and try again. Most callers can use the @min_vecs and @max_vecs parameters to avoid this retry loop, but that doesn't work when using IRQ affinity "nr_sets" because rebalancing the sets is driver-specific. This return value bug has been present since pci_alloc_irq_vectors() was added in v4.10 by aff171641d18 ("PCI: Provide sensible IRQ vector alloc/free routines"), but it wasn't an issue because @min_vecs/@max_vecs removed the need for callers to iteratively reduce the number of IRQs requested and retry the allocation, so they didn't need to distinguish -ENOSPC from -EINVAL. In v5.0, 6da4b3ab9a6e ("genirq/affinity: Add support for allocating interrupt sets") added IRQ sets to the interface, which reintroduced the need to check for -ENOSPC and possibly reduce the number of IRQs requested and retry the allocation. Signed-off-by: Ming Lei <ming.lei@redhat.com> [bhelgaas: changelog] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Cc: Jens Axboe <axboe@fb.com> Cc: Keith Busch <keith.busch@intel.com> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-12-01PCI: keystone: Use quirk to limit MRRS for K2GKishon Vijay Abraham I
[ Upstream commit 148e340c0696369fadbbddc8f4bef801ed247d71 ] PCI controller in K2G also has a limitation that memory read request size (MRRS) must not exceed 256 bytes. Use the quirk to limit MRRS (added for K2HK, K2L and K2E) for K2G as well. Signed-off-by: Kishon Vijay Abraham I <kishon@ti.com> Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-12-01PCI: vmd: Detach resources after stopping root busJon Derrick
[ Upstream commit dc8af3a827df6d4bb925d3b81b7ec94a7cce9482 ] The VMD removal path calls pci_stop_root_busi(), which tears down the pcie tree, including detaching all of the attached drivers. During driver detachment, devices may use pci_release_region() to release resources. This path relies on the resource being accessible in resource tree. By detaching the child domain from the parent resource domain prior to stopping the bus, we are preventing the list traversal from finding the resource to be freed. If we instead detach the resource after stopping the bus, we will have properly freed the resource and detaching is simply accounting at that point. Without this order, the resource is never freed and is orphaned on VMD removal, leading to a warning: [ 181.940162] Trying to free nonexistent resource <e5a10000-e5a13fff> Fixes: 2c2c5c5cd213 ("x86/PCI: VMD: Attach VMD resources to parent domain's resource tree") Signed-off-by: Jon Derrick <jonathan.derrick@intel.com> [lorenzo.pieralisi@arm.com: updated commit log] Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Reviewed-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-11-12PCI: tegra: Enable Relaxed Ordering only for Tegra20 & Tegra30Vidya Sagar
commit 7be142caabc4780b13a522c485abc806de5c4114 upstream. The PCI Tegra controller conversion to a device tree configurable driver in commit d1523b52bff3 ("PCI: tegra: Move PCIe driver to drivers/pci/host") implied that code for the driver can be compiled in for a kernel supporting multiple platforms. Unfortunately, a blind move of the code did not check that some of the quirks that were applied in arch/arm (eg enabling Relaxed Ordering on all PCI devices - since the quirk hook erroneously matches PCI_ANY_ID for both Vendor-ID and Device-ID) are now applied in all kernels that compile the PCI Tegra controlled driver, DT and ACPI alike. This is completely wrong, in that enablement of Relaxed Ordering is only required by default in Tegra20 platforms as described in the Tegra20 Technical Reference Manual (available at https://developer.nvidia.com/embedded/downloads#?search=tegra%202 in Section 34.1, where it is mentioned that Relaxed Ordering bit needs to be enabled in its root ports to avoid deadlock in hardware) and in the Tegra30 platforms for the same reasons (unfortunately not documented in the TRM). There is no other strict requirement on PCI devices Relaxed Ordering enablement on any other Tegra platforms or PCI host bridge driver. Fix this quite upsetting situation by limiting the vendor and device IDs to which the Relaxed Ordering quirk applies to the root ports in question, reported above. Signed-off-by: Vidya Sagar <vidyas@nvidia.com> [lorenzo.pieralisi@arm.com: completely rewrote the commit log/fixes tag] Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Acked-by: Thierry Reding <treding@nvidia.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-11-12PCI: dra7xx: Add shutdown handler to cleanly turn off clocksKeerthy
commit 9c049bea083fea21373b8baf51fe49acbe24e105 upstream Add shutdown handler to cleanly turn off clocks. This will help in cases of kexec where in a new kernel can boot abruptly. Signed-off-by: Keerthy <j-keerthy@ti.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: Kishon Vijay Abraham I <kishon@ti.com> Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-11-06PCI/PME: Fix possible use-after-free on removeSven Van Asbroeck
[ Upstream commit 7cf58b79b3072029af127ae865ffc6f00f34b1f8 ] In remove(), ensure that the PME work cannot run after kfree() is called. Otherwise, this could result in a use-after-free. This issue was detected with the help of Coccinelle. Signed-off-by: Sven Van Asbroeck <TheSven73@gmail.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Cc: Sinan Kaya <okaya@kernel.org> Cc: Frederick Lawler <fred@fredlawl.com> Cc: Mika Westerberg <mika.westerberg@linux.intel.com> Cc: Keith Busch <keith.busch@intel.com> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-10-29PCI: PM: Fix pci_power_up()Rafael J. Wysocki
commit 45144d42f299455911cc29366656c7324a3a7c97 upstream. There is an arbitrary difference between the system resume and runtime resume code paths for PCI devices regarding the delay to apply when switching the devices from D3cold to D0. Namely, pci_restore_standard_config() used in the runtime resume code path calls pci_set_power_state() which in turn invokes __pci_start_power_transition() to power up the device through the platform firmware and that function applies the transition delay (as per PCI Express Base Specification Revision 2.0, Section 6.6.1). However, pci_pm_default_resume_early() used in the system resume code path calls pci_power_up() which doesn't apply the delay at all and that causes issues to occur during resume from suspend-to-idle on some systems where the delay is required. Since there is no reason for that difference to exist, modify pci_power_up() to follow pci_set_power_state() more closely and invoke __pci_start_power_transition() from there to call the platform firmware to power up the device (in case that's necessary). Fixes: db288c9c5f9d ("PCI / PM: restore the original behavior of pci_set_power_state()") Reported-by: Daniel Drake <drake@endlessm.com> Tested-by: Daniel Drake <drake@endlessm.com> Link: https://lore.kernel.org/linux-pm/CAD8Lp44TYxrMgPLkHCqF9hv6smEurMXvmmvmtyFhZ6Q4SE+dig@mail.gmail.com/T/#m21be74af263c6a34f36e0fc5c77c5449d9406925 Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Bjorn Helgaas <bhelgaas@google.com> Cc: 3.10+ <stable@vger.kernel.org> # 3.10+ Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-10-07PCI: exynos: Propagate errors for optional PHYsThierry Reding
[ Upstream commit ddd6960087d4b45759434146d681a94bbb1c54ad ] devm_of_phy_get() can fail for a number of reasons besides probe deferral. It can for example return -ENOMEM if it runs out of memory as it tries to allocate devres structures. Propagating only -EPROBE_DEFER is problematic because it results in these legitimately fatal errors being treated as "PHY not specified in DT". What we really want is to ignore the optional PHYs only if they have not been specified in DT. devm_of_phy_get() returns -ENODEV in this case, so that's the special case that we need to handle. So we propagate all errors, except -ENODEV, so that real failures will still cause the driver to fail probe. Signed-off-by: Thierry Reding <treding@nvidia.com> Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Reviewed-by: Andrew Murray <andrew.murray@arm.com> Cc: Jingoo Han <jingoohan1@gmail.com> Cc: Kukjin Kim <kgene@kernel.org> Cc: Krzysztof Kozlowski <krzk@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-10-07PCI: imx6: Propagate errors for optional regulatorsThierry Reding
[ Upstream commit 2170a09fb4b0f66e06e5bcdcbc98c9ccbf353650 ] regulator_get_optional() can fail for a number of reasons besides probe deferral. It can for example return -ENOMEM if it runs out of memory as it tries to allocate data structures. Propagating only -EPROBE_DEFER is problematic because it results in these legitimately fatal errors being treated as "regulator not specified in DT". What we really want is to ignore the optional regulators only if they have not been specified in DT. regulator_get_optional() returns -ENODEV in this case, so that's the special case that we need to handle. So we propagate all errors, except -ENODEV, so that real failures will still cause the driver to fail probe. Signed-off-by: Thierry Reding <treding@nvidia.com> Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Reviewed-by: Andrew Murray <andrew.murray@arm.com> Cc: Richard Zhu <hongxing.zhu@nxp.com> Cc: Lucas Stach <l.stach@pengutronix.de> Cc: Shawn Guo <shawnguo@kernel.org> Cc: Sascha Hauer <s.hauer@pengutronix.de> Cc: Fabio Estevam <festevam@gmail.com> Cc: kernel@pengutronix.de Cc: linux-imx@nxp.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-10-07PCI: rockchip: Propagate errors for optional regulatorsThierry Reding
[ Upstream commit 0e3ff0ac5f71bdb6be2a698de0ed0c7e6e738269 ] regulator_get_optional() can fail for a number of reasons besides probe deferral. It can for example return -ENOMEM if it runs out of memory as it tries to allocate data structures. Propagating only -EPROBE_DEFER is problematic because it results in these legitimately fatal errors being treated as "regulator not specified in DT". What we really want is to ignore the optional regulators only if they have not been specified in DT. regulator_get_optional() returns -ENODEV in this case, so that's the special case that we need to handle. So we propagate all errors, except -ENODEV, so that real failures will still cause the driver to fail probe. Tested-by: Heiko Stuebner <heiko@sntech.de> Signed-off-by: Thierry Reding <treding@nvidia.com> Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Reviewed-by: Andrew Murray <andrew.murray@arm.com> Reviewed-by: Heiko Stuebner <heiko@sntech.de> Acked-by: Shawn Lin <shawn.lin@rock-chips.com> Cc: Shawn Lin <shawn.lin@rock-chips.com> Cc: Heiko Stuebner <heiko@sntech.de> Cc: linux-rockchip@lists.infradead.org Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-10-07PCI: tegra: Fix OF node reference leakNishka Dasgupta
[ Upstream commit 9e38e690ace3e7a22a81fc02652fc101efb340cf ] Each iteration of for_each_child_of_node() executes of_node_put() on the previous node, but in some return paths in the middle of the loop of_node_put() is missing thus causing a reference leak. Hence stash these mid-loop return values in a variable 'err' and add a new label err_node_put which executes of_node_put() on the previous node and returns 'err' on failure. Change mid-loop return statements to point to jump to this label to fix the reference leak. Issue found with Coccinelle. Signed-off-by: Nishka Dasgupta <nishkadg.linux@gmail.com> [lorenzo.pieralisi@arm.com: rewrote commit log] Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-10-05PCI: hv: Avoid use of hv_pci_dev->pci_slot after freeing itDexuan Cui
[ Upstream commit 533ca1feed98b0bf024779a14760694c7cb4d431 ] The slot must be removed before the pci_dev is removed, otherwise a panic can happen due to use-after-free. Fixes: 15becc2b56c6 ("PCI: hv: Add hv_pci_remove_slots() when we unload the driver") Signed-off-by: Dexuan Cui <decui@microsoft.com> Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Cc: stable@vger.kernel.org Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-09-21PCI: kirin: Fix section mismatch warningNathan Chancellor
commit 6870b673509779195cab300aedc844b352d9cfbc upstream. The PCI kirin driver compilation produces the following section mismatch warning: WARNING: vmlinux.o(.text+0x4758cc): Section mismatch in reference from the function kirin_pcie_probe() to the function .init.text:kirin_add_pcie_port() The function kirin_pcie_probe() references the function __init kirin_add_pcie_port(). This is often because kirin_pcie_probe lacks a __init annotation or the annotation of kirin_add_pcie_port is wrong. Remove '__init' from kirin_add_pcie_port() to fix it. Fixes: fc5165db245a ("PCI: kirin: Add HiSilicon Kirin SoC PCIe controller driver") Reported-by: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: Nathan Chancellor <natechancellor@gmail.com> [lorenzo.pieralisi@arm.com: updated commit log] Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-09-19PCI: Always allow probing with driver_overrideAlex Williamson
commit 2d2f4273cbe9058d1f5a518e5e880d27d7b3b30f upstream. Commit 0e7df22401a3 ("PCI: Add sysfs sriov_drivers_autoprobe to control VF driver binding") introduced the sriov_drivers_autoprobe attribute which allows users to prevent the kernel from automatically probing a driver for new VFs as they are created. This allows VFs to be spawned without automatically binding the new device to a host driver, such as in cases where the user intends to use the device only with a meta driver like vfio-pci. However, the current implementation prevents any use of drivers_probe with the VF while sriov_drivers_autoprobe=0. This blocks the now current general practice of setting driver_override followed by using drivers_probe to bind a device to a specified driver. The kernel never automatically sets a driver_override therefore it seems we can assume a driver_override reflects the intent of the user. Also, probing a device using a driver_override match seems outside the scope of the 'auto' part of sriov_drivers_autoprobe. Therefore, let's allow driver_override matches regardless of sriov_drivers_autoprobe, which we can do by simply testing if a driver_override is set for a device as a 'can probe' condition. Fixes: 0e7df22401a3 ("PCI: Add sysfs sriov_drivers_autoprobe to control VF driver binding") Link: https://lore.kernel.org/lkml/155742996741.21878.569845487290798703.stgit@gimli.home Link: https://lore.kernel.org/linux-pci/155672991496.20698.4279330795743262888.stgit@gimli.home/T/#u Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-09-16PCI: dra7xx: Fix legacy INTD IRQ handlingVignesh R
commit 524d59f6e30aab5b618da55e604c802ccd83e708 upstream. Legacy INTD IRQ handling is broken on dra7xx due to fact that driver uses hwirq in range of 1-4 for INTA, INTD whereas IRQ domain is of size 4 which is numbered 0-3. Therefore when INTD IRQ line is used with pci-dra7xx driver following warning is seen: WARNING: CPU: 0 PID: 1 at kernel/irq/irqdomain.c:342 irq_domain_associate+0x12c/0x1c4 error: hwirq 0x4 is too large for dummy Fix this by using pci_irqd_intx_xlate() helper to translate the INTx 1-4 range into the 0-3 as done in other PCIe drivers. Suggested-by: Bjorn Helgaas <bhelgaas@google.com> Reported-by: Chris Welch <Chris.Welch@viavisolutions.com> Signed-off-by: Vignesh R <vigneshr@ti.com> Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Acked-by: Kishon Vijay Abraham I <kishon@ti.com> Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>