path: root/xen/arch/x86/numa.c
Age | Commit message | Author
2021-09-24 | x86: initialize memnodemapsize while faking NUMA node | Wei Chen
When the system turns NUMA off or lacks NUMA support, Xen fakes a NUMA node so that the system works as a single-node NUMA system. In this case the memory node map doesn't need to be allocated from boot pages; it uses _memnodemap directly. But memnodemapsize hasn't been set, so Xen should assert in phys_to_nid. Because x86 was using an empty macro "VIRTUAL_BUG_ON" in place of ASSERT, this bug is not triggered on x86. In fact, Xen only uses 1 slot of memnodemap in this case, so this patch sets memnodemap[0] to 0 and memnodemapsize to 1 to fix it. Signed-off-by: Wei Chen <wei.chen@arm.com> Acked-by: Jan Beulich <jbeulich@suse.com>
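The lookup described in this commit can be sketched roughly as follows. This is a simplified illustration, not the hypervisor's code: `fake_memnodemap`, `fake_single_node()` and `phys_to_nid_sketch()` are hypothetical names, the shift value is an assumption chosen so that every index collapses to slot 0, and a plain `assert()` stands in for Xen's ASSERT.

```c
#include <assert.h>
#include <stdint.h>

typedef uint8_t nodeid_t;

/* Hypothetical stand-ins for the hypervisor's globals. */
static nodeid_t fake_memnodemap[1];            /* plays the role of _memnodemap */
static nodeid_t *memnodemap = fake_memnodemap;
static unsigned long memnodemapsize;           /* number of valid slots */
static unsigned int memnode_shift;

/* Set up one fake node that covers all memory. */
static void fake_single_node(void)
{
    /* Assumed shift: large enough that every index maps to slot 0. */
    memnode_shift = 8 * sizeof(unsigned long) - 1;
    memnodemap[0] = 0;
    memnodemapsize = 1;    /* without this, the bounds check below fails */
}

/* Map a (compressed) page index to its node, with the bounds check enabled. */
static nodeid_t phys_to_nid_sketch(unsigned long pdx)
{
    unsigned long idx = pdx >> memnode_shift;

    assert(idx < memnodemapsize);
    return memnodemap[idx];
}
```

With memnodemapsize left at 0, the assertion would fire on the first lookup, which is the latent bug the commit describes.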
2020-02-14 | add a domain_tot_pages() helper function | Paul Durrant
This patch adds a new domain_tot_pages() inline helper function into sched.h, which will be needed by a subsequent patch. No functional change. NOTE: While modifying the comment for 'tot_pages' in sched.h this patch makes some cosmetic fixes to surrounding comments. Suggested-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Paul Durrant <pdurrant@amazon.com> Acked-by: George Dunlap <George.Dunlap@eu.citrix.com> Acked-by: Julien Grall <julien@xen.org> Reviewed-by: Jan Beulich <jbeulich@suse.com> Acked-by: Tim Deegan <tim@xen.org>
2020-02-03 | xen: split parameter related definitions in own header file | Juergen Gross
Move the parameter related definitions from init.h into a new header file param.h. This will avoid include hell when new dependencies are added to parameter definitions. Signed-off-by: Juergen Gross <jgross@suse.com> Acked-by: Julien Grall <julien@xen.org> Acked-by: Dario Faggioli <dfaggioli@suse.com> Acked-by: Paul Durrant <pdurrant@amazon.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Acked-by: Jan Beulich <jbeulich@suse.com>
2019-06-27 | nodemask: Don't opencode cycle_node() | Andrew Cooper
No functional change. Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com>
2018-11-19 | xen/keyhandler: Drop keyhandler_scratch | Andrew Cooper
With almost all users of keyhandler_scratch gone, clean up the 3 remaining users and drop the buffer. Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Wei Liu <wei.liu2@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
2018-09-19 | Change timestamps representation for keyhandlers | Andrii Anisov
For different keyhandlers, replace the hex-with-delimiter representation of time with PRI_stime, which is currently decimal ns. Signed-off-by: Andrii Anisov <andrii_anisov@epam.com> Reviewed-by: Dario Faggioli <dfaggioli@suse.com> Acked-by: Wei Liu <wei.liu2@citrix.com> Acked-by: George Dunlap <george.dunlap@citrix.com>
2018-09-11 | xen: Fix inconsistent callers of panic() | Andrew Cooper
Callers are inconsistent about whether they pass a newline to panic(), including adjacent calls in the same function using different styles. panic() not expecting a newline is inconsistent with most other printing functions, which is most likely why we've gained so many inconsistencies. Switch panic() to expect a newline, and update all callers which currently lack a newline to include one. This actually reduces the size of .rodata (0x07e3e8 down to 0x07e3a8) because a number of strings are passed to both panic() and printk(). As they previously differed by \n alone, they couldn't be merged. Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com> Acked-by: Julien Grall <julien.grall@arm.com>
2018-04-06 | xen: Convert page_to_mfn and mfn_to_page to use typesafe MFN | Julien Grall
Most of the users of page_to_mfn and mfn_to_page either override the macros to make them work with mfn_t or use mfn_x/_mfn because the rest of the function uses mfn_t. So make page_to_mfn and mfn_to_page return mfn_t by default. The __* versions are now dropped as this patch will convert all the remaining non-typesafe callers. Only reasonable clean-ups are done in this patch. The rest will use _mfn/mfn_x for the time being. Lastly, domain_page_to_mfn is also converted to use mfn_t given that most of the callers are now switched to _mfn(domain_page_to_mfn(...)). Signed-off-by: Julien Grall <julien.grall@arm.com> Acked-by: Razvan Cojocaru <rcojocaru@bitdefender.com> Reviewed-by: Paul Durrant <paul.durrant@citrix.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Wei Liu <wei.liu2@citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: George Dunlap <george.dunlap@citrix.com> Acked-by: Tim Deegan <tim@xen.org> Acked-by: Stefano Stabellini <sstabellini@kernel.org>
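The idea behind a typesafe MFN can be illustrated with a small sketch: wrapping the frame number in a struct means a bare integer can no longer be passed by accident where an MFN is expected. The names _mfn(), mfn_x(), mfn_to_page() and page_to_mfn() come from the commit message; the struct layout, the tiny `frame_table` array and `NR_FRAMES` are assumptions made purely for illustration.

```c
#include <assert.h>

/* A wrapper type: the compiler rejects mixing it with plain integers. */
typedef struct { unsigned long m; } mfn_t;

static inline mfn_t _mfn(unsigned long m)      { return (mfn_t){ m }; }
static inline unsigned long mfn_x(mfn_t mfn)   { return mfn.m; }

/* Hypothetical, tiny frame table standing in for the real one. */
#define NR_FRAMES 16
struct page_info { int unused; };
static struct page_info frame_table[NR_FRAMES];

/* Typesafe conversions: both directions speak mfn_t, not unsigned long. */
static inline struct page_info *mfn_to_page(mfn_t mfn)
{
    assert(mfn_x(mfn) < NR_FRAMES);
    return &frame_table[mfn_x(mfn)];
}

static inline mfn_t page_to_mfn(const struct page_info *pg)
{
    return _mfn((unsigned long)(pg - frame_table));
}
```

A caller that still works with raw integers has to spell the conversion out, e.g. `mfn_to_page(_mfn(5))`, which is what "the rest will use _mfn/mfn_x for the time being" refers to.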
2017-09-18 | mm: use typesafe MFN for alloc_boot_pages return | Julien Grall
At the moment, most of the callers will have to use mfn_x. However follow-up patches will remove some of them by propagating the typesafe a bit further. Signed-off-by: Julien Grall <julien.grall@arm.com> Acked-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Wei Liu <wei.liu2@citrix.com> Acked-by: George Dunlap <george.dunlap@citrix.com> Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
2017-08-25 | xen/arch/x86/numa.c: let custom parameter parsing routines return errno | Juergen Gross
Modify the custom parameter parsing routines in: xen/arch/x86/numa.c to indicate whether the parameter value was parsed successfully. Signed-off-by: Juergen Gross <jgross@suse.com> Acked-by: Wei Liu <wei.liu2@citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com>
2017-08-16 | x86/numa: don't check alloc_boot_pages return | Julien Grall
alloc_boot_pages will panic if it is not possible to allocate. So the check in the caller is pointless. Signed-off-by: Julien Grall <julien.grall@arm.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
2017-07-04 | x86/numa.c: use plain bool | Wei Liu
Signed-off-by: Wei Liu <wei.liu2@citrix.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
2016-08-11 | x86/NUMA: cleanup | Jan Beulich
- drop the only left CONFIG_NUMA conditional (this is always true)
- drop struct node_data's node_id field (being always equal to the node_data[] array index used)
- don't open code node_{start,end}_pfn() nor node_spanned_pages() except when used as lvalues (those could be converted too, but this seems a little awkward)
- no longer open code pfn_to_paddr() in an expression being modified anyway
- make dump less verbose by logging actual vs intended node IDs only when they don't match
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
2016-08-11 | page-alloc/x86: don't restrict DMA heap to node 0 | Jan Beulich
When node zero has no memory, the DMA bit width will end up getting set to 9, which is obviously not helpful to hold back a reasonable amount of low enough memory for Dom0 to use for DMA purposes. Find the lowest node with memory below 4Gb instead. Introduce arch_get_dma_bitsize() to keep this arch-specific logic out of common code. Also adjust the original calculation: I think the subtraction of 1 should have been part of the flsl() argument rather than getting applied to its result. And while previously the division by 4 was valid to be done on the flsl() result, this now also needs to be converted, as it should only be applied to the spanned pages value. Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Julien Grall <julien.grall@arm.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
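The arithmetic described above can be sketched as follows. This is a reconstruction from the commit message only, under stated assumptions: `old_dma_bitsize()` and `new_dma_bitsize()` are hypothetical names, PAGE_SHIFT is assumed to be 12, `flsl_sketch()` is a portable stand-in for flsl() (1-based index of the highest set bit, 0 for a zero argument), and the search for the lowest node with memory below 4Gb is left out.

```c
#include <assert.h>

#define PAGE_SHIFT 12

/* 1-based position of the highest set bit; 0 when x == 0. */
static unsigned int flsl_sketch(unsigned long x)
{
    unsigned int bit = 0;

    while ( x )
    {
        bit++;
        x >>= 1;
    }
    return bit;
}

/* Old style: "- 1" and the divide-by-4 ("- 2") applied to the flsl() result. */
static unsigned int old_dma_bitsize(unsigned long spanned_pages)
{
    int bits = (int)flsl_sketch(spanned_pages) - 1 + PAGE_SHIFT - 2;

    return bits < 32 ? (unsigned int)bits : 32;
}

/* New style: both adjustments moved into the flsl() argument. */
static unsigned int new_dma_bitsize(unsigned long start_pfn,
                                    unsigned long spanned_pages)
{
    unsigned int bits;

    assert(spanned_pages >= 4);   /* the chosen node must have memory */
    bits = flsl_sketch(start_pfn + spanned_pages / 4 - 1) + PAGE_SHIFT;
    return bits < 32 ? bits : 32;
}
```

For an empty node 0, `old_dma_bitsize(0)` evaluates to 0 - 1 + 12 - 2 = 9, matching the value quoted in the commit message. For a node starting at pfn 0 and spanning 4Gb (1 << 20 pages of 4k), both variants give 30.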
2015-12-16 | drop empty __cpuinit annotation | Andrew Cooper
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
2015-12-03 | x86: __{cpu,dev}initdata drop follow-up | Jan Beulich
While reviewing those patches I noticed a few types that could do with tweaking. Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
2015-11-30 | drop empty __devinitdata annotation | Andrew Cooper
x86 is the only architecture which uses __devinitdata, and also has CONFIG_HOTPLUG enabled, making the annotation empty. Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
2015-11-30 | drop empty __cpuinitdata annotation | Andrew Cooper
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
2015-10-14 | x86/NUMA: fix SRAT table processor entry parsing and consumption | Jan Beulich
- don't overrun apicid_to_node[] (possible in the x2APIC case)
- don't limit number of processor related SRAT entries we can consume
- make acpi_numa_{processor,x2apic}_affinity_init() as similar to one another as possible
- print APIC IDs in hex (to ease matching with other log messages), at once making legacy and x2APIC ones distinguishable (by width)
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
2015-09-25 | keyhandler: rework keyhandler infrastructure | Andrew Cooper
struct keyhandler does not contain much information, and requires a lot of boilerplate to use. It is far more convenient to have register_keyhandler() take each piece of information as a parameter, especially when introducing temporary debugging keyhandlers. This in turn allows struct keyhandler itself to become private to keyhandler.c and for the key_table to become more efficient. key_table doesn't need to contain 256 entries; all keys are ASCII which limits them to 7 bits of index, rather than 8. It can also become a straight array, rather than an array of pointers. The overall effect of this is the key_table grows in size by 50%, but there are no longer 24-byte keyhandler structures all over the data section. All of the key_table entries in keyhandler.c can be initialised at compile time rather than runtime. Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
2015-09-23 | x86: shorten debug key 'u' output | Jan Beulich
... by grouping sequences of contiguous CPUs. Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
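Grouping contiguous CPUs can be sketched as a simple run-length pass over a sorted list of CPU numbers. The function below is a hypothetical illustration, not the code from this commit: `format_cpu_ranges()` is an invented name, and it formats into a caller-supplied buffer with snprintf() rather than printing through the hypervisor's console routines.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Format a sorted list of CPU numbers as comma-separated ranges.
 * Returns the number of characters written (excluding the terminator).
 */
static size_t format_cpu_ranges(const unsigned int *cpus, size_t n,
                                char *buf, size_t len)
{
    size_t off = 0, i = 0;

    assert(buf != NULL && len > 0);
    buf[0] = '\0';

    while ( i < n )
    {
        size_t j = i;
        int w;

        /* Extend the run while the next CPU number is exactly one larger. */
        while ( j + 1 < n && cpus[j + 1] == cpus[j] + 1 )
            j++;

        if ( j > i )
            w = snprintf(buf + off, len - off, "%s%u-%u",
                         off ? "," : "", cpus[i], cpus[j]);
        else
            w = snprintf(buf + off, len - off, "%s%u",
                         off ? "," : "", cpus[i]);

        if ( w < 0 || (size_t)w >= len - off )
        {
            buf[off] = '\0';   /* drop the partially written group */
            break;
        }

        off += (size_t)w;
        i = j + 1;
    }

    return off;
}
```

For the input {0, 1, 2, 3, 8, 10, 11} the buffer ends up holding "0-3,8,10-11", so a host with many CPUs per node produces one short line per node instead of one entry per CPU.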
2015-02-26 | x86/numa: adjust datatypes for node and pxm | Boris Ostrovsky
Use u8-sized node IDs and unsigned PXMs consistently throughout code (and introduce nodeid_t type). Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
2015-02-17 | x86: dump vNUMA information with debug key 'u' | Elena Ufimsteva
Signed-off-by: Elena Ufimsteva <ufimtseva@gmail.com> Signed-off-by: Wei Liu <wei.liu2@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Dario Faggioli <dario.faggioli@citrix.com>
2014-09-04 | numa.c: convert to xen coding style | Elena Ufimtseva
Convert to Xen coding style from mixed one. Signed-off-by: Elena Ufimtseva <ufimtseva@gmail.com>
2012-09-12 | xen: Remove x86_32 build target. | Keir Fraser
Signed-off-by: Keir Fraser <keir@xen.org>
2012-06-06 | xen: enhance dump_numa output | Dario Faggioli
By showing the number of free pages on each node. Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com> Committed-by: Keir Fraser <keir@xen.org>
2011-11-08 | eliminate cpu_set() | Jan Beulich
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Keir Fraser <keir@xen.org> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
2011-10-21 | eliminate direct assignments of CPU masks | Jan Beulich
Use cpumask_copy() instead of direct variable assignments for copying CPU masks. While direct assignments are not a problem when both sides are variables actually defined as cpumask_t (except for possibly copying *much* more than would actually need to be copied), they must not happen when the original variable is of type cpumask_var_t (which may have less space allocated to it than a full cpumask_t). Eliminate as many of such assignments as possible (in several cases it's even possible to collapse two operations [copy then clear one bit] into one [cpumask_andnot()]), and thus set the way for reducing the allocation size in alloc_cpumask_var(). Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Keir Fraser <keir@xen.org>
2011-10-21 | introduce and use nr_cpu_ids and nr_cpumask_bits | Jan Beulich
The former is the runtime equivalent of NR_CPUS (and users of NR_CPUS, where necessary, get adjusted accordingly), while the latter is for the sole use of determining the allocation size when dynamically allocating CPU masks (done later in this series). Adjust accessors to use either of the two to bound their bitmap operations - which one gets used depends on whether accessing the bits in the gap between nr_cpu_ids and nr_cpumask_bits is benign but more efficient. Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Keir Fraser <keir@xen.org>
2011-04-06 | Remove __init specifier from function declarations in header files. | Keir Fraser
The specifier only needs to be added to the function's definition. At the same time, fix init_cpu_to_node() to be __init rather than __devinit (it is only called at boot time). Signed-off-by: Keir Fraser <keir@xen.org>
2011-03-23 | Define new <pfn.h> header for PFN_{DOWN,UP} macros. | Keir Fraser
Signed-off-by: Keir Fraser <keir@xen.org>
2011-03-09 | move various bits into .init.* sections | Jan Beulich
This also includes the removal of some entirely unused functions. The patch builds upon the makefile adjustments done in the earlier sent patch titled "move more kernel decompression bits to .init.* sections". Signed-off-by: Jan Beulich <jbeulich@novell.com>
2010-07-28 | Walking the page lists needs the page_alloc lock | Keir Fraser
There are a few places in Xen where we walk a domain's page lists without holding the page_alloc lock. They race with updates to the page lists, which are normally rare but can be quite common under PoD when the domain is close to its memory limit and the PoD reclaimer is busy. This patch protects those places by taking the page_alloc lock. I think this is OK for the two debug-key printouts - they don't run from irq context and look deadlock-free. The tboot change seems safe too unless tboot shutdown functions are called from irq context or with the page_alloc lock held. The p2m one is the scariest but there are already code paths in PoD that take the page_alloc lock with the p2m lock held so it's no worse than existing code. Signed-off-by: Tim Deegan <Tim.Deegan@citrix.com>
2010-06-11 | x86: Do not include apic.h/io_apic.h from asm/smp.h | Keir Fraser
...and fix up the ensuing fall-out of implicit dependencies Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
2010-02-25 | x86 numa: Fix i386 to not do bogus mfn_to_virt(alloc_boot_pages(...)) | Keir Fraser
Signed-off-by: Jan Beulich <jbeulich@novell.com>
2010-01-08 | x86: fix NUMA handling (c/s 20599:e5a757ce7845) | Keir Fraser
c/s 20599 caused the hash shift to become significantly smaller on systems with an SRAT like this
(XEN) SRAT: Node 0 PXM 0 0-a0000
(XEN) SRAT: Node 0 PXM 0 100000-80000000
(XEN) SRAT: Node 1 PXM 1 80000000-d0000000
(XEN) SRAT: Node 1 PXM 1 100000000-130000000
Combined with the static size of the memnodemap[] array, NUMA therefore got disabled on such systems. The backport from Linux was really incomplete, as Linux much earlier had already introduced a dynamically allocated memnodemap[]. Further, doing to/from pdx translations on addresses just past a valid range is not correct, as it may strip/fail to insert non-zero bits in this case. Finally, using 63 as the cover-it-all shift value is invalid on 32bit, since pdx values are unsigned long. Signed-off-by: Jan Beulich <jbeulich@novell.com>
2010-01-05 | numa: Correct handling node with CPU populated but no memory populated | Keir Fraser
In changeset 20599, a node that has no memory populated is marked parsed, but not online. However, if there are CPUs populated in this node, the corresponding CPU mapping (i.e. cpu_to_node) is still set up to point to the offline node, which causes trouble for memory allocation. This patch changes init_cpu_to_node() and srat_detect_node() to take the offlined-node situation into account. Now apicid_to_node is only used to keep the cpu/node mapping provided by the BIOS, and should no longer be used for memory allocation. One thing left is to update the cpu_to_node mapping after memory is populated by memory hot-add. Signed-off-by: Jiang, Yunhong <yunhong.jiang@intel.com> This is a reintroduction of 20726:ddb8c5e798f9, which I incorrectly reverted in 20745:d3215a968db9 Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
2010-01-04 | Revert 20726:ddb8c5e798f9 | Keir Fraser
Signed-off-by: Jiang, Yunhong <yunhong.jiang@intel.com>
2009-12-28 | numa: Correct handling node with CPU populated but no memory populated | Keir Fraser
In changeset 20599, a node that has no memory populated is marked parsed, but not online. However, if there are CPUs populated in this node, the corresponding CPU mapping (i.e. cpu_to_node) is still set up to point to the offline node, which causes trouble for memory allocation. This patch changes init_cpu_to_node() and srat_detect_node() to take the offlined-node situation into account. Now apicid_to_node is only used to keep the cpu/node mapping provided by the BIOS, and should no longer be used for memory allocation. One thing left is to update the cpu_to_node mapping after memory is populated by memory hot-add. Signed-off-by: Jiang, Yunhong <yunhong.jiang@intel.com>
2009-12-09 | SRAT memory hotplug 2/2: Support overlapped and sparse node memory arrangement. | Keir Fraser
Currently the Xen hypervisor uses nodes to keep the start/end address of each node. It assumes memory among nodes has no overlap, which is not always true, especially if the system has memory hotplug support. This patch backports the Linux kernel's memblks to support overlap among nodes. The memblks are used both for checking conflicts and for calculating memnode_shift. Also, currently if no memory is populated in a node when the system boots, the node is unparsed later, and the corresponding CPU's NUMA information is removed as well. This patch keeps the CPU information. One thing to note is that memnode_shift is currently calculated over all memory, including un-populated ranges. This should work as long as the smallest chunk is not too small. Another option would be flags in the page_info structure, etc. The memnodemap is changed from paddr to pdx, both to save space and because most accesses are currently by pfn. A flag, mem_hotplug, is added for the case where there is a hotplug memory range. Signed-off-by: Jiang, Yunhong <yunhong.jiang@intel.com>
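The conflict check that memblks make possible can be sketched as an interval-overlap test over a list of recorded ranges. This is an illustrative sketch, not the backported code: `struct memblk`, `add_memblk()`, `conflicting_memblk()` and the fixed `MAX_BLKS` limit are hypothetical names and choices, and ranges are treated as half-open [start, end).

```c
#include <assert.h>
#include <stdint.h>

#define MAX_BLKS 8

/* One recorded memory range and the node that owns it. */
struct memblk {
    uint64_t start, end;   /* half-open: [start, end) */
    int nid;
};

static struct memblk blks[MAX_BLKS];
static unsigned int num_blks;

/* Record a range; a node may own several, possibly sparse, ranges. */
static void add_memblk(uint64_t start, uint64_t end, int nid)
{
    assert(num_blks < MAX_BLKS && start <= end);
    blks[num_blks++] = (struct memblk){ start, end, nid };
}

/* Return the node id of the first recorded range that overlaps, or -1. */
static int conflicting_memblk(uint64_t start, uint64_t end)
{
    unsigned int i;

    for ( i = 0; i < num_blks; i++ )
    {
        if ( blks[i].start == blks[i].end )
            continue;                       /* ignore empty ranges */
        if ( blks[i].end > start && blks[i].start < end )
            return blks[i].nid;
    }

    return -1;
}
```

Because each range is checked individually, two nodes whose overall spans interleave (as in sparse or hotplug layouts) do not conflict unless actual ranges overlap.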
2009-12-01 | xen: turn numa=on by default | Keir Fraser
I did some benchmark runs (lmbench & kernel compile) with a number of guests running in parallel to compare the performance of numa=on vs. numa=off. As soon as one starts to load the machine, the performance goes down in the numa=off case. The tests were done on an 8-node machine (4 cores each). lmbench (actually copying large amounts of memory) shows a dramatic drop, but I even noticed a significant performance decrease for a tmpfs based Linux kernel compile. Here is a summary of the data:

lmbench's rd benchmark (normalized to native Linux (=100)):
guests   numa=off              numa=on               avg increase
         min    avg    max     min    avg    max
  1             78.0                  102.3
  7      37.4   45.6   62.0    90.6   102.3  110.9   124.4%
 15      21.0   25.8   31.7    41.7   48.7   54.1    88.2%
 23      13.4   17.5   23.2    25.0   28.0   30.1    60.2%

kernel compile in tmpfs, 1 VCPU, 2GB RAM, average of elapsed time:
guests   numa=off   numa=on   increase
  1      480.610    464.320    3.4%
  7      482.109    461.721    4.2%
 15      515.297    477.669    7.3%
 23      548.427    495.180    9.7%

again with 2 VCPUs and make -j2:
  1      264.580    261.690    1.1%
  7      279.763    258.907    7.7%
 15      330.385    272.762   17.4%
 23      463.510    390.547   15.7% (46 VCPUs on 32pCPUs)

Selected tests on a 4-node machine showed similar behavior (7.9 % increase with 6 parallel guests on the 2 VCPU kernel compile benchmark). Note that this does not affect non-NUMA machines at all, since NUMA will be turned off again by the code if no NUMA topology is detected. Signed-off-by: Andre Przywara <andre.przywara@amd.com>
2009-11-12 | Support physical CPU hot-add in xen hypervisor | Keir Fraser
This patch adds CPU hot-add to the system. a) It marks all CPUs as possible when booting, if CONFIG_HOTPLUG_CPU is set. BTW, this will increase the per_cpu area. b) When a CPU is added through a hypercall, the CPU is marked as present and offline, and the NUMA information is set up if NUMA is supported. The CPU will be brought online by dom0 explicitly. Signed-off-by: Jiang, Yunhong <yunhong.jiang@intel.com>
2009-10-28 | Miscellaneous data placement adjustments | Keir Fraser
Make various data items const or __read_mostly where possible/reasonable. Signed-off-by: Jan Beulich <jbeulich@novell.com>
2009-08-02 | Add a single trigger for all diagnostic keyhandlers | Keir Fraser
Add a new keyhandler that triggers all the side-effect-free keyhandlers. This lets automated tests (and users) log the full set of keyhandlers without having to be aware of which ones might reboot the host. Signed-off-by: Tim Deegan <Tim.Deegan@citrix.com> Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
2009-04-23 | x86 numa: Fix left shift overflows | Keir Fraser
Signed-off-by: Yang Xiaowei <xiaowei.yang@intel.com>
2009-01-30 | x86-64: use MFNs for linking together pages on lists | Keir Fraser
Unless more than 16Tb are going to ever be supported in Xen, this will allow reducing the linked list entries in struct page_info from 16 to 8 bytes. This doesn't modify struct shadow_page_info, yet, so in order to meet the constraints of that 'mirror' structure the list entry gets artificially forced to be 16 bytes in size. That workaround will be removed in a subsequent patch. Signed-off-by: Jan Beulich <jbeulich@novell.com>
2008-08-05 | x86: debug key prints memory node info of each domain | Keir Fraser
This patch collects the memory location of each domain (how many pages the domain has on each node) and displays it when the debug key is pressed. Signed-off-by: Zhou Ting <ting.g.zhou@intel.com>
2008-05-01 | x86: Make apicid 32 bits in preparation for x2APIC support. | Keir Fraser
Signed-off-by: Dexuan Cui <dexuan.cui@intel.com>
2008-03-17 | x86 numa: Fix the overflow of physical addresses. | Keir Fraser
If a memory address is above 4G, it will overflow in some NUMA code when unsigned long is used to hold a physical address on the PAE arch. Replace "unsigned long" with paddr_t to avoid the overflow. Signed-off-by: Duan Ronghui <ronghui.duan@intel.com>
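The overflow can be illustrated with a minimal sketch. The assumptions are labeled in the code: `uint32_t` models a 32-bit unsigned long as found on a PAE build, `paddr_t` is assumed to be a 64-bit type, PAGE_SHIFT is assumed to be 12, and both function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12

typedef uint64_t paddr_t;   /* assumed: wide enough for any physical address */

/* Models "unsigned long" on a 32-bit build: bits above 4G are lost. */
static uint32_t pfn_to_addr_narrow(uint32_t pfn)
{
    return pfn << PAGE_SHIFT;
}

/* Widen before shifting, so addresses above 4G survive intact. */
static paddr_t pfn_to_paddr_sketch(uint32_t pfn)
{
    paddr_t addr = (paddr_t)pfn << PAGE_SHIFT;

    assert((addr >> PAGE_SHIFT) == pfn);   /* no bits were lost */
    return addr;
}
```

For the first page above 4G (pfn 0x100000), the narrow version wraps around to 0 while the widened version yields 0x100000000.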
2007-09-14 | x86: fix NUMA code for 32bit | kfraser@localhost.localdomain
I don't know how significant this is (most of the NUMA node data seems unused at this point), but anyway: enable proper operation of NUMA emulation and the fake NUMA node in case there's no SRAT table on x86-32. This will at least make the "Faking node ..." message not print confusing information anymore. Signed-off-by: Jan Beulich <jbeulich@novell.com>