17 Dec 2016

PCI interrupt routing I

文章欢迎转载,但转载时请保留本段文字,并置于文章的顶部
Author: derek0883@gmail.com
本文原文地址:http://www.gorecursion.com/virtualization/2016/12/17/pci-int.html

I’m going to using QEMU to study how PCI interrupt routing works. In PCI configuration space, it has Interrup pin, and Interrupt line. How those number related to the IRQ number under /proc/interrupts? What is relation with IDT vector? what should I do, if I want change IRQ to a different value?

PCI interrupt pin and interrupt router

Each PCI device has 4 dedicate interrupt pin, show as following figure. The figure from this article. System software can config device to use which pin to generate interrupt.

PCI interrupt pin

A typical board has multiple PCI slots or device, Those PCI device connect to a PCI bridge.

Since many modules make use of only the INTA pin, if INTA were always wired to the same PIRQ line, then many devices in the system would share the same interrupt line and interrupt latency would be increased. For this reason, backplane/motherboard designers commonly alternate which PIRQ trace in the backplane is connected to each module slot INT pin (see Figure 2 for an illustration). this article.

PCI interrupt pin connection

PCI bridge will route the input of each interrupt pin to different IRQ(through an interrupt controller).

PIIX3 and e1000 in QEMU

In Build minimal kernel for QEMU, I built a kernel image, and boot with following command:

qemu-2.8.0-rc0 $ ./x86_64-softmmu/qemu-system-x86_64 --enable-kvm \
		-kernel bzImage -hda xenial-server-cloudimg-amd64-disk1.img -nographic \
		-append "console=ttyS0 root=/dev/sda1 cloud-init=disabled"

QEMU emulate a ISA bridge PIIX3, PIIX3 has a PCI bridge. QEMU also emulate a PCI device, Intel 82540EM Gigabit Ethernet Controller.

4 interrup pin of 82540EM connect to PIRQ[A..D] of PIIX3. The following figure from PIIX3 datasheet

PCI signal of PIIX3

You can check the hardware information with lspci command.

ubuntu@ubuntu:~$ lspci 
00:00.0 Host bridge: Intel Corporation 440FX - 82441FX PMC [Natoma] (rev 02)
00:01.0 ISA bridge: Intel Corporation 82371SB PIIX3 ISA [Natoma/Triton II]    # PIIX3
00:01.1 IDE interface: Intel Corporation 82371SB PIIX3 IDE [Natoma/Triton II]
00:01.3 Bridge: Intel Corporation 82371AB/EB/MB PIIX4 ACPI (rev 03)
00:02.0 VGA compatible controller: Device 1234:1111 (rev 02)
00:03.0 Ethernet controller: Intel Corporation 82540EM Gigabit Ethernet Controller (rev 03) # PCI Device

ubuntu@ubuntu:~$ lspci -s 00:03.0 -vvvvxx
00:03.0 Ethernet controller: Intel Corporation 82540EM Gigabit Ethernet Controller (rev 03)
	Subsystem: Red Hat, Inc QEMU Virtual Machine
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx-
	Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0
	Interrupt: pin A routed to IRQ 11
	Region 0: Memory at febc0000 (32-bit, non-prefetchable) [size=128K]
	Region 1: I/O ports at c000 [size=64]
	Expansion ROM at feb80000 [disabled] [size=256K]
	Kernel driver in use: e1000
lspci: Unable to load libkmod resources: error -12
00: 86 80 0e 10 07 01 00 00 03 00 00 02 00 00 00 00
10: 00 00 bc fe 01 c0 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 f4 1a 00 11
30: 00 00 b8 fe 00 00 00 00 00 00 00 00 0b 01 00 00

“Interrupt: pin A routed to IRQ 11” meaning that e1000 using pin A to signal interrupt, The IRQ number is 11. System software config this by writing 1-4 to PCI configuration register, register offset is 0x3d. Following figure from here

PCI configuration

There are 4 PIRQ Route control registers in PIIX3, for PinA/PinB/PinC/PinD. The following figure from PIIX3 datasheet

PCI interrupt route control register

Now we know that IRQ number is 11, So driver need write 11 to route control register. The question is, e1000 PinA connect to which PIRQRC pin here?

  1. If connect to PIRQRC[A], then need write 11 to 0x60.
  2. If connect to PIRQRC[B], then need write 11 to 0x61.
  3. If connect to PIRQRC[C], then need write 11 to 0x62.
  4. If connect to PIRQRC[D], then need write 11 to 0x63.

The answer is: hardware designer knows, BIOS enginner work closely with hardware enginner, they will build the information into BIOS, BIOS pass it to linux kernel.

Advanced Configuration and Power Interface Specification - ACPI is standard to pass those information from BIOS to kernel. Some system also use MultiProcessor Specification

dmesg shows eth0 will use Interrupt Link C route to IRQ 11, this means PCI device e1000 interrupt pin A connect to PIRQ pin C.

$ dmesg | grep "PCI.*IRQ 11" -A 4
[    7.636978] ACPI: PCI Interrupt Link [LNKC] enabled at IRQ 11
[    7.973076] e1000 0000:00:03.0 eth0: (PCI:33MHz:32-bit) 52:54:00:12:34:56
[    7.974592] e1000 0000:00:03.0 eth0: Intel(R) PRO/1000 Network Connection
[    7.978731] e1000e: Intel(R) PRO/1000 Network Driver - 3.2.6-k
[    7.979511] e1000e: Copyright(c) 1999 - 2015 Intel Corporation.

It shows our system using ACPI to pass the information. The print come from kernel function:

// file: drivers/acpi/pci_link.c
static int acpi_pci_link_allocate(struct acpi_pci_link *link)
{
	...
        printk(KERN_WARNING PREFIX "%s [%s] enabled at IRQ %d\n",
               acpi_device_name(link->device),
               acpi_device_bid(link->device), link->irq.active);
}

// file: include/acpi/acpi_bus.h
#define acpi_device_bid(d)  ((d)->pnp.bus_id) 

Now you know why IRQ 11 is for eth0, if you use ubuntu, it will be renamed it to enp0s3.

$ cat /proc/interrupts 
           CPU0       
  0:       1598   IO-APIC   2-edge      timer
  1:          9   IO-APIC   1-edge      i8042
  4:       1359   IO-APIC   4-edge      serial
  8:          1   IO-APIC   8-edge      rtc0
  9:          0   IO-APIC   9-fasteoi   acpi
 11:        573   IO-APIC  11-fasteoi   eth0  # IRQ 11

Interrupt vector is a number between 0-255, Intel reserved 0-31. e.g. 14 is for page fault. So IRQ 11 using by e1000 here, the IDT entry is not 11. what is the vector for this device?

X86 IDT and IRQ

First step, I need to figure out the difference between IRQ number of device(11 here), and Interrupt vector table. If you don’t know what is IDT and interrupt vector, you may refer to Understanding Linux kernel and CHAPTER 6 INTERRUPT AND EXCEPTION HANDLING of Intel(R) 64 and IA-32 Architectures Software Developer’s Manual, Volume 3A

In order answer this question, I did the following changes:

diff --git a/kernel/irq/proc.c b/kernel/irq/proc.c
index feaa813..fabec65 100644
--- a/kernel/irq/proc.c
+++ b/kernel/irq/proc.c
@@ -441,6 +441,7 @@ int show_interrupts(struct seq_file *p, void *v)
        int i = *(loff_t *) v, j;
        struct irqaction *action;
        struct irq_desc *desc;
+       struct irq_cfg *cfg;
 
        if (i > ACTUAL_NR_IRQS)
                return 0;
@@ -472,6 +473,9 @@ int show_interrupts(struct seq_file *p, void *v)
                goto out;
 
        seq_printf(p, "%*d: ", prec, i);
+       cfg = irq_cfg(i);
+       if (cfg)
+               seq_printf(p, "%*d", prec, cfg->vector);
        for_each_online_cpu(j)
                seq_printf(p, "%10u ", kstat_irqs_cpu(i, j));
 
@@ -498,6 +502,8 @@ int show_interrupts(struct seq_file *p, void *v)
                while ((action = action->next) != NULL)
                        seq_printf(p, ", %s", action->name);
        }
+       if (desc->handle_irq)
+               seq_printf(p, "  %pf", desc->handle_irq);
 
        seq_putc(p, '\n');
 out:

Then build new kernel, and boot with QEMU again:

$ cat /proc/interrupts 
           CPU0       
  0:  48      1632   IO-APIC   2-edge      timer  handle_edge_irq
  1:  49         9   IO-APIC   1-edge      i8042  handle_edge_irq
  4:  52       430   IO-APIC   4-edge      serial  handle_edge_irq
  8:  56         1   IO-APIC   8-edge      rtc0  handle_edge_irq
  9:  57         0   IO-APIC   9-fasteoi   acpi  handle_fasteoi_irq
 11:  59        47   IO-APIC  11-fasteoi   eth0  handle_fasteoi_irq
 12:  60       125   IO-APIC  12-edge      i8042  handle_edge_irq
 14:  62         0   IO-APIC  14-edge      ata_piix  handle_edge_irq
 15:  63       126   IO-APIC  15-edge      ata_piix  handle_edge_irq

Now I know e1000 using IRQ number 11, and the interrupt vector of IDT is 59. The handling step: 1. eth0 trigger interrupt. 2. CPU read interrupt vector is 59. and using 59 as index of IDT. 3. The handler is common_interrupt in arch/x86/entry/entry_64.S 4. common_interrupt call do_IRQ in arch/x86/kernel/irq.c 5. call handle_fasteoi_irq

Why IRQ 11 mapped to vector 59.

// file: arch/x86/kernel/irqinit.c
void __init init_IRQ(void)
{   
    int i; 
    
    /*  
     * On cpu 0, Assign ISA_IRQ_VECTOR(irq) to IRQ 0..15.
     * If these IRQ's are handled by legacy interrupt-controllers like PIC,
     * then this configuration will likely be static after the boot. If
     * these IRQ's are handled by more mordern controllers like IO-APIC,
     * then this vector space can be freed and re-used dynamically as the
     * irq's migrate etc.
     */ 
    for (i = 0; i < nr_legacy_irqs(); i++)
        per_cpu(vector_irq, 0)[ISA_IRQ_VECTOR(i)] = irq_to_desc(i);
    
    x86_init.irqs.intr_init();
}       

ISA_IRQ_VECTOR will map device irq 0 to 15 to vector.

// file: arch/x86/include/asm/irq_vectors.h
/*
 * IDT vectors usable for external interrupt sources start at 0x20.
 * (0x80 is the syscall vector, 0x30-0x3f are for ISA)
 */ 
#define FIRST_EXTERNAL_VECTOR       0x20

/*          
 * Vectors 0x30-0x3f are used for ISA interrupts.
 *   round up to the next 16-vector boundary
 */ 
#define ISA_IRQ_VECTOR(irq)     (((FIRST_EXTERNAL_VECTOR + 16) & ~15) + irq)

So it is ((0x20+16) & 0xfffffff0) + 11 = 59.

Reference

In-Depth: Understanding NI Real-Time Hypervisor I/O to OS Assignments.