SCSI-Spec:
http://ldkelley.com/SCSI2/SCSI2/SCSI2-08.html#8.2.10
"An initiator may request any one or all of the supported mode pages
from a target. If an initiator issues a MODE SENSE command with a
page code value not implemented by the target, the target shall return
CHECK CONDITION status and shall set the sense key to ILLEGAL REQUEST
and the additional sense code to INVALID FIELD IN CDB."
Signed-off-by: Bernhard Kohl <bernhard.kohl@nsn.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The block descriptor contains the number of blocks, not the highest LBA.
Real hard disks return 0 if the number of blocks exceed the maximum 0xFFFFFF.
SCSI-Spec:
http://ldkelley.com/SCSI2/SCSI2/SCSI2-08.html#8.3.3
"The number of blocks field specifies the number of logical blocks on the
medium to which the density code and block length fields apply. A value
of zero indicates that all of the remaining logical blocks of the logical
unit shall have the medium characteristics specified."
Signed-off-by: Bernhard Kohl <bernhard.kohl@nsn.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The page control (PC) field defines the type of mode parameter values
to be returned in the mode pages:
PC=0 : Current values
PC=1 : Changeable values
PC=2 : Default values
PC=3 : Saved values
The current implementation always returns the same type of parameters.
This is OK for Current and Default values as we don't support changes
to be done by the MODE SELECT command.
For Saved values the following applies (implemented by this patch):
"A PC field value of 3h requests that the target return the saved
values of the mode parameters. Implementation of saved page parameters
is optional. Mode parameters not supported by the target shall be set
to zero. If saved values are not implemented, the command shall be
terminated with CHECK CONDITION status, the sense key set to
ILLEGAL REQUEST and the additional sense code set to
SAVING PARAMETERS NOT SUPPORTED."
For Changeable values the following applies (implemented by this patch):
"A PC field value of 1h requests that the target return a mask denoting
those mode parameters that are changeable. In the mask, the fields of
the mode parameters that are changeable shall be set to all one bits and
the fields of the mode parameters that are non-changeable (i.e. defined
by the target) shall be set to all zero bits."
In newer versions of the SCSI-2 spec the following clause was added.
"If the logical unit does not implement changeable parameters mode pages
and the device server receives a MODE SENSE command with 01b in the PC
field, then the command shall be terminated with CHECK CONDITION status,
with the sense key set to ILLEGAL REQUEST, and the additional sense code
set to INVALID FIELD IN CDB."
This was not yet included in the SCSI-2 Working Drafts from 1986-1993.
I assume that the variant to return CHECK CONDITION for PC=1 is not
widely implemented by real devices. I have a legacy OS which fails,
if MODE_SENSE returns non GOOD for PC=1. So for highest compatibility I
implemented the former variant with this patch.
The last Working Draft X3T9.2 Rev. 10L 7-SEP-93 can be found here:
http://ldkelley.com/SCSI2/SCSI2/SCSI2-08.html#8.2.10
In mode_sense_page() this patch also avoids multiple hard coded
definitions of the same mode page length. Instead I use the varable
p[1]. In fact the returned length of the mode pages 4 and 5 were wrong
(2 bytes less).
Signed-off-by: Bernhard Kohl <bernhard.kohl@nsn.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The header for the MODE SENSE(10) command is 8 bytes long.
Signed-off-by: Bernhard Kohl <bernhard.kohl@nsn.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The MODE DATA LENGTH field indicates the length in bytes of the following
data that is available to be transferred. The mode data length does not include
the number of bytes in the MODE DATA LENGTH field.
Signed-off-by: Bernhard Kohl <bernhard.kohl@nsn.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Based on a patch from Mark McLoughlin, this patch introduces a new
bottom half packet transmitter that avoids the latency imposed by
the tx_timer approach. Rather than scheduling a timer when a TX
packet comes in, schedule a bottom half to be run from the iothread.
The bottom half handler first attempts to flush the queue with
notification disabled (this is where we could race with a guest
without txburst). If we flush a full burst, reschedule immediately.
If we send short of a full burst, try to re-enable notification.
To avoid a race with TXs that may have occurred, we must then
flush again. If we find some packets to send, the guest it probably
active, so we can reschedule again.
tx_timer and tx_bh are mutually exclusive, so we can re-use the
tx_waiting flag to indicate one or the other needs to be setup.
This allows us to seamlessly migrate between timer and bh TX
handling.
The bottom half handler becomes the new default and we add a new
tx= option to virtio-net-pci. Usage:
-device virtio-net-pci,tx=timer # select timer mitigation vs "bh"
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
De-couple this from the timer since we might want to use
different backends to send the packet.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
If virtio_net_flush_tx() is called with notification disabled, we can
race with the guest, processing packets at the same rate as they
get produced. The trouble is that this means we have no guaranteed
exit condition from the function and can spend minutes in there.
Currently flush_tx is only called with notification on, which seems
to limit us to one pass through the queue per call. An upcoming
patch changes this.
Also add an option to set this value on the command line as different
workloads may wish to use different values. We can't necessarily
support any random value, so this is a developer option: x-txburst=
Usage:
-device virtio-net-pci,x-txburst=64 # 64 packets per tx flush
One pass through the queue (256) seems to be a good default value
for this, balancing latency with throughput. We use a signed int
for x-txburst because 2^31 packets in a burst would take many, many
minutes to process and it allows us to easily return a negative
value value from virtio_net_flush_tx() to indicate a back-off
or error condition.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Add an option to make the TX mitigation timer adjustable as a device
option. The 150us hard coded default used currently is reasonable,
but may not be suitable for all workloads, this gives us a way to
adjust it using a single binary. We can't support any random option
though, so use the "x-" prefix to indicate this is a developer
option. Usage:
-device virtio-net-pci,x-txtimer=500000,... # .5ms timeout
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
make pci_parse_devfn() aware of func. With func = NULL it behave as before.
This will be used later.
Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
call hotplug callback even when not hotplug case for later use.
And move hotplug check into hotplug callback.
PCIE slot needs this for card presence detection.
Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
By making pci_add_capability() the special case of
pci_add_capability_at_offset() of offset = 0,
consolidate pci_add_capability_at_offset() into pci_add_capability().
Cc: Stefan Weil <weil@mail.berlios.de>
Cc: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
introduce pci bridge library.
convert apb bridge and dec p2p bridge to use new pci bridge library.
save/restore is supported as a side effect.
This is also preparation for pci express root/upstream/downstream port.
Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Patch b0b900070c made
TOR valuer incorrect: the spec says it should always
include the CRC field.
No one seems to use this field, but better to stick to spec.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
The config data field on the e500 pci controller is in little endian, so we need
to enable byte swap there.
Signed-off-by: Alexander Graf <agraf@suse.de>
The e500 PCI controller isn't qdev'ified yet. This leads to severe issues
when running with -drive.
To be able to use a virtio disk with an e500 VM, let's convert the PCI
controller over to qdev.
Signed-off-by: Alexander Graf <agraf@suse.de>
KVM on PowerPC used to have completely broken interrupt logic. Usually,
interrupts work by having a PIC that pulls a line up/down, so the CPU knows
that an interrupt is active. This line stays active until some action is
done to the PIC to release the line.
On KVM for PPC, we just checked if there was an interrupt pending and pulled
a line in the kernel module. We never released it though, hoping that kernel
space would just declare an interrupt as released when injected - which is
wrong.
To fix this, we need to completely redesign the interrupt injection logic.
Whenever an interrupt line gets triggered, we need to notify kernel space
that the line is up. Whenever it gets released, we do the same. This way
we can assure that the interrupt state is always known to kernel space.
This fixes random stalls in KVM guests on PowerPC that were waiting for
an interrupt while everyone else thought they received it already.
Signed-off-by: Alexander Graf <agraf@suse.de>
stat() fields can be more or less anything depending on configuration, cast
explicitly to uint64_t to avoid printf() format mismatches.
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
There is no need to check for dest < 0 or vector >= 0 as both are
uint16_t.
This should fix problems with broken build with aggressive compiler
flags. Reported by Xudong Hao <xudong.hao@intel.com>
Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
Acked-by: Cam Macdonell <cam@cs.ualberta.ca>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
Do not store return of get_image_size() in a uint32_t as it makes it
impossible to detect error returns from get_image_size.
Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
acpi table file can be modified during load so file size check
should be more strict.
pointer calculation should be after qemu_realloc(). not before realloc().
Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
in_sg[].iovec and out_sg[].ioved are pointer to (source) host memory and
therefore invalid after migration. When loading the device state we must
create a new mapping on the destination host.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Separate the mapping of requests to host memory from the descriptor iteration.
The next patch will make use of it in a different context.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
https://bugs.launchpad.net/bugs/611646
reports that ./i386-softmmu/qemu -M isapc segfaults.
This patch fixes the segfault introduced by
f885f1eaa8
It's because i440fx_state in pc_init1() isn't initialized.
> Core was generated by `./i386-softmmu/qemu -M isapc'.
> Program terminated with signal 11, Segmentation fault.
> [New process 19686]
> at qemu/hw/piix_pci.c:136
> (gdb) where
> at qemu/hw/piix_pci.c:136
> boot_device=0x7fffe1f5b040 "cad", kernel_filename=0x0,
> kernel_cmdline=0x6469bf "", initrd_filename=0x0,
> cpu_model=0x654d10 "486", pci_enabled=0)
> at qemu/hw/pc_piix.c:178
> boot_device=0x7fffe1f5b040 "cad", kernel_filename=0x0,
> kernel_cmdline=0x6469bf "", initrd_filename=0x0, cpu_model=0x654d10 "486")
> at qemu/hw/pc_piix.c:207
> envp=0x7fffe1f5b188)
> at qemu/vl.c:2871
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
Combining bitwise AND and logical NOT is suspicious.
Fixed by this Coccinelle script:
// From http://article.gmane.org/gmane.linux.kernel/646367
@@ expression E1,E2; @@
(
!E1 & !E2
|
- !E1 & E2
+ !(E1 & E2)
)
Acked-by: Edgar E. Iglesias <edgar.iglesias@gmail.com>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
We can't use the return value of load_uimage() for the kernel because it
can't account for BSS size, and the PowerPC kernel does not relocate
blobs before zeroing BSS.
Instead, we now load at the fixed addresses chosen by u-boot (the normal
firmware for the board).
Signed-off-by: Hollis Blanchard <hollis@penguinppc.org>
The PowerPC 4xx SDRAM controller emulation unregisters RAM in its reset
callback. However, qemu_system_reset() is now called at initialization
time, so all RAM is unregistered before starting the guest (!).
Signed-off-by: Hollis Blanchard <hollis@penguinppc.org>
The message "Truncating memory to %d MiB to fit SDRAM controller limits"
should be displayed only when a user chooses an amount of RAM which
can't be represented by the PPC 4xx SDRAM controller (e.g. 129MB, which
would only be valid if the controller supports a bank size of 1MB).
Signed-off-by: Hollis Blanchard <hollis@penguinppc.org>
On KVM for PPC we need to tell the guest which instructions to use when
doing a hypercall. The clean way to do this is to go through an ioctl
from userspace and passing it on to the guest using the device tree.
So let's do the qemu part here: read out the hypercall and pass it on
to the guest's fw_cfg so openBIOS can read it out and expose it again.
Signed-off-by: Alexander Graf <agraf@suse.de>
Changing block.h or blockdev.h resulted in recompiling most objects.
Move DriveInfo typedef and BlockInterfaceType enum definitions
to qemu-common.h and rearrange blockdev.h use to decrease churn.
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
Switch tree to lookup-by-name using qemu_find_opts().
Also hook up virtfs options so qemu_find_opts works for them too.
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Old versions of the BOCHs VGA BIOS (cira 2003) made use of VBE
registers at 0xff80/81. In VBE API version 0xb0c2 these were
moved to 0x1ce/cf. Unfortunately, QEMU still registers handlers
for the old range. If a guest attempts to assign an I/O device
overlapping this region, QEMU exits with a hw_error. Windows
guests seem to like to assign I/O devices to the high end of
the address space, so it's pretty easy to hot add an rtl8139
to a Win2k8 guest and trigger the bug. I can't find any reason
to register these handlers, so let's remove the cruft.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Free malloc'ed memory, unregister from savevm and clean up virtio-common
bits on device hot-unplug.
This was found performing a migration after device hot-unplug.
Reported-by: <lihuang@redhat.com>
Signed-off-by: Amit Shah <amit.shah@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
I have a guest OS which sends the command 0xfd to the keyboard
controller during initialization. To get rid of the message
"qemu: unsupported keyboard cmd=0x%02x\n" I added support for
the pulse output bit commands.
I found the following explanation here:
http://www.win.tue.nl/~aeb/linux/kbd/scancodes-11.html#ss11.3
Command 0xf0-0xff: Pulse output bit
Bits 3-0 of the output port P2 of the keyboard controller may
be pulsed low for approximately 6 µseconds. Bits 3-0 of this
command specify the output port bits to be pulsed. 0: Bit should
be pulsed. 1: Bit should not be modified. The only useful version
of this command is Command 0xfe.
(For MCA, replace 3-0 by 1-0 in the above.)
Command 0xfe: System reset
Pulse bit 0 of the output port P2 of the keyboard controller.
This will reset the CPU.
Signed-off-by: Bernhard Kohl <bernhard.kohl@nsn.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Replace a qemu_malloc call, followed by a memset, with qemu_mallocz.
Found with this Coccinelle semantic patch, adapted from
Coccinelle test package rule 94:
@@
type T;
expression x;
expression E;
@@
- x = (T)qemu_malloc(E)
+ x = qemu_mallocz(E)
...
(
- memset(x,0,E);
|
- memset(x,0,sizeof(*x));
)
Some files (tests/*) had to be filtered out.
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
According to scc_escc_um.pdf:
- Reset Highest IUS must update irq status to allow processing
of the next priority interrupt.
- rx interrupt has always higher priority than tx on same channel
The documentation only explicitly says that Reset Highest IUS
command (0x38) clears IUS bits, not that it clears the corresponding
interrupt too, so don't clear interrupts on this command.
The patch allows SunOS 4.1.4 to use the serial ports
Signed-off-by: Artyom Tarasenko <atar4qemu@gmail.com>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
resend for bug fix related to removal of irqfd
Support an inter-vm shared memory device that maps a shared-memory object as a
PCI device in the guest. This patch also supports interrupts between guest by
communicating over a unix domain socket. This patch applies to the qemu-kvm
repository.
-device ivshmem,size=<size in format accepted by -m>[,shm=<shm name>]
Interrupts are supported between multiple VMs by using a shared memory server
by using a chardev socket.
-device ivshmem,size=<size in format accepted by -m>[,shm=<shm name>]
[,chardev=<id>][,msi=on][,ioeventfd=on][,vectors=n][,role=peer|master]
-chardev socket,path=<path>,id=<id>
The shared memory server, sample programs and init scripts are in a git repo here:
www.gitorious.org/nahanni
Signed-off-by: Cam Macdonell <cam@cs.ualberta.ca>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
A non-migratable device should be removed before migration and re-added after.
Signed-off-by: Cam Macdonell <cam@cs.ualberta.ca>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
The reason for not actually canceling the I/O is because with
virtualization and lots of VM running, a guest fs may mistake a
overload of the host, as an IDE timeout. So rather than canceling the
I/O, it's safer to wait I/O completion and simulate that the I/O has
completed just before the io cancellation was requested by the
guest. This way if ntfs or an app writes data without checking for
-EIO retval, and it thinks the write has succeeded, it's less likely
to run into troubles. Similar issues for reads.
Furthermore because the DMA operation is splitted into many synchronous
aio_read/write if there's more than one entry in the SG table, without this
patch the DMA would be cancelled in the middle, something we've no idea if it
happens on real hardware too or not. Overall this seems a great risk for zero
gain.
This approach is sure safer than previous code given we can't pretend all guest
fs code out there to check for errors and reply the DMA if it was completed
partially, given a timeout would never materialize on a real harddisk unless
there are defective blocks (and defective blocks are practically only an issue
for reads never for writes in any recent hardware as writing to blocks is the
way to fix them) or the harddisk breaks as a whole.
Signed-off-by: Izik Eidus <ieidus@redhat.com>
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The timer #0 is the system timer, so the timer #num_cpu is the
timer of the last CPU, and it must be initialized in slavio_timer_reset.
Don't mark non-existing timers as running.
Signed-off-by: Artyom Tarasenko <atar4qemu@gmail.com>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
This header is not present on my system and causes a build
failure, but is also not used in these files, so remove it.
Signed-off-by: Mike McCormack <mikem@ring3k.org>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>