[dpdk-dev] vfio/noiommu: Don't use iommu_present() to track fake groups

Message ID 20160122172159.5655.55830.stgit@gimli.home (mailing list archive)
State Not Applicable, archived

Commit Message

Alex Williamson Jan. 22, 2016, 5:23 p.m. UTC
  Using iommu_present() to determine whether an IOMMU group is real or
fake has some problems.  First, apparently Power systems don't
register an IOMMU on the device bus, so the groups and containers get
marked as noiommu and then won't bind to their actual IOMMU driver.
Second, I expect we'll run into the same issue as we try to support
vGPUs through vfio, since they're likely to emulate this behavior of
creating an IOMMU group on a virtual device and then providing a vfio
IOMMU backend tailored to the sort of isolation they provide, which
won't necessarily be fully compatible with the IOMMU API.

The solution here is to use the existing iommudata interface to IOMMU
groups, which allows us to easily identify the fake groups we've
created for noiommu purposes.  The iommudata we set is purely
arbitrary since we're only comparing the address, so we use the
address of the noiommu switch itself.

Reported-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Fixes: 03a76b60f8ba ("vfio: Include No-IOMMU mode")
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
---

Copying some DPDK folks and would appreciate validation that this
still works for the intended no-iommu use case.  Thanks!

 drivers/vfio/vfio.c |   24 ++++++++++--------------
 1 file changed, 10 insertions(+), 14 deletions(-)
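
For readers unfamiliar with the idiom, the tagging scheme described in the
commit message can be sketched in plain userspace C. This is an illustration
under stated assumptions, not the kernel code: struct mock_group and the
mock_* helpers are hypothetical stand-ins for struct iommu_group and for
iommu_group_set_iommudata()/iommu_group_get_iommudata(), and the noiommu
variable stands in for the module switch whose address is used as the tag.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for the kernel's struct iommu_group. */
struct mock_group {
	const char *name;
	void *iommudata;	/* opaque per-group pointer, NULL unless set */
};

/* Stand-in for the noiommu switch; only its address matters here. */
static bool noiommu;

static void mock_set_iommudata(struct mock_group *g, void *data)
{
	g->iommudata = data;
}

static void *mock_get_iommudata(const struct mock_group *g)
{
	return g->iommudata;
}

/* Tag a group that we created ourselves for no-IOMMU use. */
static void mock_mark_fake(struct mock_group *g)
{
	mock_set_iommudata(g, &noiommu);
}

/* True only for groups carrying the sentinel address. */
static bool mock_group_is_fake(const struct mock_group *g)
{
	return mock_get_iommudata(g) == &noiommu;
}

/* Mirrors the shape of the attach check: accept tagged groups only. */
static int mock_noiommu_attach(const struct mock_group *g)
{
	return mock_group_is_fake(g) ? 0 : -EINVAL;
}

/* Small self-check covering one tagged and one untagged group. */
static void mock_self_check(void)
{
	struct mock_group real = { .name = "real", .iommudata = NULL };
	struct mock_group fake = { .name = "vfio-noiommu", .iommudata = NULL };

	mock_mark_fake(&fake);
	assert(mock_noiommu_attach(&fake) == 0);
	assert(mock_noiommu_attach(&real) == -EINVAL);
}
```

Calling mock_self_check() from a main() exercises both paths. The point of
the design is that the pointer is never dereferenced, so any address that no
other code would plausibly store in iommudata works as the tag.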
  

Comments

Alexey Kardashevskiy Jan. 25, 2016, 12:20 a.m. UTC | #1
On 01/23/2016 04:23 AM, Alex Williamson wrote:
> Using iommu_present() to determine whether an IOMMU group is real or
> fake has some problems.  First, apparently Power systems don't
> register an IOMMU on the device bus, so the groups and containers get
> marked as noiommu and then won't bind to their actual IOMMU driver.
> Second, I expect we'll run into the same issue as we try to support
> vGPUs through vfio, since they're likely to emulate this behavior of
> creating an IOMMU group on a virtual device and then providing a vfio
> IOMMU backend tailored to the sort of isolation they provide, which
> won't necessarily be fully compatible with the IOMMU API.
>
> The solution here is to use the existing iommudata interface to IOMMU
> groups, which allows us to easily identify the fake groups we've
> created for noiommu purposes.  The iommudata we set is purely
> arbitrary since we're only comparing the address, so we use the
> address of the noiommu switch itself.
>
> Reported-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> Fixes: 03a76b60f8ba ("vfio: Include No-IOMMU mode")
> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>



Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Tested-by: Alexey Kardashevskiy <aik@ozlabs.ru>

Thanks!
  
Anatoly Burakov Jan. 27, 2016, 1:21 p.m. UTC | #2
Hi Alex,

Tested bringing the NICs up and encountered no issues. I'm curious whether it also works for Santosh (CC'd), as he's one of the intended users of the No-IOMMU functionality, but otherwise it seems to work.

Thanks,
Anatoly
  
Santosh Shukla Jan. 27, 2016, 1:41 p.m. UTC | #3
On Wed, Jan 27, 2016 at 6:51 PM, Burakov, Anatoly
<anatoly.burakov@intel.com> wrote:
> Hi Alex,
>
> Tested bringing the NIC's up, encountered no issues. Curious if it also works for Santosh (CC'd) as he's one of the intended users of the No-IOMMU functionality, but otherwise seems to work.
>

Yes, it works for the virtio DPDK case too. Tested-by:

Thanks.
> Thanks,
> Anatoly
  

Patch

diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
index 82f25cc..ecca316 100644
--- a/drivers/vfio/vfio.c
+++ b/drivers/vfio/vfio.c
@@ -123,8 +123,8 @@  struct iommu_group *vfio_iommu_group_get(struct device *dev)
 	/*
 	 * With noiommu enabled, an IOMMU group will be created for a device
 	 * that doesn't already have one and doesn't have an iommu_ops on their
-	 * bus.  We use iommu_present() again in the main code to detect these
-	 * fake groups.
+	 * bus.  We set iommudata simply to be able to identify these groups
+	 * as special use and for reclamation later.
 	 */
 	if (group || !noiommu || iommu_present(dev->bus))
 		return group;
@@ -134,6 +134,7 @@  struct iommu_group *vfio_iommu_group_get(struct device *dev)
 		return NULL;
 
 	iommu_group_set_name(group, "vfio-noiommu");
+	iommu_group_set_iommudata(group, &noiommu, NULL);
 	ret = iommu_group_add_device(group, dev);
 	iommu_group_put(group);
 	if (ret)
@@ -158,7 +159,7 @@  EXPORT_SYMBOL_GPL(vfio_iommu_group_get);
 void vfio_iommu_group_put(struct iommu_group *group, struct device *dev)
 {
 #ifdef CONFIG_VFIO_NOIOMMU
-	if (!iommu_present(dev->bus))
+	if (iommu_group_get_iommudata(group) == &noiommu)
 		iommu_group_remove_device(dev);
 #endif
 
@@ -190,16 +191,10 @@  static long vfio_noiommu_ioctl(void *iommu_data,
 	return -ENOTTY;
 }
 
-static int vfio_iommu_present(struct device *dev, void *unused)
-{
-	return iommu_present(dev->bus) ? 1 : 0;
-}
-
 static int vfio_noiommu_attach_group(void *iommu_data,
 				     struct iommu_group *iommu_group)
 {
-	return iommu_group_for_each_dev(iommu_group, NULL,
-					vfio_iommu_present) ? -EINVAL : 0;
+	return iommu_group_get_iommudata(iommu_group) == &noiommu ? 0 : -EINVAL;
 }
 
 static void vfio_noiommu_detach_group(void *iommu_data,
@@ -323,8 +318,7 @@  static void vfio_group_unlock_and_free(struct vfio_group *group)
 /**
  * Group objects - create, release, get, put, search
  */
-static struct vfio_group *vfio_create_group(struct iommu_group *iommu_group,
-					    bool iommu_present)
+static struct vfio_group *vfio_create_group(struct iommu_group *iommu_group)
 {
 	struct vfio_group *group, *tmp;
 	struct device *dev;
@@ -342,7 +336,9 @@  static struct vfio_group *vfio_create_group(struct iommu_group *iommu_group,
 	atomic_set(&group->container_users, 0);
 	atomic_set(&group->opened, 0);
 	group->iommu_group = iommu_group;
-	group->noiommu = !iommu_present;
+#ifdef CONFIG_VFIO_NOIOMMU
+	group->noiommu = (iommu_group_get_iommudata(iommu_group) == &noiommu);
+#endif
 
 	group->nb.notifier_call = vfio_iommu_group_notifier;
 
@@ -767,7 +763,7 @@  int vfio_add_group_dev(struct device *dev,
 
 	group = vfio_group_get_from_iommu(iommu_group);
 	if (!group) {
-		group = vfio_create_group(iommu_group, iommu_present(dev->bus));
+		group = vfio_create_group(iommu_group);
 		if (IS_ERR(group)) {
 			iommu_group_put(iommu_group);
 			return PTR_ERR(group);