Discussion:
Can someone help me understand the reason for this code in ib_isert.c?
Chris Moore
2014-10-20 15:29:59 UTC
Permalink
The following code is in isert_conn_setup_qp() in ib_isert.c:

/*
* FIXME: Use devattr.max_sge - 2 for max_send_sge as
* work-around for RDMA_READ..
*/
attr.cap.max_send_sge = device->dev_attr.max_sge - 2;

It's not clear from the comment what this is a work-around for, and I wasn't able
to figure it out from looking at logs.

In trying to get isert working with ocrdma I ran into a problem where the code
thought there was only 1 send SGE available when in fact the device has 3.
Then I found the above work-around, which explained the problem.

Removing this (setting attr.cap.max_send_sge to device->dev_attr.max_sge)
fixes the problem with isert and ocrdma, but I don't know what other side effects
that might have.

Thanks,

Chris

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-***@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Or Gerlitz
2014-10-20 21:13:22 UTC
Permalink
Post by Chris Moore
/*
* FIXME: Use devattr.max_sge - 2 for max_send_sge as
* work-around for RDMA_READ..
*/
attr.cap.max_send_sge = device->dev_attr.max_sge - 2;
It's not clear from the comment what this is a work-around for, and I wasn't able
to figure it out from looking at logs.
I believe this refers to some IBTA spec corner case which comes into
play with the max_sges advertized by mlx4, Eli, can you shed some
light (IBTA pointer) on that? is this the case (i.e dev_attr.max_sge
isn't always achievable) with mlx5 too?
Post by Chris Moore
In trying to get isert working with ocrdma I ran into a problem where the code
thought there was only 1 send SGE available when in fact the device has 3.
Then I found the above work-around, which explained the problem.
Nic, any reason for attr.cap.max_send_sge = 1 not to work OK?
Post by Chris Moore
Removing this (setting attr.cap.max_send_sge to device->dev_attr.max_sge)
fixes the problem with isert and ocrdma, but I don't know what other side effects
that might have.
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
More majordomo info at http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-***@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Nicholas A. Bellinger
2014-10-22 05:06:37 UTC
Permalink
Post by Or Gerlitz
Post by Chris Moore
/*
* FIXME: Use devattr.max_sge - 2 for max_send_sge as
* work-around for RDMA_READ..
*/
attr.cap.max_send_sge = device->dev_attr.max_sge - 2;
It's not clear from the comment what this is a work-around for, and I wasn't able
to figure it out from looking at logs.
I believe this refers to some IBTA spec corner case which comes into
play with the max_sges advertized by mlx4, Eli, can you shed some
light (IBTA pointer) on that? is this the case (i.e dev_attr.max_sge
isn't always achievable) with mlx5 too?
It's for ConnectX-2 reporting max_sge=31 during development, while the
largest max_send_sge for stable large block RDMA reads was (is..?) 29
SGEs.
Post by Or Gerlitz
Post by Chris Moore
In trying to get isert working with ocrdma I ran into a problem where the code
thought there was only 1 send SGE available when in fact the device has 3.
Then I found the above work-around, which explained the problem.
Nic, any reason for attr.cap.max_send_sge = 1 not to work OK?
IIRC, pre fastreg code supported max_send_sge=1 by limiting the transfer
size per RDMA read, and would issue subsequent RDMA read + offset from
completion up to the total se_cmd->data_length.

Not sure how this works with fastreg today. Sagi..?

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-***@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Or Gerlitz
2014-10-22 09:02:34 UTC
Permalink
Post by Nicholas A. Bellinger
Post by Or Gerlitz
Post by Chris Moore
/*
* FIXME: Use devattr.max_sge - 2 for max_send_sge as
* work-around for RDMA_READ..
*/
attr.cap.max_send_sge = device->dev_attr.max_sge - 2;
It's not clear from the comment what this is a work-around for, and I wasn't able
to figure it out from looking at logs.
I believe this refers to some IBTA spec corner case which comes into
play with the max_sges advertized by mlx4, Eli, can you shed some
light (IBTA pointer) on that? is this the case (i.e dev_attr.max_sge
isn't always achievable) with mlx5 too?
It's for ConnectX-2 reporting max_sge=31 during development, while the
largest max_send_sge for stable large block RDMA reads was (is..?) 29
SGEs.
Post by Or Gerlitz
Post by Chris Moore
In trying to get isert working with ocrdma I ran into a problem where the code
thought there was only 1 send SGE available when in fact the device has 3.
Then I found the above work-around, which explained the problem.
Nic, any reason for attr.cap.max_send_sge = 1 not to work OK?
IIRC, pre fastreg code supported max_send_sge=1 by limiting the transfer
size per RDMA read, and would issue subsequent RDMA read + offset from
completion up to the total se_cmd->data_length.
Not sure how this works with fastreg today. Sagi..?
While on fastreg mode, for RDMA reads we use only one SGE, talking to
Sagi he explained me that the problem with creating the QP with
max_send_sge = 1 has to do with other flows where two or even three SGEs
are needed, e.g when the iscsi/iser header (doesn't exist in RDMA ops)
is taken from one spot of the memory and the data (sense) taken from
another one?

Nic, we need to see what is the minimum needed by the code (two?) and
tweak that line to make sure it gets there as long as the device
supports it, SB, makes sense?


diff --git a/drivers/infiniband/ulp/isert/ib_isert.c
b/drivers/infiniband/ulp/isert/ib_isert.c
index 0bea577..24abbb3 100644
--- a/drivers/infiniband/ulp/isert/ib_isert.c
+++ b/drivers/infiniband/ulp/isert/ib_isert.c
@@ -115,9 +115,10 @@ isert_conn_setup_qp(struct isert_conn *isert_conn,
struct rdma_cm_id *cma_id,
attr.cap.max_recv_wr = ISERT_QP_MAX_RECV_DTOS;
/*
* FIXME: Use devattr.max_sge - 2 for max_send_sge as
- * work-around for RDMA_READ..
+ * work-around for RDMA_READ.. still need to make sure to
+ * have at least two SGEs for SCSI responses.
*/
- attr.cap.max_send_sge = device->dev_attr.max_sge - 2;
+ attr.cap.max_send_sge = max(2, device->dev_attr.max_sge - 2);
isert_conn->max_sge = attr.cap.max_send_sge;

attr.cap.max_recv_sge = 1;


--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-***@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Nicholas A. Bellinger
2014-10-22 21:50:01 UTC
Permalink
Post by Or Gerlitz
Post by Nicholas A. Bellinger
Post by Or Gerlitz
Post by Chris Moore
/*
* FIXME: Use devattr.max_sge - 2 for max_send_sge as
* work-around for RDMA_READ..
*/
attr.cap.max_send_sge = device->dev_attr.max_sge - 2;
It's not clear from the comment what this is a work-around for, and I wasn't able
to figure it out from looking at logs.
I believe this refers to some IBTA spec corner case which comes into
play with the max_sges advertized by mlx4, Eli, can you shed some
light (IBTA pointer) on that? is this the case (i.e dev_attr.max_sge
isn't always achievable) with mlx5 too?
It's for ConnectX-2 reporting max_sge=31 during development, while the
largest max_send_sge for stable large block RDMA reads was (is..?) 29
SGEs.
Post by Or Gerlitz
Post by Chris Moore
In trying to get isert working with ocrdma I ran into a problem where the code
thought there was only 1 send SGE available when in fact the device has 3.
Then I found the above work-around, which explained the problem.
Nic, any reason for attr.cap.max_send_sge = 1 not to work OK?
IIRC, pre fastreg code supported max_send_sge=1 by limiting the transfer
size per RDMA read, and would issue subsequent RDMA read + offset from
completion up to the total se_cmd->data_length.
Not sure how this works with fastreg today. Sagi..?
While on fastreg mode, for RDMA reads we use only one SGE, talking to
Sagi he explained me that the problem with creating the QP with
max_send_sge = 1 has to do with other flows where two or even three SGEs
are needed, e.g when the iscsi/iser header (doesn't exist in RDMA ops)
is taken from one spot of the memory and the data (sense) taken from
another one?
Nic, we need to see what is the minimum needed by the code (two?) and
tweak that line to make sure it gets there as long as the device
supports it, SB, makes sense?
diff --git a/drivers/infiniband/ulp/isert/ib_isert.c
b/drivers/infiniband/ulp/isert/ib_isert.c
index 0bea577..24abbb3 100644
--- a/drivers/infiniband/ulp/isert/ib_isert.c
+++ b/drivers/infiniband/ulp/isert/ib_isert.c
@@ -115,9 +115,10 @@ isert_conn_setup_qp(struct isert_conn *isert_conn,
struct rdma_cm_id *cma_id,
attr.cap.max_recv_wr = ISERT_QP_MAX_RECV_DTOS;
/*
* FIXME: Use devattr.max_sge - 2 for max_send_sge as
- * work-around for RDMA_READ..
+ * work-around for RDMA_READ.. still need to make sure to
+ * have at least two SGEs for SCSI responses.
*/
- attr.cap.max_send_sge = device->dev_attr.max_sge - 2;
+ attr.cap.max_send_sge = max(2, device->dev_attr.max_sge - 2);
isert_conn->max_sge = attr.cap.max_send_sge;
attr.cap.max_recv_sge = 1;
After a quick audit, the minimum max_send_sge=2 patch looks correct for
the hardcoded tx_desc->num_sge cases in isert_put_login_tx(),
isert_put_response(), isert_put_reject() and isert_put_text_rsp().

For the RDMA read path with non fast-reg, isert_build_rdma_wr() is still
correctly limiting the SGEs per operation based upon the negotiated
isert_conn->max_sge set above in isert_conn_setup_qp().

Chris, please confirm on ocrdma with Or's patch, and I'll include his
patch into target-pending/queue for now, and move into master once you
give a proper Tested-By.

Thanks!

--nab
Sagi Grimberg
2014-10-22 11:39:33 UTC
Permalink
Post by Nicholas A. Bellinger
Post by Or Gerlitz
Post by Chris Moore
/*
* FIXME: Use devattr.max_sge - 2 for max_send_sge as
* work-around for RDMA_READ..
*/
attr.cap.max_send_sge = device->dev_attr.max_sge - 2;
It's not clear from the comment what this is a work-around for, and I wasn't able
to figure it out from looking at logs.
I believe this refers to some IBTA spec corner case which comes into
play with the max_sges advertized by mlx4, Eli, can you shed some
light (IBTA pointer) on that? is this the case (i.e dev_attr.max_sge
isn't always achievable) with mlx5 too?
It's for ConnectX-2 reporting max_sge=31 during development, while the
largest max_send_sge for stable large block RDMA reads was (is..?) 29
SGEs.
Hmm, long time since I've worked with CX2...
I'll check that.
Post by Nicholas A. Bellinger
Post by Or Gerlitz
Post by Chris Moore
In trying to get isert working with ocrdma I ran into a problem where the code
thought there was only 1 send SGE available when in fact the device has 3.
Then I found the above work-around, which explained the problem.
Nic, any reason for attr.cap.max_send_sge = 1 not to work OK?
IIRC, pre fastreg code supported max_send_sge=1 by limiting the transfer
size per RDMA read, and would issue subsequent RDMA read + offset from
completion up to the total se_cmd->data_length.
The non-fastreg code logic should have stayed the same unless I
got a bug in there...

Chris, can you get us logs with debug enabled?
Post by Nicholas A. Bellinger
Not sure how this works with fastreg today. Sagi..?
First, fastreg is used only if signature is enabled.
Second, since isert registers the memory (single {key, address, length}
tuple describes the page list) it will use *only 1* sge in the RDMA (no
need for more...), so no problem here.

Sagi.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-***@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Eli Cohen
2014-10-23 07:43:47 UTC
Permalink
Post by Or Gerlitz
Post by Chris Moore
/*
* FIXME: Use devattr.max_sge - 2 for max_send_sge as
* work-around for RDMA_READ..
*/
attr.cap.max_send_sge = device->dev_attr.max_sge - 2;
It's not clear from the comment what this is a work-around for, and I wasn't able
to figure it out from looking at logs.
I believe this refers to some IBTA spec corner case which comes into
play with the max_sges advertized by mlx4, Eli, can you shed some
light (IBTA pointer) on that? is this the case (i.e dev_attr.max_sge
isn't always achievable) with mlx5 too?
I don't think it has to do anything with some corner case. If you look
at the spec you will find that is not guarnteed that whenever you
create a QP with device->dev_attr.max_sge it will succeed. The
consumer should try smaller values if it cannot create a QP with max
sge. This is from IB spec 1.2.1.

11.2.1.2 QUERY HCA
Description:
Returns the attributes for the specified HCA. The maximum values
defined in this section are guaranteed not-to-exceed values. It is
possible for an implementation to allocate some HCA resources from the
same space. In that case, the maximum values returned are not
guaranteed for all of those resources simultaneously.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-***@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Loading...