Discussion:
aleph and HttpObjectAggregator to limit response size?
Reynald Borer
2018-06-14 12:50:50 UTC
Hello friends,

I'm trying to use the HttpObjectAggregator channel handler with aleph and I
can't get it to work. As soon as I enable it (through the
pipeline-transform below) I get the following exception:

TimeoutException timed out after 30000 milliseconds
manifold.deferred/timeout!/fn--1815 (deferred.clj:1160)

Here is the pipeline-transform function that I use:

(defn- configure-pipeline-fn
  [max-content-length decompression?]
  (fn [^ChannelPipeline pipeline]
    ;; aggregate content and limit to max-content-length bytes
    (.addAfter pipeline "http-client" "aggregator" (HttpObjectAggregator. max-content-length))

    (when decompression?
      (.addAfter pipeline "http-client" "deflater" (HttpContentDecompressor.)))))


Context:

I'm currently migrating the https://paper.li/ crawler, built on top of Netty
3.10, to aleph. The Netty 3.10 integration was built by my predecessors and I
don't have enough Netty knowledge to migrate it to version 4.1. Since our
platform is already tightly coupled with manifold (great library btw), I
feel it makes sense to migrate to aleph too :-)

I would like to use HttpObjectAggregator to limit the size of the responses I
ingest, so that I don't risk exhausting memory. Our platform crawls around 20
million URLs daily, so we can't take any risks with potentially malicious
URLs.

From my limited understanding of aleph, the aleph.http.client/client-handler
function
(https://github.com/ztellman/aleph/blob/fa283e42d5b77963289c67396f8eeae200407415/src/aleph/http/client.clj#L137)
does not handle FullHttpResponse objects, which would explain why I get
this timeout.

Is there any way I could either integrate this HttpObjectAggregator handler
with aleph or limit the response size by some other means?

Thanks in advance for your insights,
Reynald
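
On the "any other means" part of the question, one option that stays outside the Netty pipeline is to cap the number of bytes read from the response body. The following is only a minimal sketch under assumptions: read-bounded is a hypothetical helper (not part of aleph), and it presumes the response body is available as a java.io.InputStream. It does not address the FullHttpResponse handling described above.

```clojure
(import '(java.io ByteArrayInputStream ByteArrayOutputStream InputStream))

(defn read-bounded
  "Reads `in` to the end and returns its content as a byte array.
  Throws ex-info as soon as more than `max-bytes` bytes have been read.
  Hypothetical helper for illustration; not part of aleph."
  [^InputStream in max-bytes]
  (let [buf (byte-array 8192)
        out (ByteArrayOutputStream.)]
    (loop [total 0]
      (let [n (.read in buf)]
        (if (neg? n)
          ;; end of stream: everything fit within the limit
          (.toByteArray out)
          (let [total (+ total n)]
            (when (> total max-bytes)
              (throw (ex-info "response body too large"
                              {:max-bytes max-bytes :read total})))
            (.write out buf 0 n)
            (recur total)))))))

;; Usage with an in-memory stream standing in for a response body:
(count (read-bounded (ByteArrayInputStream. (byte-array 10)) 1024))
;; => 10
```

At most one 8 KB chunk beyond the limit is read before the exception is thrown, and nothing past the limit is accumulated. The caller would still need to close the stream (and, presumably, the underlying connection) after a failure.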
--
You received this message because you are subscribed to the Google Groups "Aleph" group.
To unsubscribe from this group and stop receiving emails from it, send an email to aleph-lib+***@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
Reynald Borer
2018-06-21 19:50:31 UTC
Hi,

I'm trying to modify aleph to support this channel handler.
An initial code change, which seems to work in manual tests, is
available at https://github.com/ztellman/aleph/pull/393 . Feedback is
welcome, since I don't know the aleph internals well :-)

I'll complete this PR with unit tests, at the very least.

Cheers,
Reynald