Reynald Borer
2018-06-14 12:50:50 UTC
Hello friends,
I'm trying to use the HttpObjectAggregator channel handler with aleph and I
can't get it to work. As soon as I enable it (through the
pipeline-transform below), I get the following exception:
TimeoutException timed out after 30000 milliseconds
manifold.deferred/timeout!/fn--1815 (deferred.clj:1160)
Here is the pipeline-transform function that I use:
(defn- configure-pipeline-fn
  [max-content-length decompression?]
  (fn [^ChannelPipeline pipeline]
    ;; aggregate content and limit to max-content-length bytes
    (.addAfter pipeline "http-client" "aggregator" (HttpObjectAggregator. max-content-length))
    (when decompression?
      (.addAfter pipeline "http-client" "deflater" (HttpContentDecompressor.)))))
Context:
I'm currently migrating the https://paper.li/ crawler, built on top of Netty
3.10, to aleph. The Netty 3.10 integration was built by my predecessors and I
don't have enough Netty knowledge to migrate it to version 4.1. Since our
platform is already tightly coupled with manifold (great library btw), I
feel it makes sense to migrate to aleph too :-)
I would like to use HttpObjectAggregator to cap the size of the responses I
ingest, so that a single response cannot exhaust memory. Our platform crawls
around 20 million URLs daily, so we can't take any risk with potentially
malicious URLs.
From my limited understanding of aleph, the aleph.http.client/client-handler
function
(https://github.com/ztellman/aleph/blob/fa283e42d5b77963289c67396f8eeae200407415/src/aleph/http/client.clj#L137)
does not handle FullHttpResponse objects, which would explain why I get
this timeout.
Is there any way I could either integrate this HttpObjectAggregator handler
with aleph or limit the response size by any other means?
Thanks in advance for any insight,
Reynald
--
You received this message because you are subscribed to the Google Groups "Aleph" group.
To unsubscribe from this group and stop receiving emails from it, send an email to aleph-lib+***@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.