Web Extraction Reference

Use useWebExtract to extract clean markdown text from public web pages for display, analysis, summarization, and other AI processing workflows.

useWebExtract

Import the hook from the web extraction module:

import { useWebExtract } from "@/hooks/use-web-extract";

export default function App() {
  const { extract, extractedText, isLoading, error } = useWebExtract();
  const [url, setUrl] = React.useState("");

  const handleExtract = async () => {
    if (!url.trim()) return;

    const text = await extract({ url: url.trim() });
    if (text) {
      console.log("Extracted content:", text);
    }
  };

  return (
    <div>
      <div className="flex gap-2">
        <input
          value={url}
          onChange={(event) => setUrl(event.target.value)}
          placeholder="Enter URL to extract..."
          className="flex-1"
        />
        <button onClick={handleExtract} disabled={isLoading || !url.trim()}>
          {isLoading ? <span className="inline-block mr-2 animate-spin rounded-full border-2 border-current border-t-transparent" /> : null}
          Extract
        </button>
      </div>
      {error && <p className="text-destructive">{error.message}</p>}
      {extractedText && (
        <div className="rounded-lg border p-4">
          <div>{extractedText}</div>
        </div>
      )}
    </div>
  );
}

AI Processing

Combine extraction with AI text hooks when you want to summarize or analyze a page.

import * as React from "react";
import { useAIText } from "@/hooks/use-ai";
import { useWebExtract } from "@/hooks/use-web-extract";

export default function App() {
  const { extract, isLoading: isExtracting } = useWebExtract();
  const { generateText, textStream, isLoading: isSummarizing } = useAIText({
    systemPrompt: "You are a helpful assistant that summarizes articles.",
  });
  const [url, setUrl] = React.useState("");

  const handleSummarize = async () => {
    if (!url.trim()) return;

    const content = await extract({ url: url.trim() });
    if (content) {
      await generateText({
        prompt: `Please summarize this article:\n\n${content}`,
      });
    }
  };

  const isLoading = isExtracting || isSummarizing;

  return (/* render URL input, loading state, and textStream */);
}

Best Practices

  • Pass fully qualified URLs and let the hook validate URL format before requesting extraction.
  • Extract one page at a time per hook instance. Use multiple instances for concurrent extractions.
  • Render or process extractedText as markdown.
  • Use setExtractedText(null) when users clear the URL or start a new unrelated workflow.
  • Combine with useAIText or useAIObject only after extraction succeeds.
  • Keep prompts concise when sending extracted content to AI so the task stays clear.
  • Surface error.message near the URL input so users can correct invalid or inaccessible URLs.