Introducing Adept Experiments – use AI workflows to delegate repetitive tasks (adept.ai)
77 points by amks on Nov 9, 2023 | hide | past | favorite | 13 comments


For anyone looking to try this in an E2E testing context, we just released a library for Playwright called ZeroStep (https://zerostep.com/) that lets you script AI based actions, assertions, and extractions.

This is a working example that tests the core "book a meeting" workflow in Calendly:

    import { test, expect } from '@playwright/test'
    import { ai } from '@zerostep/playwright'

    test.describe('Calendly', () => {
      test('book the next available timeslot', async ({ page }) => {
        await page.goto('https://calendly.com/zerostep-test/test-calendly')

        await ai('Verify that a calendar is displayed', { page, test })
        await ai('Dismiss the privacy modal', { page, test })
        await ai('Click on the first available day of the month', { page, test })
        await ai('Click on the first available time in the sidebar', { page, test })
        await ai('Click the Next button', { page, test })
        await ai('Fill out the form with realistic values', { page, test })
        await ai('Submit the form', { page, test })

        const element = page.getByText('You are scheduled')
        await expect(element).toBeVisible()
      })
    })


It would be much easier to consider this as a solution if it _output_ the generated test steps, and/or cached them and only modified them when needed.

Your example above has 7 ai() calls in one test. Let's say it's usually closer to 5, and we have hundreds of tests. Every single PR runs E2E tests, and we open a handful of PRs a day; call it 5. We're already looking at thousands of invocations a day. Based on your pricing, that would be incredibly expensive.

This is with 3 eng.


What's the reliability and cost on something like this? I would need to see high-90s at <$0.10 before wanting to put it into a CI loop.


Pricing is listed on https://zerostep.com - you get 1,000 ai() calls per month for free, and then the cheapest paid plan is 2,000 ai() calls per month for $20, 4,000 for $40, etc. So basically you pay a penny per ai() call.

In terms of reliability - we have a hard dependency on the OpenAI API, so that's what will affect reliability the most. We're using GPT-3.5 and GPT-4 models, which have been fairly reliable, but we'll bump to GPT-4-Turbo eventually. Right now GPT-4-Turbo is listed as "not suited for production use" in OpenAI's docs: https://platform.openai.com/docs/models


That's one aspect of reliability, but the one I was more curious about was determinism. If I repeatedly run the same test suite on the same code base and the same data and configuration, am I guaranteed to get the same test results every time, or is it possible for ai() to change its mind about what actions to take?


Ah, got it. GPT is non-deterministic, but we somewhat handle that by having a caching layer in front of the AI. Basically, if you make an ai() call and we see that the page state is identical to a previous invocation of that exact prompt, we don't consult the AI and instead return the cached result. We did this mainly to reduce costs and speed up the 2nd-to-nth runs of the same test, but it does make the AI a bit more deterministic.

There are some new features in GPT-4-Turbo that will let us handle determinism better, and we will be exploring that once GPT-4-Turbo is stable.
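The caching idea can be sketched roughly as follows. This is a hypothetical illustration of the technique, not ZeroStep's actual code; `callModel` is a stand-in, and keying on a hash of the serialized page state is an assumed detail:

```typescript
import { createHash } from 'crypto'

// Hypothetical sketch of a prompt + page-state cache in front of the model.
// ZeroStep's real internals are not public; callModel is a stand-in.
type ModelCall = (prompt: string, pageState: string) => Promise<string>

function withCache(callModel: ModelCall): ModelCall {
  const cache = new Map<string, string>()
  return async (prompt, pageState) => {
    // Key on the exact prompt plus a digest of the serialized page state,
    // so only a truly identical page skips the model.
    const key = createHash('sha256')
      .update(prompt).update('\0').update(pageState)
      .digest('hex')
    const hit = cache.get(key)
    if (hit !== undefined) return hit
    const result = await callModel(prompt, pageState)
    cache.set(key, result)
    return result
  }
}
```

On a repeat run with an identical page, the wrapped call returns the cached action without an API round trip, which is what makes the 2nd-to-nth runs cheaper and more deterministic.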


That makes a lot of sense, thank you for the explanation. I'll have to explore this the next time I'm building page tests. I've considered doing it myself, but I'm much happier using a relatively inexpensive product than maintaining a creaky homebuilt version.


Thank you for the clarifying comment; this was really what I meant when I imprecisely said "reliability".


Nice! I'm going to try this out! Nit: for me, it would be nicer if `ai` were a fixture itself:

      test('book the next available timeslot', async ({ page, ai }) => {


Done! We added the ability to use it as a fixture. Documented here: https://github.com/zerostep-ai/zerostep#playwright-fixture
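Wiring that up generally follows Playwright's standard `test.extend` mechanism. The sketch below is based on that convention; the `aiFixture` export and its shape are assumptions, so check the linked README for the exact API:

```typescript
// Sketch of consuming ai as a Playwright fixture. The aiFixture export
// and its typing are assumed; see the linked README for the real API.
import { test as base, expect } from '@playwright/test'
import { aiFixture, type AiFixture } from '@zerostep/playwright'

const test = base.extend<AiFixture>(aiFixture(base))

test('book the next available timeslot', async ({ page, ai }) => {
  await page.goto('https://calendly.com/zerostep-test/test-calendly')
  await ai('Dismiss the privacy modal')
  await ai('Click on the first available day of the month')
  await expect(page.getByText('You are scheduled')).toBeVisible()
})
```

The upside of the fixture form is that `ai` no longer needs `{ page, test }` threaded through every call.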


Does it send the webpage contents to ZeroStep?

Cool demo btw.


This seems useful given that pieces of software often don't work with each other, so a human has to manually move data from one to the other. But in most cases, if users always have to do A->B, does it make more sense to build the automation in code instead of using AI? That automation can be built by engineers who are themselves assisted by AI.


Been excited for this type of neural network since they announced it months and months ago. Imagine this type of agent in conjunction with a framework like autogen



