Since the stunning debut of ChatGPT and Stable Diffusion, the pace of generative AI hasn't let up for a moment. This speed of development is marvelous, but it inevitably brings some anxiety and confusion: how far has AI come today? Am I missing out on a lot? Where should I start now?

To that end, we held a roundtable discussion. Among the participants are authors of in-depth ChatGPT tutorials, teachers whose work brings them into close contact with AI, tech practitioners who rely heavily on AI in their day-to-day work, and AI enthusiasts with strong hands-on skills and a spirit of sharing.

The Q&A covers practical questions such as tool recommendations and selection tips, widely discussed methodological issues such as responsible use and the open-versus-closed debate, and the participants' own shifts in how they understand AI, along with their expectations and outlooks. We hope these Q&As not only help you understand the current state of the AI industry, choose the right AI tools, and avoid some pitfalls and misconceptions, but also stimulate your thinking; you are welcome to share your own views in the comments.

AI Tools

At the beginning of last year, when AI products of all kinds were blossoming, I was mixing and matching essentially all of them. Now that large-model applications have matured, I use only the following few, which you have probably heard of even if you haven't used them:

Copilot in VS Code: an AI programming assistant that can help you generate code, explain code, write comments, fix bugs, and so on.

GPT-4: still the best product in the field of general-purpose large models.

Stable Diffusion: my choice for AI image generation; I use it through both WebUI and ComfyUI. I also plan to try Sora and the new SD models.

GPT-4 costs only $20 per month; Copilot is free, and Stable Diffusion and ComfyUI are both open source and naturally free.

As for whether it's worth it: at this price, if you've used any of the products and services above, you'll find it a good value. Whether it's Copilot's code assistance, GPT-4's GPTs store, or SD's direct image-generation capabilities, these tools not only help me do my existing work faster and better, but also spark new ideas quickly and cheaply, letting me explore areas I had never touched before. It is a rare opportunity that resources once this precious are now available to almost everyone.

Treat AI like a tool, and use AI as responsibly as you use your phone. In particular, think of non-local AI services as a 'cloud phone': whatever you wouldn't use your phone for, don't use AI for either.


ChatGPT: OpenAI's best-known AI tool. I mostly use the mobile app and subscribe to Plus for web search and custom GPTs. Every day I use it to replace some traditional web searches, work through difficult text, learn across domains, or solve problems; for example, I had AI help me decide which extra items to add to my medical checkup. Because general AI knowledge is broad rather than deep, it struggles with specific specialized fields, so I have also built some GPTs for my own use in work and life, such as a gout diet assistant based on USDA food data and a policy research assistant based on old and new copyright regulations.

Gemini (formerly Bard): my alternative to ChatGPT for web search. Gemini is better at searching the Internet and linking into Google's own ecosystem, such as finding web pages, YouTube videos, or your own Gmail messages. Although Gemini has drawn plenty of criticism, free web search is still attractive, and the quality of ChatGPT's web search is easily dragged down by Bing, which makes Gemini, built on Google search, a good option. In my earlier tests, Gemini's search scope seemed wider than GPT-4's. If you are looking for YouTube videos, Gemini is also the first choice, since it links directly into YouTube search. And when you need to find emails in Gmail, you can ask Gemini for help, which ChatGPT can hardly replace.

GPT-3.5 API: the model called through the OpenAI API. I mainly use it in various customized automations, such as monitoring public opinion on content platforms, automatically adding icons to Notion pages, and automatically classifying publicly posted updates.
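As a sketch of what such a classification automation can look like in code: the category names, prompt wording, and helper functions below are illustrative assumptions, not the actual monitoring setup described here, but the call shape matches the OpenAI chat-completions API.

```python
# Hypothetical categories; a real monitoring setup would define its own.
CATEGORIES = ["complaint", "praise", "question", "spam", "other"]

def build_messages(post: str) -> list:
    """Build a chat prompt asking the model to classify one post."""
    system = (
        "You classify social-media posts. Reply with exactly one of: "
        + ", ".join(CATEGORIES)
    )
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": post},
    ]

def classify(post: str, client) -> str:
    """Send the prompt via an OpenAI-style client and normalize the answer."""
    resp = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=build_messages(post),
        temperature=0,  # deterministic labels across runs
    )
    label = resp.choices[0].message.content.strip().lower()
    # Fall back to "other" if the model replies with something unexpected.
    return label if label in CATEGORIES else "other"
```

In practice you would loop this over newly published posts and flag anything labeled a complaint; `temperature=0` keeps the labels stable, which matters more for monitoring than creativity does.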

Notion AI: the AI embedded in the note-taking tool Notion, which I use in three scenarios: pages, databases, and global Q&A. On note pages I usually use it to generate summaries, extract to-dos, improve text readability, and convert text to tables; in databases I use it to generate page summaries, extract key points, and translate into multiple languages; and the global Q&A is mainly for finding notes: if I can't remember where a note is stored, or can't hit on a search keyword, I describe whatever I still recall and let the AI find it. I sync all my notes into Notion, which means I can use Notion AI to quickly find notes from Notion itself, my read-it-later app, and my podcast app.

Snipd: a podcast app with AI features, great for information-dense podcasts. You can view the current English subtitle position in real time, tap to jump to the timestamp of a given line, and take podcast notes that sync to Notion. The AI mainly shows up in automatic chapter splitting and note assistance: when you hear a segment that resonates or inspires you, tap the button to take a note, and it automatically recognizes the start and end of the segment and generates a title and summary for the note; if the recognized range doesn't match your expectation, you can adjust it manually.

I spend about $58-68 per month on AI tools, including:

ChatGPT: $20 for a Plus subscription.

GPT-3.5 API: $30-40 per month in usage fees, the most expensive item, but also genuinely effective. The highest value for me is opinion monitoring: it covers almost 100% of newly published content on a given platform, is more flexible than traditional keyword matching, doesn't require constantly updating keyword libraries for topics of concern, and recognizes even mixed-language and uncommon expressions. This cost is far less than the labor cost of hiring another person, not to mention that asking someone to tag thousands of pieces of data every day would be torture.

Notion AI: Annual subscription averages $8 per month. I’m a heavy Notion user, and it’s a lot easier to find notes, and I can also use it to improve my editing efficiency.
Overall, it’s a good value, especially the last two.

Label proactively: if you distribute content created with AI, clearly state that AI was used and where.

Don’t push AI output you can’t stand to your audience: Whether it’s text, images, or video, if you don’t want to consume the AI-generated content yourself, don’t throw it at your audience.
Don't support AI users who kill quality creation: don't use or pay for AI tools or platforms that infringe on the rights of quality creators, or creators who try to replace human creation with low-quality AI content, and support those creators who defend their rights.

Make sure a real person, not an AI, is in the "driver's seat": when using AI to generate your own content, make sure the final output is reviewed and accepted by a human rather than handed to the public directly.


I only use ChatGPT as a stand-alone AI tool, and for needs that ChatGPT doesn’t directly address, I use it to write tools rather than turning to other AI tools. I also don’t need a particularly complex AI tool, given that my needs are primarily text processing.

ChatGPT Plus is $20/month and worth it. Once you're used to GPT-4, you can't stand GPT-3.5; especially with slightly more complex requirements, the performance gap between GPT-3.5 and GPT-4 becomes very noticeable.

ChatGPT: No introduction needed. Mainly used to experience various experimental new features of OpenAI and the current world’s leading GPT-4 model.

Ollama: an easy way to play with local large models. I use it to try and test new models and for local development of AI applications.

Raycast Ollama: a Raycast plugin that serves as an Ollama front end for talking to models; with the system interfaces Raycast provides, it can also read files, text, images, and so on directly as input, making it a replacement for (or even stronger than) Raycast AI. I use it for all kinds of at-your-fingertips AI requests, such as explaining, translating, checking, or rewriting selected text, with no more frequent copy-and-paste.

Poe: a large-model marketplace. I use it to try models that are difficult to deploy locally.
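As a point of reference for the local-development use of Ollama mentioned above: Ollama serves a simple HTTP API, by default on port 11434. The sketch below assumes a server is already running and uses an example model name; only the stdlib is needed.

```python
import json
import urllib.request

# Default address of a locally running Ollama server.
OLLAMA_URL = "http://localhost:11434/api/generate"

def build_payload(prompt: str, model: str = "llama2") -> dict:
    """Request body for Ollama's /api/generate endpoint."""
    return {
        "model": model,
        "prompt": prompt,
        "stream": False,  # ask for one JSON object instead of a token stream
    }

def generate(prompt: str, model: str = "llama2") -> str:
    """Send the prompt to the local Ollama server and return its response text."""
    req = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(build_payload(prompt, model)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

This is the entire surface a fingertip-style tool like the Raycast plugin builds on: one POST per request, no API key, everything staying on the local machine.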

I previously subscribed to ChatGPT Plus for $20 per month, but have now canceled it in favor of the OpenAI API, which is pay-as-you-go. GPT-4, as the most advanced model available, is well worth trying, and I’d probably never want to use GPT-3.5 again once I’ve experienced it; however, the network and request-frequency limitations greatly affect the experience, and the ecosystem of GPTs is far from perfect. From a general user’s perspective, I think it’s a good value, but as a developer, paying for the API is probably a better option.
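One way to sanity-check that Plus-versus-API trade-off is a quick break-even estimate. The per-token prices below are assumptions from around the time of writing (roughly $0.03 per 1K input tokens and $0.06 per 1K output tokens for GPT-4) and change often, so treat this as a sketch rather than a price sheet:

```python
# Assumed GPT-4 prices per 1K tokens; check OpenAI's current price list.
INPUT_PER_1K = 0.03
OUTPUT_PER_1K = 0.06

def monthly_api_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimated monthly API bill for a given token volume."""
    return (input_tokens / 1000) * INPUT_PER_1K + (output_tokens / 1000) * OUTPUT_PER_1K

def api_cheaper_than_plus(input_tokens: int, output_tokens: int,
                          plus_price: float = 20.0) -> bool:
    """True when pay-as-you-go comes in under the $20 Plus subscription."""
    return monthly_api_cost(input_tokens, output_tokens) < plus_price
```

For example, 200K input and 100K output tokens a month comes to about $12, under the $20 subscription; heavy daily use flips the conclusion, which is why the answer differs for casual users and developers.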

Stop the misuse of AI to generate disinformation. Almost all text platforms integrate AI tools, making it easy for the public to access and use AI, but also making it easy to mass-produce inflammatory, sensationalized fake news. While this type of disinformation is easy to recognize, in the current era of information overload the truth is often drowned out by it, creating even more harm than deepfakes.


I pay close attention to the privacy and security of personal data, in both work and life. Apart from exposure that can't be avoided, I am generally cautious about online services, and all the more so about AI services, so what I currently use most are open-source projects that can run locally.

Stable Diffusion: generates images matching a text description; with a few extension plug-ins it can accomplish almost every image-related task.

Whisper: a speech transcription tool that can very efficiently transcribe local audio in most common languages into text.
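For a taste of how Whisper's output can be post-processed: `transcribe()` returns a list of segments with start and end times plus text, which can be reshaped into SRT subtitles. The helper below assumes only that documented segment shape; the audio file name is a placeholder.

```python
def fmt_time(seconds: float) -> str:
    """Format seconds as an SRT timestamp, e.g. 3.5 -> '00:00:03,500'."""
    ms = int(round(seconds * 1000))
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def segments_to_srt(segments: list) -> str:
    """Convert Whisper-style segments (start/end/text dicts) into SRT text."""
    blocks = []
    for i, seg in enumerate(segments, start=1):
        blocks.append(
            f"{i}\n{fmt_time(seg['start'])} --> {fmt_time(seg['end'])}\n{seg['text'].strip()}"
        )
    return "\n\n".join(blocks)

# Typical use (commented out to avoid the model download):
# import whisper
# result = whisper.load_model("base").transcribe("talk.mp3")
# print(segments_to_srt(result["segments"]))
```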

sovits: a voice conversion model that can clone a person's voice from about two hours of high-quality dry (unprocessed) vocal material; combined with Microsoft's TTS service it can also achieve text-to-speech. There are now many similar projects that need much less material to reach similar results, but since sovits already has many trained models and I have used it before, I haven't bothered to switch.

NVIDIA Broadcast: local microphone and camera optimization software available for NVIDIA 2060+ graphics cards. I mainly use it for microphone noise reduction; it can easily remove keyboard and mouse sounds during a call, and can also eliminate room echo.

It doesn't cost me a dime. It's not that I'm unwilling to pay a subscription fee; I just don't have a strong enough need. It's like gaming: to enjoy foreign games you pretty much need a network accelerator, but even though an accelerator can reduce latency to some extent, I doubt many people would pay for one just for that.

Personally, I don't know many commercial AI tools; what I usually touch are open-source AI projects. Although those projects are free, most of them have a high barrier to use, and if you count the time cost they're not much cheaper: to realize one function you may need to try many projects before settling on a final solution, and after that you still need to modify some code or write new scripts to fit your own needs.

Since I spent little beyond some electricity and graphics-card wear, I can only speak to whether the time was worth it. In terms of results, it wasn't really: most attempts were poorly executed, working but always missing the mark. In terms of the process, it was definitely worth it; training or fine-tuning a model yourself is quite fun, not to mention that the models solved some real-world problems.

Also pay attention to the user agreements of both the AI tool and the content platform, and act within what they permit.

First of all, it must not violate the law.
AI is already very powerful, and combining a few deep-synthesis services can produce results that are almost indistinguishable from the real thing. Just imagine real-time face swapping plus real-time voice conversion: isn't that frightening? Reports of such scams have been around for a long time.

Of course, this kind of fraudulent use is certainly not how most users will use these tools. But even a spoof or a joke made with deep synthesis, once posted on the Internet, can cause trouble for yourself or others, even if you include the relevant disclaimers. So my advice for this kind of AI is: amuse yourself privately, and never post the synthesized content publicly.

For other kinds of AI tools, I don’t think you need to worry too much about using them normally, as long as they are in line with the mainstream morality of the society. In the end, AI is still a tool, so there’s no need to demonize it too much.


The AI tool I use most often is ChatGPT Plus, mainly for searching for information, editing and polishing manuscripts, and programming. I also use ChatGPT's built-in DALL-E 3 model to create blog and video covers, as well as Midjourney, which better fits my needs for drawing detail and gives my creativity more room.

Perplexity is used to gather, summarize, and synthesize information, and is one of the indispensable tools in my daily work. It complements ChatGPT’s long text processing and code interpretation.

When it comes to privacy and sensitive information, I prefer to use TypingMind from Setapp, which is based on GPT-4, supports image recognition, and makes secure API calls, avoiding the risk of data being taken by OpenAI for training.

In addition, MurmurType in Setapp provides me with a convenient voice transcription service, which greatly improves my daily work efficiency.

My overall monthly spend on AI app subscriptions comes to several tens of dollars:
ChatGPT Plus subscription is $20 per month;
Midjourney’s lowest subscription is about $10;

Through coupon codes and referrals, I’ve taken advantage of Perplexity’s discounted rates, which have recently been around $10 per month.

Setapp originally offered a very cost-effective deal: for $10 per month you got $10 in AI calls, and its extensive software catalog (e.g., Ulysses, Craft) effectively came free. But with the introduction of the AI Expert subscription at the end of February, the cost of using Setapp went up; the AI Expert plan I chose costs an extra $10 per month, doubling my cost.

As a working class person, I’ve been weighing the necessity of each expense. Even though it’s expensive, I still think it’s worth it when I consider the tremendous value these tools bring to article and video production, proofreading, and organizing information. For example, in terms of searching for information, I’ve shifted from relying primarily on Google searches to the more efficient Perplexity autosummaries, which have dramatically increased my productivity.

I've always told my students, viewers, and readers: don't use AI as your ghostwriter. But, as we all know, many people are now indeed having AI write for them.

Not to mention personal summaries and year-end reports, some authors of papers submitted to journals even forget to delete the sentence "As a large language model, I can't answer this question" from the text the model wrote for them, which is embarrassing. In schools, some students rely excessively on AI to complete their assignments. This brings short-term convenience, but in the long run it jeopardizes the learning process and personal integrity, like keeping the ornate box and returning the pearl inside.

AI should be a means of augmenting, not replacing, learning and work. For example, in a writing task, AI can be used as an illuminating tool to help you generate initial ideas and frameworks. Users can use the draft generated by AI as a starting point according to their own understanding and research, and then analyze it in depth, expand the viewpoints, and adjust the structure.
The specific operation steps I recommend include: first, determine the writing topic and outline, and then use AI to propose possible arguments and structures; next, through your own research and thinking, use AI to investigate and summarize data materials, and then gradually expand and deepen these preliminary ideas after validation; lastly, rigorously edit and embellish the AI-generated content to ensure that the article meets academic standards. This process not only helps to improve writing skills, but also develops the author’s ability to deeply understand knowledge and think critically.

Throughout the process, the human user must control the direction of the writing and express his or her personal insights in order to be considered a ‘responsible’ use of AI.

Evaluate and Choose AI Tools


At present there are truly many AI products, but despite the many choices, not many can fulfill most people's common requirements without endless tinkering and tuning. So when I pick a product, I care most about its flexibility, by which I mean how much control I have over what it generates. Open-source products have a big advantage here. If a product is not open source, it then depends on whether it offers enough control options to let me steer the output to my needs or plug it into my existing workflow.

A product's capability can be assessed from its results on benchmark test sets, such as GLUE and SQuAD for natural language processing, or WebQA and CMRC for Chinese. (I wrote more about them in my paid column.)

Also consider the difficulty of use and price. Unless quality is of particular concern, easy to use, cheap or even free is still the most attractive factor for most people.

By these criteria:
The best overall choice is ChatGPT Plus, which is not open source and costs $20. Although it is not open source, the OpenAI ecosystem is well-developed, with a high moat in terms of the number of apps in the GPTs store, rich APIs and good documentation, support for image document processing, and future video aggregation using multimodality, among other features.

Naturally, I chose GitHub Copilot: it's free for me to use, and its integration with VS Code is tight enough to feel like a native feature of the editor. Even when the results aren't good enough, I can accept that; I don't use it to generate complete code anyway, and it's plenty for assisting programming.

The most flexible is still Stable Diffusion: it is open source; if you want to lower the barrier to entry, there are all kinds of one-click packaged WebUIs online; virtually all LoRA models are created for Stable Diffusion, making for a very rich ecosystem; and there are projects like ComfyUI that let you control the image-generation workflow, plus other projects built on top of Stable Diffusion itself. In short, it can satisfy both personal use and commercial projects.

Thanks to understanding the principles from the beginning, my understanding of AI hasn’t changed much since the beginning.

Personally, I am in favor of AI-created content. From my own tests, as long as you place some restrictions and requirements on GPT, there is no way to tell AI-created content from human-created content, whether judged by eye or with so-called detection tools. Moreover, as AI intervenes more and more in the real world, there will be many scenarios where people don't even realize AI is being used: video encoding, traffic systems, power management, commodity production, even animal husbandry. AI has already brought us a better world, and no industry will give up on using more advanced AI. So we can't draw a sweeping conclusion of "threat" or "restriction"; rather, we have to find problems and fix and improve them gradually in the course of continued development.


I rarely go looking for a pile of side-by-side comparisons. Usually someone has built an impressive showcase, or a certain AI tool becomes a hot topic, and I happen to have a relevant problem at hand, so I go learn about it and try it out.

If I have to take the initiative to select and evaluate, I will pay attention to these:
Match of interest: whether it matches an important problem I want to solve or a direction I want to explore. For example, I had no interest in ChatGPT at first, but tried Notion AI early because I have a lot of notes in Notion and was curious about its efficiency potential; later, when I needed to do public opinion monitoring, I found that although Notion AI could categorize database content, its accuracy was low and unstable, so I turned to ChatGPT and the GPT-3.5 API.

Test the effect: try the AI tool in specific scenarios to see whether the results meet or exceed expectations. If no AI tool has yet been adopted for a given scenario, meeting expectations is enough for me to use it for a while; but if other AI tools are already in use, a new tool usually has to exceed expectations before I switch. For example, I compared Notion AI and GPT-3.5 on batch classification and found the latter more accurate and faster, so I shifted to building my workflow around GPT-3.5.

Personal Development Difficulty: What is the expected investment if you develop a customized solution on your own instead of using an off-the-shelf tool. Although there are various AI tools on the market now, as an automation player, I already have experience using low-code tools Make and Pipedream, and the GPT API is also open. If I can quickly build my own solution and adjust it to my needs, then the attraction of off-the-shelf tools for me will be much weaker.

Pricing: Pricing is usually the last thing I think about. Sometimes when I come across a new AI tool, I will look for the price list in advance to understand the payment mode and price, but whether it is acceptable or not, I still need to understand the specific features and try it before I can judge. From last year’s experience, my acceptable threshold for AI tool pricing is still a bit higher than that of regular app subscriptions, but only if it hits my personal interest points and can produce results.

From questioning to understanding and using. Last year, when AI was hot, the company, unsurprisingly, launched an internal AI training course and required everyone to take it. I evaluated the training content at the time and felt it wasn't very relevant to my job, and my workload was heavy, so I refused the mandatory training, which clashed with how I believe learning should happen. Later, when I ran into a problem AI was good at handling, I got up to speed quickly by teaching myself. In the second half of the year I built a public opinion monitoring system, and for sharing my daily AI practice I was invited to judge an internal AIGC competition. Looking back at the colleagues who took the training, though, apart from submitting their assignments they made little progress in actually applying AI afterward.

I still leave creation to myself. For written content I still write everything myself, using AI only to help with research and digging up past notes. There is one headache, though: probably because my writing has always emphasized logic and liked to list information, there were a few times last year when I was suspected of publishing AI-generated text, which I think is a lot like coughing during a certain period, when everyone around you panics.

Brands that use AI only to cut costs and boost efficiency do so at their own peril: AI's ability to produce images quickly makes it ideal for producing art for products or marketing, but I personally believe that if a brand introduces AI only to cut costs and boost efficiency while pushing out professional designers with real aesthetic judgment, or neglecting creative expression, it is likely to pay for it. Unsightly AI ads are becoming more and more prevalent, with Pixar-esque styling, bizarre portrait emoji, and a lack of empathetic messaging everywhere you look.

As a consumer, I don't care how brands cut costs and increase efficiency; I only care whether the quality of the product and service I end up with improves. But in the short term, some brands seem to prefer immersing themselves in the AI frenzy, overdrawing the brand's trust in the process.


The basis for picking a tool is simple: the key is whether the expense meets your needs; in other words, whether the time and cost saved justify the expense.
But as I said before, I use GPT almost exclusively, and when there's a specific tooling need, I go straight to AI to help me build my own custom tools. Why don't I pick among AI tools anymore? Because AI is evolving so fast, with tons of new tools and models appearing every month, that as an individual I don't have the time to evaluate each one.

As a result, I've basically stopped looking for tools and am exploring more ways to use the ones I have. I especially enjoy checking the community to see what new prompts and angles people are sharing. For small features, I don't need external tools; it's easier to just build them myself.

I think the current situation is a bit like the note-taking tool craze, when everyone was constantly hunting for new apps and was wowed every time something new came out, as if each introduced some revolutionary feature and claimed to be ahead in some way. In reality, it doesn't mean much. At least for now, if you're a developer or have special needs, you may still need to keep an eye on these tools; but an individual user shouldn't waste time picking AI tools. If a particular AI tool is really good, it will naturally stand out.

After a year of development, many new AI tools and models have been introduced, and people's expectations for AI have been raised considerably. But although AI technology has improved significantly this year, it has not delivered the kind of profoundly shocking qualitative leap that GPT-3.5 did.

At this point, it is too early to discuss the threat of AI, which mainly replaces repetitive tasks such as text organization, summarization, and the generation of AI images and videos. When it comes to true creation, AI is still far from mature; missing details can lead to logical confusion and inaccurate content, which is one of the reasons why current AI-generated fake news is so easy to recognize.

As a result, there is more reason to view AI as an aid than a threat than there was a year ago. It can significantly reduce work time and costs, and help people perform tasks more efficiently.


In most cases, I pay more attention to the target need and the functional realization: the need is my window into how far AI applications have penetrated, and the realization serves as a learning reference. I usually prefer open-source tools, and for open-source tools the ecosystem matters a great deal, so I pay extra attention to compatibility, extensibility, community activity, and so on.

In terms of pricing, there are currently two mainstream models: software buyout and service subscription. For the former, such as MindMac, if it hits a real pain point it is worth paying for; the latter, such as Raycast AI, can easily lead to subscription fragmentation and needs to be weighed as a whole.

When I first encountered large language models, I did not expect that they could so quickly serve as a base for multimodal capabilities such as vision, which opened up a lot of new imagination for text-generation AI. At the time, it was also hard to imagine the great controversy and serious challenges this wave of large-model AI would face around security and ethics.


I don’t know much about paid commercial AI, so I can only talk about how I pick open source AI projects. There are two general scenarios.

The first is when I have a clear need. In this case, I list a few candidate projects based on information from different platforms, then check their documentation on GitHub to evaluate the results and the training difficulty.

I generally don't consider projects that work well but demand too much from the graphics card or are too difficult to train (e.g., datasets that take a lot of time to preprocess). Among projects that can be deployed on my computer, as long as the GitHub star count is above my mental threshold, I usually give it a try. Of course, most of the time it's the projects with the highest or second-highest star counts that end up being used.

The reasoning is simple: inflated star counts aside, the more people star a project, the more people are using it, and the more people use it, the more thoroughly its problems get found and fixed.

In the other case, there is no clear need, so it’s much more random. In this case I have only one criterion: whether it is interesting or not.

I first came across AI tools purely for fun. As someone who loves to tinker, I can't help trying new things. The first thing I came across was ChatGPT, an online chat tool, and to be honest, the first time I used it, it fulfilled most of what I had imagined about AI: an artifact that could hold a normal conversation with a human.

Later, after encountering more open-source AI projects, the mystery faded, and even more so after running many AI projects locally and studying some of their source code. To put it bluntly, it's just a tool built in a different way.

AI right now should really be considered at its beginning stage. In some areas it is already more capable than humans, but it can still only serve as an auxiliary tool; there is some distance to go before it can complete complex tasks entirely on its own. What AI is actually doing is expanding the boundaries of individual ability. Take myself as an example: a person with no artistic talent at all can easily accomplish some basic artistic creations with the assistance of AI tools.

Of course, the finished work can hardly be called perfect; at best it reaches the passing line. To be honest, AI does replace some entry-level creators to a certain extent. But from another perspective, if ordinary people can complete basic creations with AI, then professionals can certainly complete more advanced work with AI, which in effect quietly raises the minimum standard generally recognized by society.


The first thing I look at is whether the features are powerful enough to meet my needs. Within budgetary constraints, I generally go for the most powerful option available. For example, I choose GPT-4 for text generation, Midjourney for image generation, and Perplexity for information retrieval, all of which are among the most capable tools in their respective fields. I understand that new tools pop up every day and try to catch the eye with all sorts of features. But unless something genuinely impresses me (e.g., a feature like Gemini 1.5 Pro’s extra-long context window), I prefer to stick with applications that are already widely recognized.

Regarding pricing models, most AI tools are subscription-based at this point, reflecting the high cost of the computing power required to run them. For example, OpenAI’s ChatGPT Plus costs $20 per month, which is essentially the benchmark price for AI applications today; pricing for most GPT-4-level large language model services also fluctuates within about ten dollars of that mark. This pricing strategy isn’t about squeezing profits out of users for no good reason; providers first need to generate enough revenue to cover their costs. After all, the computing power behind these services, whether bought (high-end graphics cards) or rented (e.g., AWS, Azure), costs real money.

As for the ecosystem, I particularly value the support and developer community behind a tool. For example, Microsoft’s recent investment in Mistral gives users (including me) more confidence in Mistral’s reliability and innovation potential as a company.

Over the past year or so, my perception of AI has undergone several significant changes.

Initially, like most people, I was incredibly excited about the rise of AI technology, almost believing it would immediately render many traditional skills obsolete. However, over time, and especially after the “AI cooling-off” period in the middle of last year, when I had in-depth offline conversations with many industry insiders, I began to gain a more grounded understanding of AI’s real-world capabilities and the constraints of its environment. I no longer expect industry-disrupting events to happen frequently in the short term. What hasn’t changed is my strong confidence in the overall AI trend, and my determination to apply AI better.

Specifically, I’m thinking more proactively about how to incorporate AI into my workflows in study, life, and work, rather than letting it replace me. For example, when using AI to proofread text, I’ve found that its ability to quickly spot and correct errors I hadn’t noticed makes a real difference to the quality of my writing. For the lazy among us, an AI “proofreader” does more than improve the final output; it provides the motivation to actually do the boring work of proofreading in the first place.

But at the same time, I have gradually realized that AI is best suited to specialized tasks with clear definitions and rules, and its current results are far from ideal for complex creative processes or deep logical reasoning. Even today’s top models, such as GPT-4, are often “confused”. So I hand AI the work it does best, keep the work suited to humans for myself, and gradually move into a “human-in-the-loop” (HITL) mode of collaboration.

Benefits of Using AI Tools


After getting used to AI, I will try to use AI in basically all scenarios, both at work and in life. Let’s talk about two major scenarios.

First of all, my job requires me to write all kinds of code, and recently I’ve been working on a big automation project. Before Copilot, I just typed code line by line in front of the computer, and the most help the editor gave me was auto-completion of methods and functions. Now with Copilot, a single `#` comment goes a long way: if I want to implement some functionality, I just describe in a comment the parameters the function accepts and the output I need, then watch Copilot generate the entire function, without even having to think about function and variable names.
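As a hypothetical illustration of this comment-driven style (the comment, the function name, and the completion below are all invented for this example, not actual Copilot output), a prompt comment might look like this:

```python
# Comment-driven prompting: describe the inputs and the desired output,
# then let the assistant propose the body. This completion is illustrative only.

# takes a list of order dicts with "price" and "qty" keys,
# returns total revenue rounded to 2 decimal places
def total_revenue(orders):
    return round(sum(o["price"] * o["qty"] for o in orders), 2)

print(total_revenue([{"price": 19.99, "qty": 2}, {"price": 5.0, "qty": 1}]))  # 44.98
```

The point is that the comment carries the specification; the identifier names and implementation details are left to the assistant.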

Before, if I forgot how to use a class or function, I would look it up on the internet; now I’m used to asking Copilot instead. Copilot rewrites copied code to fit the project, generates comments, and even fixes bugs directly from the error messages.

In short, Copilot gives the workflow a smooth, uninterrupted feel. It genuinely reduces the amount of typing and low-level thinking, letting me focus on the overall process and structure of the project, as if moving from the manual era into the era of industrial automation: productivity liberated.

However, Copilot excels at assisting with programming; generating whole programs is not its strongest suit, so I try to leave that to GPT-4. (Some developers have already built complete products with code generated by GPT-4, such as the new app by White Sketch Developers.) Of course, GPT-4 is not infallible. My approach is to preset a time budget for the AI based on the complexity of the code; if GPT can’t generate code that satisfies me within that time, I write it by hand. This ensures that the AI always brings me a net positive benefit.
Another scenario is article writing. In addition to regular generation, I’m currently trying to see if I can use AI to assist with larger, more complex creative projects.


Internet search. In the past, you had to Google for information, bouncing between different web pages, sifting for high-quality results, spending a lot of time reading and taking notes, and then repeating the process if one round of searching didn’t solve the problem, often burning half a day. Now, with web-connected AI like ChatGPT (GPT-4) and Gemini, you can simply ask them to search for you, get a quick grasp of the relevant information in a few minutes without annoying ads and pop-ups, and ask follow-up questions, cutting out much of the inefficient time in the search process.


I am a hobbyist developer driven by interest and do not have an in-depth knowledge of programming. When I wanted to develop a new tool with a new framework, I usually needed to spend a lot of time to fully understand the framework. With AI, the workflow is much simpler: I just refine my requirements and hand them over directly to the AI. In this way, my role has changed from that of a programming learner to that of a code reviewer, simply checking that the code generated by the AI is correct and modifying it if needed. All this requires is some basic knowledge of programming, which greatly lowers the development threshold.


It may be a bit of a niche: making replay videos, i.e., text adventure games with no choices, whose main components are scenes, characters, and lines. It used to be difficult for individuals to do this seamlessly, with much of the footage replaced by plausible alternatives found on the web, but now it’s easy to do it all with different AI tools.

Speech to text: since Whisper went open source, local transcription has become much more convenient for individual users than the earlier online services; at the very least, you don’t have to upload files to the cloud one by one, which is slow and less private. With a script of my own, I can use a local Whisper model to batch-transcribe previously cut audio files and automatically generate dialog scripts in serial-number order.

Scenery and character sprites: easily solved with Stable Diffusion; generate a few batches and you’ll always be able to pick something that works. As long as you’re willing to spend time selecting models and adjusting parameters, you can draw basically any character. If you pursue perfection, you can even use your own pose as reference material for the sprites. The only pity is that most models only paint young women well; for men and the elderly, you need to find the right specialized models to get good results.

Music: occasionally I use AudioCraft to generate some clips for a change of pace, but not much, mainly because the finished product isn’t meant for release or profit; basically, I pair whatever music I like, and as long as I’m happy with it, that’s enough.

Voice: since I generally act as the host, besides reading the narration I also voice some of the NPCs in the plot. So if I happen to have a suitable so-vits-svc voice model on hand, I’ll shift the voices of the NPCs I play to a different tone, adding a little immersion.

Code: I use Ren’Py, a Python-based game engine, so customization in it relies heavily on Python, which means most code-assistant AIs can help with scripting. Personally, though, I don’t rely much on such aids.

In fact, the more popular AI tools are all friendly to individual creation, giving ordinary people who know nothing about art, music, or voice-over a chance to create satisfying work of their own at a lower learning cost. The recently announced Sora text-to-video model further amplifies the power of individual creation, and I believe that inserting homemade CGs into my replays will no longer be a difficult task in the near future.


There are many benefits, so here are two scenarios for drawing and programming.

On the drawing side, DALL-E 3 and Midjourney allow me to quickly generate cover art based on my specific needs, eliminating the pain of long searches through free galleries that often come up empty. When I need a cover for a specific theme, DALL-E 3 understands what I need and translates simple words into very detailed English prompts. If I’m not satisfied with the image generated by DALL-E 3, Midjourney can use the prompt to quickly provide me with multiple high-quality options to choose from. This ability to turn “what you think is what you draw” into reality is a great solution to the immediate needs of non-drawing authors like me.

In terms of programming, I usually tinker with efficiency tools, need to pass data between different applications, or programmatically replay some fixed, trivial workflow, so writing code is inevitable. Currently, I’m used to using ChatGPT to build the initial framework and then GitHub Copilot to program conversationally, which has drastically improved my programming efficiency.

This past winter break, I wrote a multi-LLM hybrid invocation pipeline that turns my usual touch-up and translation work into a one-click process. Whenever I hit a problem while programming, I can solve it by interacting with a large language model, which is much more efficient than the traditional way of hunting for solutions on Stack Overflow and similar sites on my own. This collaborative approach not only saves me time but also makes programming more enjoyable.
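Such a pipeline can be sketched as two chained model calls. Here `call` is a stand-in for whatever API client you use; the model name, the prompts, and the `fake_call` stub are placeholders for illustration, not the actual setup described above:

```python
from typing import Callable

def polish_and_translate(text: str, call: Callable[[str, str], str]) -> str:
    """Chain two LLM calls: first touch up the draft, then translate it.

    `call` takes (model_name, prompt) and returns the model's text reply.
    """
    polished = call("gpt-4", f"Polish the following text, fixing grammar:\n{text}")
    return call("gpt-4", f"Translate the following text into English:\n{polished}")

# A stub in place of a real API client, so the plumbing can be tested offline:
def fake_call(model: str, prompt: str) -> str:
    return prompt.splitlines()[-1].upper()  # placeholder "model"

print(polish_and_translate("hello world", fake_call))  # HELLO WORLD
```

Passing the client in as a callable keeps the pipeline logic separate from any one provider, so swapping models (or mixing several) only changes the `call` argument.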

Improvements of AI Tools


Currently, there are too few AI products that we can truly control ourselves. This is partly due to the opacity of AI models themselves, but also because most products focus too much on showing off results rather than being seriously polished as products for the general public.

Specifically, the marketing and descriptions of many products on the market aren’t enough for users to form a clear sense of their positioning and capabilities, and the user experience goes to one of two extremes. Either, like most open-source products, they are delivered as a “bare shell”: plenty of controllable parameters, but documentation so vague it can scare off even professional users. Or they build a cool UI whose only practical control is a single input box, where the quality of the output depends on luck and the model’s tuning tendencies; such limited openness makes it impossible to form an ecosystem.

As a result, such products can only meet some very general, basic needs. At present, OpenAI strikes a good balance between ease of use, output quality, and parameter openness, especially with the GPTs feature, which lets ordinary users create models in natural language, for their own use or for sharing, without any programming background.


There is a lack of full-process capability. Plenty of voices on the internet emphasize how powerful AI is and how many jobs will be replaced. However, based on my own attempts over the past year, at least in text processing, AI is still better suited to being the “co-pilot” that handles the inefficient parts of an existing workflow; it is still far from “fully self-driving”.

The publicity direction alienates users: some AI products do publicity well, but it seems there aren’t many of them. (I personally like the Arc Browser’s promotional video; the pacing is so well shaped that you can’t help wanting to watch it a few more times.) What I see more of is benchmark publicity and score chasing: some products prefer to show off good-looking isolated metrics and claim to have surpassed the world’s leading level in some direction, but never mention what those metrics mean to users or whether they matter, which looks very much like publicity aimed at investors. The good news is that the few well-made products still spread through word of mouth.

Lack of motivation for long-term use: AI has become powerful, but how many people actually need what it does, and how much better is it than the people who used to do it? The question isn’t how close AI is to human experts on metric tests, but how many people are genuinely motivated to keep using it. I believe most people are still trying it out of short-term novelty; long-term use of AI mostly involves specialized content production, and we may already be overproducing content.

Information sensitivity varies by industry and class. My sample is small, but last year I talked about AI with four people: a music teacher in education, the owner of a construction company, a bank executive, and a restaurant owner. I found that the two in education and construction still knew very little about fast-developing AI; their understanding was stuck at automation and photo-based text recognition. The two in banking and catering were better informed: they could not only name ChatGPT but also list some usage scenarios for AI. After these conversations, my speculation is that people in fast-changing industries close to the internet, and those with higher social standing or education, are more likely to understand and encounter AI tools first.


The barrier to entry remains high. Because AI technology develops so rapidly, usage methods quickly become outdated while new tools constantly emerge. Ordinary users often have neither the time nor the interest to study prompt techniques or compare the pros and cons of different tools. As a result, many people prefer to use the commercially packaged models provided by platforms directly, such as clicking a single button to optimize text, without wanting to dig into other options.

Lack of high-quality open-source models. The availability of open-source AI models comparable to GPT-4 would greatly boost their popularity. Currently, many commercial companies use open-source models themselves while offering users only their own commercially customized versions, a practice that limits both the user experience and the effectiveness of the tools. When high-quality open-source models become more widely available, we can expect greater popularization and improvement of AI technology.


Model capability issues. Many tools are well thought out, but limited by the underlying model capability, they may not be able to achieve the desired results; or one day OpenAI launches an update, the model evolves, and the original problem the tool was meant to solve no longer exists.

Engineering landing problems. In real AI development, it may not be difficult to go from idea to demo, but the challenge from demo to application is not small.

Industry penetration problem. Only teamwork that understands both the industry and AI can make truly valuable industry applications.


Dataset source issues. “Brute force works miracles” really isn’t just wordplay when the model architecture itself hasn’t jumped a level; but the data behind that force is a gray area. Where does the massive amount of data come from? Would using only licensed or uncopyrighted data even work? What if I follow the rules and others don’t? Even with the money and willingness to buy copyrights, how much labor and time would clearing so many data sources take? Following the news around OpenAI gives you a strong sense of all this.

Personally, I would prefer to use open source AI tools that can be run locally because of privacy concerns, not to mention sensitive industries and sectors that need to consider data security.

However, locally run AI software is still scarce. Besides open-source projects, some commercial software (such as certain video-editing tools) also prioritizes local compute for AI inference in specific scenarios, but overall it remains a minority. Almost all chip manufacturers have now brought AI into their chips and launched new concepts such as NPUs and the AI PC. If these can really be optimized and promoted so that most AI applications run locally, I believe productivity tools will see a big reshuffle. Qualcomm’s recently announced AI Hub suggests the era of on-device AI may really not be far away.


In my opinion, one of the main barriers limiting the popularity of AI tools is cost. As mentioned earlier, the high cost of computing resources has resulted in many AI apps charging exorbitant subscription fees, and many individual users have to be careful with their budgets.

Another issue is the lack of basic AI literacy among novice users. After paying for an AI tool subscription, many people don’t know how to use it to support their work. Often they make one casual attempt (e.g., “help me write a thesis”) based on overly optimistic expectations, and quickly run through the whole cycle of “from getting started to giving up”. In fact, whether it’s using basic prompts or understanding the characteristics and usage scenarios of different AI tools, these are essential basic skills for today’s users.

To close the AI-literacy gap, I think we need to step up the promotion and popularization of AI-related knowledge and skills. The market is huge; we all saw the recent news about one big name’s revenue figures. But there are also plenty of people doing serious work on content.

In Conclusion

More capability for process control. This trend is already happening; for example, Stable Diffusion 3 already supports specifying which part of an image generates what, or styling part of the content. After all the fun and excitement, AI projects need to think harder about how to be put into practice. Only by offering control over the process can they integrate well into existing workflows; if a small project does this well, then even if its results are a little worse than the strongest models’, it still has a way forward.

Multimodal integration. This can be an integrated model that can take into account the generation of content in multiple formats, or a platform product that aggregates multiple models.

Local deployment, with compute requirements continuing to fall. Compute requirements have fallen off a cliff compared to the beginning of last year: image generation has gone from demanding a high-end graphics card to running on low-end cards, and high-end cards have gone from several seconds per image to dozens of images per second; a 1-billion-parameter model used to be hard to run locally even on a high-end PC, while phones now claim to run 13-billion-parameter models. Models trimmed and quantized for different form factors and compute budgets will explode in variety, giving us more choices and letting more people enjoy the benefits of large models.
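A back-of-the-envelope calculation shows why quantization matters for on-device models; the bit-widths below are illustrative assumptions (weights only, ignoring activation and KV-cache memory):

```python
# Rough memory estimate for storing an LLM's weights locally.
def model_memory_gb(params_billions: float, bits_per_weight: int) -> float:
    """Approximate weight-storage size in gigabytes (1 GB = 1e9 bytes)."""
    bytes_total = params_billions * 1e9 * bits_per_weight / 8
    return bytes_total / 1e9

# A 13B model in 16-bit floats needs ~26 GB, far beyond a phone;
# quantized to 4 bits, the same weights fit in ~6.5 GB.
print(model_memory_gb(13, 16))  # 26.0
print(model_memory_gb(13, 4))   # 6.5
```

This is why the same model that once required a workstation GPU can plausibly fit on a flagship phone once aggressively quantized.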

Decentralization of content access: people can use AI to customize their own information feeds around their physical and mental health and long-term growth, without being constrained by click-through-rate-driven, engagement-first platform algorithms.

The official launch of Sora. Advances in video generation technology mean that anyone can use it for their own creative endeavors, such as converting personally told stories, written descriptions, or blog posts into videos. Such technological advances not only provide a broader platform for personal creativity, but could revolutionize the way content is created, shared and consumed.

The popularization of more cost-effective AI computing hardware and supporting software. At present, playing with AI projects locally basically requires an NVIDIA graphics card with lots of VRAM, with little room for choice. I used to wonder when phones could really run all kinds of large models locally; then came the news of Qualcomm’s AI Hub. Maybe this year we can really experience a “Jarvis”-style assistant, like in the movies, right on our phones.

New models with 10M-token context windows, such as Gemini 1.5 Pro. This long-context capability will let AI understand and process far more text than the current standard (mainstream context lengths are 200K tokens or less), bringing a qualitative leap in complex dialogue, research and analysis, and long-document processing. Such technology enables AI to read and analyze multiple papers at once to write a review, or extract character traits from a million-word novel for continuation and re-creation.

Published by Tony Shepherd & last updated on April 13, 2024 9:52 am
